Practical Corpus Linguistics – Online Materials & Resources

This page contains links to the online materials/exercises accompanying my textbook Practical Corpus Linguistics. In future, I’m also planning to add links to some of the relevant resources, such as concordance programs, web-interfaces to generally accessible corpora, etc.

In addition, to keep the textbook up-to-date even if some of the resources originally described there may change, revised information containing the most recent changes to program interfaces, latest program versions, etc., will be posted here.

Online Exercises

  1. Understanding Encoding: Character Sets
  2. Understanding File Formats & Their Properties
  3. Cleaning Written Data
  4. Concordancing
  5. Regular Expressions
  6. Understanding Units in Texts



Luckily, over quite a few years, nobody has actually reported any errata in the book to me, apart from yesterday (Wed 06-Jul-2022), when one of my students pointed out that on p. 176, just prior to Exercise 66, I wrote “[...], where we investigate potential differences in the use of positions in economics texts.”, where of course it should read ‘prepositions’.

Use of Editors on Different Operating Systems

Fri 13-Mar-2020 11:35:08: I recently discovered that Komodo Edit is now available for all Operating Systems discussed in the book. While I would still recommend using Notepad++ on Windows, if you work on Mac and/or Linux systems, I would recommend using this editor there in preference to any of the other options I discuss.

The New BYU Interface

In May 2016, just about 3 months after the publication of the book, the BYU corpora interface underwent a rather drastic change, partly to make it more user-friendly for mobile phones. In the following, I shall try to summarise these changes inasmuch as they affect the content of the descriptions in the book, so as to allow readers to work through the exercises in the book using the new interface, rather than having to switch to the old one.

The BYU interface is used in various places throughout the book, mainly as an interface to COCA, but also to carry out comparisons between COCA and the BNC, starting from section 8.2 (p. 132). Instead of the original frame-based display depicted in Fig. 8.4 (p. 133), the interface now has one basic window, as shown below.

Once you run a query by clicking on the Find matching strings button, you are taken to the FREQUENCY ‘tab’, which essentially looks like the top right-hand frame in the original figure in the book. Selecting the desired results from the frequency list and clicking on the CONTEXT button will then produce the output from the bottom right-hand frame, only this time on the CONTEXT ‘tab’.

The search syntax has also been changed extensively, most notably removing most of the square bracket options, and introducing some abbreviations.

More to come soon...

Notes (most recent ones first)

  1. 07-Jul-2016 14:40:52: Mark Davies has recently (May 2016) changed the BYU interfaces to a new design without frames, as well as introducing some other changes. The general idea behind this is to make everything more user-friendly, but, sadly for us, the queries described in the book will now frequently no longer work in exactly the same way. To be able to do all the exercises based on the descriptions in the book, you will currently need to access the old BYU/COCA interface. I will try to put together some information about how to work with the new interface here within the next few months, so please be patient in the meantime...
  2. Please note that the corpora website referred to on page 21 is no longer maintained by David Lee; I took over its administration and maintenance in January 2016. The short URL still works, but if you have trouble accessing it, you can also use/bookmark the full address.