Pre-electronic Corpora

Corpus-based research is often assumed to have begun in the early 1960s with the availability of electronic, machine-readable corpora. However, before then, there was a considerable tradition of corpus-based linguistic analysis of various kinds occurring in five main fields of scholarship. (Kennedy 1998: 13)

For a historical dictionary database, have a look at Ian Lancashire’s work on An Early Modern English Dictionaries Corpus 1499-1659.

Biblical and literary studies

Alexander Cruden’s Concordance of the Authorized Version of the Bible (1737)

Lexicography

Samuel Johnson’s Dictionary of the English Language (1755)
– 150,000 illustrative citations
– a web version of the dictionary is now available (restricted access)

The Oxford English Dictionary (1928)
– 5 million citations totalling c.50 million words to illustrate the meanings and uses of the 414,825 entries.

Noah Webster’s An American Dictionary of the English Language (1828)
– the 1st edition is now available on the web
– Merriam-Webster’s 3rd edition: a corpus of over 10 million citation slips
– online access to the dictionary is available

Dialect studies

The English Dialect Dictionary (Wright, 1898–1905)

The Existing Phonology of English Dialects (Ellis, 1889)

Language education studies

Thorndike, E.L. (1921). The Teacher’s Word Book.
– a corpus of 4.5 million words from 41 different sources to make a word frequency list.

Thorndike & Lorge (1944). The Teacher’s Word Book of 30,000 Words.
– 18 million words from a wider range of textual sources.

Grammatical studies

Jespersen, Otto. (1909–49). A Modern English Grammar on Historical Principles. I–VII.

Kruisinga, E. (1931–32). A Handbook of Present-Day English.

Poutsma, H. (1926–29). A Grammar of Late Modern English.

Fries, C.C. (1940). American English Grammar.

Fries, C.C. (1952). The Structure of English.
– a 250,000-word corpus of recorded telephone conversations.

The Survey of English Usage (SEU) Corpus (Quirk, 1968)
– 1-million-word corpus
– 50% written, 50% spoken
– spoken part published separately in electronic form in the 1980s as the London-Lund Corpus
– a description of the Survey Corpus is available online
