First-Generation English Corpora & Their ‘Diachronic’ Counterparts

First-generation corpora are almost exclusively written and generally modelled on the Brown corpus. They usually contain 1 million words, which is relatively small by today’s standards. The Brown corpus essentially set the standard for these early corpora, which is why they are sometimes also referred to as the ‘Brown family’ of corpora. The Brown and LOB corpora have also been, or are still being, complemented by corpora of the same design and variety, drawn from time periods either before or after the originals.
The Brown Corpus 1 million words (500 samples of about 2,000 words each) of continuous written American English, from texts published in the US in 1961 (completed in 1964). The corpus manual is available at ICAME. The corpus is accessible through CQPweb and downloadable from various sources, including the NLTK data page.
FROWN (Freiburg-Brown Corpus of American English) 1990s analogue to the Brown corpus.
AmE06 Corpus 2006 analogue to the Brown Corpus. The corpus is available via CQPweb.
B-BROWN 1931 analogue to the Brown Corpus. Apparently, not released yet.
The Lancaster-Oslo/Bergen (LOB) Corpus 1 million words (same design as Brown), texts published in 1961, British English, compiled between 1970 and 1978. Description/manual (ICAME).
FLOB (Freiburg-LOB Corpus of British English) 1990s analogue to the LOB.
BE06 Corpus 2006 analogue to LOB.
BLOB-1931 Corpus 1931 analogue to the LOB. Accessible through CQPweb.
The Kolhapur Corpus of Indian English 1978, Indian English (same design as Brown). Manual (ICAME).
The Wellington Corpus of Written New Zealand English Most of the material is from texts published in 1986 or 1987, but the corpus covers the years 1986-1990. Based on the same design as Brown/LOB, with some modifications. Manual of information (ICAME).
The Australian Corpus of English (ACE) Material from 1986. Based on the same design as Brown/LOB, with some modifications. Manual of information (ICAME).
The Corpus of English-Canadian Writing Project at Queen’s University in Kingston, Ontario. A textbank of 3 million words of Canadian English from magazines, books, and newspapers, gathered beginning in 1984. Same design as Brown/LOB, representing a wide variety of genre categories in common with those corpora, with the addition of the categories "Feminism" and "Computing".
For more information, contact Margery Fee, Strathy Language Unit, 207 Stuart Street, Room 316, Rideau Building, Queen’s University, Kingston, Ontario, Canada K7L 3N6; email: feem@qucdn.bitnet
The London-Lund Corpus (LLC) Spoken British English recorded from 1953 to 1987; prosodically transcribed spoken part of SEU corpus (87 texts: 435,000 words) plus 13 more texts = total of c. 500K words (510,576). Description (ICAME).
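
The Brown entry above gives the design as 500 samples of about 2,000 words each and mentions NLTK as one download source. The following is a minimal Python sketch of that arithmetic, together with a way to inspect NLTK's copy of the corpus. The function names are hypothetical, and the second function assumes that NLTK is installed and that its 'brown' data package has already been downloaded; it is only defined here, not called.

```python
# Nominal Brown-family design, taken from the Brown entry above.
SAMPLES = 500
WORDS_PER_SAMPLE = 2000  # approximate figure; real samples vary slightly


def nominal_size(samples=SAMPLES, words_per_sample=WORDS_PER_SAMPLE):
    """Return the nominal corpus size in words."""
    return samples * words_per_sample


def brown_counts_from_nltk():
    """Return (number of files, number of tokens) for NLTK's copy of Brown.

    Assumption: nltk is installed and the 'brown' data package has been
    fetched beforehand, for example with nltk.download('brown').
    """
    # Imported lazily so the rest of the sketch runs without NLTK.
    from nltk.corpus import brown
    return len(brown.fileids()), len(brown.words())


if __name__ == "__main__":
    print(nominal_size())
```

The token count reported by NLTK need not match the nominal figure exactly, since tokenisation choices (such as counting punctuation marks as tokens) affect the total.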

Many of the above corpora are available on the ICAME CD-ROM.
