Bookmarks for Corpus-based Linguists


Teaching & Miscellaneous Links

Corpus-based Language Teaching, CALL, Vocab Research, etc.

(see also References/Journals section)

On-line Resources for Classroom Concordancing, Text profiling, etc.

Compleat Lexical Tutor
(by Tom Cobb)

Web-based suite of tools for data-driven self-learning (mainly for vocabulary); paste in any text of interest and get a self-teaching text linked to speech, dictionary, concordance, and self-test resources. Tools include a concordancer, a phrase (n-gram) extractor, VocabProfile (tells you how many words in the text come from each of four frequency levels: (1) the most frequent 1000 word families, (2) the second 1000, (3) the Academic Word List, and (4) words that do not appear on the other lists), a vocab-level-based cloze passage generator and a traditional nth-word cloze builder. [alt. URL]
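To make the frequency-level idea concrete, here is a minimal sketch of that kind of profiling in Python. It is not the Lexical Tutor's own code: the function name `vocab_profile`, the band names and the tiny word sets are all made up for illustration, and it matches plain word forms rather than word families, which is a simplification.

```python
import re
from collections import Counter

def vocab_profile(text, bands):
    """Return the share of tokens that fall in each frequency band.

    `bands` maps a band name to a set of lowercase word forms. Bands are
    checked in insertion order; a token found in no band is "off-list".
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return {}
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            # No band contained the token.
            counts["off-list"] += 1
    names = list(bands) + ["off-list"]
    return {name: counts[name] / len(tokens) for name in names}

# Toy bands for illustration only; real frequency lists are far larger.
toy_bands = {
    "first_1000": {"the", "of", "a", "is", "and", "in"},
    "second_1000": {"apply"},
    "academic": {"analysis", "data"},
}
profile = vocab_profile("The analysis of the data is robust", toy_bands)
for band, share in profile.items():
    print(f"{band:12s} {share:.0%}")
```

Swapping in real word lists (one set per band) is the only change needed to profile a real text; handling word families would need an extra mapping from inflected forms to headwords.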


Freeware word profiling program (for Windows and Macintosh OS X); works like Paul Nation’s Range program.

Virtual Language Centre
(PolyU Hong Kong)

Useful site for ESL/EFL learners; has a web concordancer (various corpora including Brown, LOB and learner writing samples), web-based word-frequency profiler, text-to-speech engine, and instant-lookup dictionary (with bilingual English-Chinese definitions for many words in the lexicon). Also has language notes, tests, exercises, cloze passages for English.

Turbo Lingo

Similar to the Compleat Lexical Tutor (above). Useful for generating concordances and frequency lists of web pages. Enter a web page or copy and paste a text into a textbox, and then choose to generate a concordance, frequency list, phonotactics, etc.
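For readers who have not met a concordance before, the sketch below shows the basic keyword-in-context (KWIC) idea in Python. It is only an illustration under simple assumptions (plain text, whole-word matching, a fixed character window); the function name `kwic` and the sample sentence are invented here and have nothing to do with how any of the sites above are implemented.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Each line shows up to `width` characters of context on either side,
    with the left context right-aligned so the keywords line up.
    """
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), flags=re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

sample = "The cat sat on the mat. The dog saw the cat."
for line in kwic(sample, "cat", width=12):
    print(line)
```

With real texts you would probably want to replace line breaks with spaces first, so that each concordance line stays on one row.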

* For more on-line corpora and web-concordancing in the classroom, go to my list of Free, web-accessible corpora (on my "Corpora" page). For word frequencies, try the BNC Frequency lists (or see other listings here). For frequently recurring phrases (ngrams) see Constantin Orasan’s site, where you can list or compare (side-by-side) frequency lists and ngrams (recurring phrases) for various subcorpora/text genres (including business texts). It’s not a user-friendly/well-designed web page as it currently stands, but the functionality is very useful for teachers and learners looking for frequently occurring words and phrases.

Phraseological/Vocabulary-related Tools and Bibliographies


The most accessible site for students (and the most pedagogically useful) because it straightaway gives you a list of collocations for your search word/phrase, instead of concordances; results are categorized by POS-based patterns and by approximate sense clusters, and bar graphs give an indication of how common each combination is. Results are based on an 80K-word subset of the BNC.

Variation In English Words And Phrases (VIEW)

Web site that allows word-, phrase- or part-of-speech-based searches of the BNC with genre-restrictions and can list collocations. Requires registration (free) after about 20 searches.

Phrases in English (PIE)

Web site that incorporates a database of all 1-6-grams (phrases 1 to 6 "words" long) with part-of-speech (POS) codes occurring three or more times in the 100-million-word British National Corpus (BNC). You can explore English phraseology either through lists of forms and their frequencies or by searching for specific forms or collocations, e.g. 2-grams of the pattern "ADJ work", to find the most frequent adjectives describing work. PIE also offers a phrase pattern discovery tool, "phrase-frames": sets of variants of an n-gram identical except for one word (wildcard symbol *), e.g., "the * of the", with variants such as "the end of the", "the rest of the", "the top of the", "the nature of the". Over the next year PIE will add: (i) concordances from the BNC when you click on an n-gram in the query results; (ii) POS-grams and POS-frames for studying the relative productivity of phrase structures; (iii) filtering by text type (domain, genre, target audience) for contrastive studies; (iv) query by regular expression (currently only wildcards are supported).
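The phrase-frame idea is easy to try out on a small scale. The following Python sketch groups n-grams that differ in exactly one slot; it is a toy version written for this page, not PIE's implementation, and the names `ngrams` and `phrase_frames`, the `min_variants` parameter and the sample sentence are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_frames(tokens, n, min_variants=2):
    """Group n-grams that are identical except for one slot.

    Returns {frame: Counter of fillers}, where a frame is an n-gram with
    one position replaced by "*". Frames with fewer than `min_variants`
    distinct fillers are dropped.
    """
    frames = defaultdict(Counter)
    for gram, freq in Counter(ngrams(tokens, n)).items():
        for slot in range(n):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame][gram[slot]] += freq
    return {f: fillers for f, fillers in frames.items()
            if len(fillers) >= min_variants}

# Invented sample text; a real study would use a tokenized corpus.
tokens = ("the end of the day and the rest of the week "
          "and the top of the hill").split()
frames = phrase_frames(tokens, 4)
for frame, fillers in frames.items():
    print(" ".join(frame), "->", sorted(fillers))
```

On a corpus of any size you would also apply a minimum frequency threshold to the n-grams, much as PIE keeps only those occurring three or more times.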


An on-line, searchable collection of verb-noun collocations to aid students of writing; rather limited in scope.


Aims to help learners increase their vocabulary and find the right words while writing (shows examples from corpora, links up with dictionary definitions and word relationships, collocations, etc.)

Vocabulary analysis software

Paul Nation’s lexical analysis suite of programs

Vocabulary Acquisition Research Group Archive (VARGA)

Headed by Paul Meara and Nuria Lorenzo-Dus at Swansea; includes a bibliographical database covering work in the area of vocabulary acquisition in a second language

Second Language Vocabulary Resources page

Rob Waring’s pages. Includes a bibliography of second language acquisition and learning (overlaps with the VARGA database above, but covers a bit more).

* See also the links to word lists and frequency lists in my section on Software, Tools, Frequency Lists (click on the left menu)

Teaching Grammar: On-line Courses

The Internet Grammar of English (IGE)

An on-line pedagogical reference grammar written primarily for university undergraduates
(based at The Survey of English Usage, University College London)

Chemnitz Internet Grammar

A hypertext learning environment for exploring some aspects of English Grammar which often prove troublesome for German learners. Has rules and explanations, exercises and authentic examples from the Chemnitz Translation Corpus, linked together. The grammar is entirely in English, except for the bilingual examples

Notes, FAQs, teaching materials and other CBL resources

A Ten-step Introduction to Concordancing through the Collins Cobuild Corpus Concordance Sampler

Useful introduction by James Thomas

Concordancing in the classroom

Notes on using concordancing in the classroom

ICT4LT (Information & Communications Technology for Language Teachers)

(-- click on "List of Modules") A useful collection of materials (basic, intermediate & advanced) for training UK language teachers in information technology and CALL. To view the site in languages other than English (Italian, Finnish and Swedish), click here.

Joseph Rézeau’s "Data-driven Learning" Page

Some examples of multilingual concordance exercises (French and English)

MICASE-based teaching materials

Some pedagogical materials based on MICASE (Michigan Corpus of Academic Spoken English). Mainly for advanced (university-level) ESL/EFL and EAP students/instructors

Mike Scott’s software tools

A number of programs esp. relevant for language teachers

Passapong Sripicharn’s Data-driven learning page

some DDL materials

Tim Johns' Kibbitzer Pages

By one of the key pioneers of classroom concordancing. Kibbitzer = short discussion of a language point (lexico-grammatical and discoursal).

* See also the References section (on left menu) for on-line articles on the use of concordancing in the classroom.

Course outlines, research projects

* For corpus-based linguistics (i.e. less ELT-oriented) courses, see the "Courses" page

* For pedagogical textbooks based on corpora, look at the Cambridge University Press selection here.


(Computer-Assisted Language Learning; not necessarily related to corpora)


A self-access interactive English tutorial (teaching learners to recognize and correct common errors in their English)


Archive of software for Computer Assisted Language Learning (CALL) of English as a Second Language (ESL) maintained at The Institute For Education, La Trobe University, Melbourne, Australia.

CALL software list

At the TESOL CALL Interest Section; searchable databank of CALL software

CALL software

Links to free ESL software, including concordancers.

Claire Bradin Siskin’s General Links for CALL

collection of links for CALL; see also the syllabus for her introductory course on CALL

Hot Potatoes

A suite of applications enabling you to create interactive multiple-choice, short-answer, jumbled-sentence, crossword, matching/ordering and gap-fill exercises for the Web. Free of charge for non-profit educational users who make their pages available on the web.

Virtual CALL Library

Offers links to a wide range of downloadable shareware CALL programs for PC users

John and Muriel Higgins’s Page

CALL & Linguistics Links

Dictionaries & Glossaries (on-line)

OneLook Dictionary Search

My favourite dictionary link, because it searches a whole flotilla of on-line dictionaries in one go, and lets you decide which dictionary’s definition you want to look at.

Collins Cobuild Student’s Dictionary
(free until further notice...)

a monolingual dictionary for learners of English, which has been converted into a complex database containing not only the restructured text of the over 31,000 definitions of the printed edition but also additional entries drafted especially for the online version; contains pronunciation sound files

Chinese-English Dictionary on-line

input text in English or Mandarin Chinese (Pinyin, Big5, GB encodings) and get definitions in the other language. Has pronunciation sound files for the Mandarin words.

Dictionarium (all languages)

A meta-site; a directory of non-printed reference works available online for various languages

Longman Web Dictionary

Claims to be the biggest and most up-to-date ESL dictionary in the world, with entries for ecotourism, keep it real, scrunchie, woolly liberal, redux (American English) and the British slang meaning of pants.

Merriam-Webster On-line Dictionary

Includes the unabridged version and a thesaurus. American English; has audio links

Oxford Advanced Learner’s Dictionary (OALD)

80,000 references, British and American English.

Cambridge Advanced Learner’s Dictionary

Another one for learners; unlike their rival dictionary (OALD), this one seems to include only contemporary pronunciations of words (e.g. entries for poor and zoology)

Search several dictionaries/glossaries/gazettes at the same time. The dictionaries used here are very old, however (e.g. from Webster’s 1913 edition).

English, Medical, Legal, and Computer Dictionaries, Thesaurus, Encyclopedia, a Literature Reference Library, and a Search Engine all in one.

Has dictionaries, glossaries and more for several languages.


Yet another dictionary and thesaurus site; includes a children’s dictionary.

When all else fails, you could try the meta-site for obscure dictionaries/glossaries. Or, if it’s swear words and colourful language you’re after, try The Alternative Dictionaries (but go with an open mind! This site contains lists of slang, "dirty" words and other "bad language", in several languages).

Miscellaneous Language Teaching/ESL/EFL/EAP Meta-sites and Links


A meta-site with links for on-line EAP (English for Academic Purposes) resources

Using English for Academic Purposes (UEfAP)

An online guide for international students

ESL Cafe (Dave Sperling)

Various resources for ESL/EFL students and teachers

Corpus-based books from Cambridge ELT

Cambridge books using analysis of the Cambridge International Corpus and extracts from Cambridge International Corpus texts

Linguistic Funland

A meta-site with links for TESOL/TESL/TEFL, CALL, concentrating on Internet resources


A comprehensive(?) catalogue of language-related Internet resources

Software, guides & resources for teachers and students, job vacancies

AGORA Language Marketplace

Links to commercial companies, commercial language-learning/CALL software

’s FAQ page

FAQ for people new to linguistics & literature

+ Don’t forget the TALC (Teaching And Language Corpora) conference, held every two years. Click on the "People, Places & Conferences" page for more info.


FrameNet lexical database

A lexicon-building effort in which researchers (1) study words; (2) describe the frames or conceptual structures which underlie these; (3) examine sentences that contain these words, using a very large corpus of contemporary English; and (4) record the ways in which information from the associated frames is expressed in these sentences.

Text Analysis Info page

provides information on qualitative and quantitative text analysis software, esp. those for "content analysis"

Tuscan Word Centre

A non-profit Association which organises one-week high-level courses (some funding available) for language researchers and workers in the language industries. Concentrates on the use of electronic corpora for different purposes, including: translation, automatic or machine-aided language processing, tagging, parsing etc., language teaching support, language learning assistance, lexicography and language reference

Other Useful Bookmark Sites

(as if you haven’t had enough already!)

Manuel Barbera’s Corpora and Corpus-based Computational Linguistics links pages

Huge collection of links. Particularly good for links to non-English corpora (both well-known and lesser-known languages).

Gateway to Corpus Linguistics on the Internet (Yvonne Breyer)

a Germany-based collection of links, organised in a similar fashion to this site

Mike Barlow’s Corpus Linguistics links page

Probably the first bookmarks site for CBL; good esp. for non-English corpora, but has lots of dead links

SIL Links

Links from the Summer Institute of Linguistics.

Texto! (click the link "Corpus et trucs")

Web site (in French) devoted to corpus-based linguistics. It mainly focuses on XML for corpus work, with various hints, a presentation of the TEI, and a tutorial for using XSLT and other tools (MS Word for XML annotation, regexp and some NLP tools).

If you need help with file formats for some of the downloads, [click here]

Did you find this useful? Most people, sadly, don’t bother to let me know, but if you want to encourage me to keep updating the site, drop me a line.
