This page consists of two sections, one listing offline concordance programs and the other web-based concordance facilities. Most of these programs now do more than just run concordances; they often also include facilities for producing frequency lists, calculating collocations, etc.
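For readers who have never seen what a concordancer does internally, here is a minimal sketch of a keyword-in-context (KWIC) search in Python. It is only an illustration, not how any of the programs listed on this page is implemented; the function name `kwic` and its `width` parameter are my own assumptions, and the sketch works on a single string rather than a corpus of files.

```python
import re

def kwic(text, pattern, width=30):
    """Return one aligned concordance line per regex match (illustrative only)."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Context to the left and right of the hit, cut off at `width` characters.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that all hits line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

sample = "The cat sat on the mat. The dog sat on the cat."
for line in kwic(sample, r"\bsat\b", width=10):
    print(line)
```

Real concordancers add sorting on left/right context, tag handling and links back to the source files on top of this basic idea.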
Offline Concordancers
These concordancers can be downloaded and run on your own computer, provided they are designed to run on your operating system or through an emulator.
The best free concordancer for Windows, Mac OS X and Linux that I know of. Some commercial programs may have a couple more features, but this one’s free, so don’t complain! Pros: works with all languages (fully Unicode compliant); allows full regular expressions (for very complex searches); does word lists, n-grams/clusters, collocations, and keywords (by comparing against a reference corpus); does distribution plots of occurrences within each file; can handle lemma lists; can handle XML-type and underscored_tag-type part-of-speech tags; the developer continually improves it and is open to feedback (and may I emphasize that it’s free?). Cons: (at the moment...) very minimal support for SGML/XML/HTML corpora (it simply ignores rather than intelligently mines structural tags) but that’s a problem common to most concordancers. |
|
WordSmith Tools (v. 8) |
Mike Scott’s impressive Windows-based set of tools, including concordancer, word list, keyword list, as well as concgrams. Extremely fast, and offers a bewildering set of options for configuration and more advanced work than I can describe here. Limitations:
|
#LancsBox (v. 6.0) |
A fairly comprehensive Java-based free tool that runs on all major operating systems. Includes facilities for concordancing, investigating dispersion, analysing collocations and displaying them as networks, word lists & n-grams. |
MonoconcEsy (v.2.2) |
A fairly full-featured Windows concordancer. Free for individuals carrying out non-commercial research. One feature not available (compared to MonoConc Pro) is Corpus comparison (keywords). If you need that or if you use MonoConc Pro 2.2 and want a copy for your students, you can download the MonoConc Pro Semester Version, which will expire after around 9 or 10 months. |
MonoConc Pro (v.2.2) |
Concordancer for Windows with powerful (regular expression) search facilities. Good points: ability to show/hide tags; colour-codes collocates within the main concordance window itself; handles many languages (including Chinese, Japanese and Korean); the Advanced Collocations feature (similar to WordSmith’s clusters feature, but does other things too) is great. Not-so-good points: not as flexible/customisable as WordSmith. |
Simple Corpus Tool (v. 3.0) |
My own free concordancer, mainly designed to handle data annotated in DART format, but also usable with other types of plain text. Apart from concordancing, word/n-gram analysis, collocational and keyphrase analysis, also allows the user to edit/annotate corpus texts in basic XML, do n-gram analyses that allow for ignoring and re-interpolating tags/fillers, as well as do user-defined feature counts based on regular expressions. Concordance analyses are hyperlinked to the original files. In contrast to most other concordancers, these files are then directly editable to include annotations. Version 3 comes with a fairly extensive manual. For older versions, you can refer to my presentation given at CoLTA 2015 for features and explanations. |
A free multi-lingual concordance tool that supports different encodings. Originally developed for Arabic, and the interface can be switched between English and Arabic. Only has very basic concordancing and frequency analysis functionality. Java-based, so platform-independent. Downside: Loading a corpus essentially means loading a single file, as neither multi-select nor folder selection is available. Thus, the only way to load a real corpus appears to be to copy a number of files together into a single file. |
|
KWIC concordance lines, word clusters, collocation analysis, and word counts. Integrated with R. Only runs on Mac OS X. |
|
Multilingual Parallel Concordancer for Windows. It uses truly parallel texts; that is, texts which relate to the same source. Priced at £40 for an educational licence. |
|
WConcord is a fast and easy-to-use concordancer for unlimited amounts of text. It allows the user to load multiple plain text files (.txt) and create concordances based on simple or complex search patterns. Searches can be stored in a simple file format and called again for later searches over other corpora. The search facility has some capabilities for handling regular expressions, which are described in the accompanying help file. WConcord also creates word frequency lists. It provides plain frequency information as well as the cumulative frequency of the tokens in a corpus. A special feature of WConcord is its ability to create collocation statistics. This function calculates the frequencies of co-occurrence of a node word (the search item(s)) with its collocates. The results can be exported in a format that can be imported into a spreadsheet or a database for further processing. |
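To make the frequency-list and collocation ideas described above a little more concrete, here is a rough Python sketch. It is not WConcord's code or algorithm; the function names, the very simple tokeniser and the `span` parameter (number of words to the left and right of the node) are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Very crude tokeniser: lower-case runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens):
    """Rows of (word, frequency, cumulative frequency), most frequent first."""
    rows, running = [], 0
    ordered = sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))
    for word, freq in ordered:
        running += freq
        rows.append((word, freq, running))
    return rows

def collocates(tokens, node, span=2):
    """Count words co-occurring within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])   # left context
            counts.update(tokens[i + 1:i + 1 + span])   # right context
    return counts

tokens = tokenize("The cat sat on the mat and the dog sat on the cat")
print(frequency_list(tokens)[:3])
print(collocates(tokens, "sat", span=1).most_common())
```

The rows returned by `frequency_list` could be written out as tab-separated text for import into a spreadsheet. Raw co-occurrence counts like these are only a starting point; most tools go on to apply an association measure to them.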
|
JConcorder is Java software for building and managing word catalogues – created by parsing text documents – and generating concordances therefrom. It is now available in beta version, either as an application or as an applet version. |
|
A tool designed to create, interrogate and visualise parsed corpora. It’s got both a graphical interface (http://interrogator.github.io/corpkit/) and an API (https://github.com/interrogator/corpkit). The user starts with plain text files in corpora/subcorpora (i.e. folders/subfolders). In corpkit, the user can then leave them as plain, have them tokenised, or fully analysed by CoreNLP, which includes POS, lemma, constituency, dependency, etc. |
|
free concordancer for Windows & MacOS X |
|
TextStat (Matthias Hüning) |
freeware concordancer; reads ASCII/ANSI texts (in different encodings), HTML files (directly from the internet), and MS Word and OpenOffice files (no conversion needed). Produces word frequency lists & concordances (uses regular expressions). Includes a web-spider which reads as many pages as you want from a particular web site and puts them in a TextSTAT-corpus. The news-reader puts news messages in a TextSTAT-readable corpus file. |
Multilingual Concordancer (MLCT) (Scott Piao) |
free: MLCT (Multilingual Corpus Toolkit) is a Java software package with a GUI (Graphical User Interface). It provides various useful functionalities for building and processing corpora, including sentence boundary detection, concordancing, collocation extraction etc. To run the program, the user needs to install the Java Runtime Environment (JRE). |
NoSketch Engine |
NoSketch Engine is an open-source project combining Manatee and Bonito into a powerful and free corpus management system. It is essentially a limited version of the software powering the Sketch Engine service, a commercial variant offering word sketches, thesaurus, keyword computation, user-friendly corpus creation and other features. |
TAPoR is a gateway to the tools used in sophisticated text analysis and retrieval. |
|
A general purpose XML-aware search engine (Windows platform) that will operate on any corpus of well-formed XML documents as well as plain text files (best used with TEI-conformant documents); Unicode-compliant, so works with any language provided the relevant Unicode font is installed on the system. Originally developed at OUCS for use with the British National Corpus. |
|
a text database engine for analyzed or annotated text; supports storage and retrieval of any kind of text plus annotations/analyses of that text. Linguistic analyses are its primary target, and here syntactic analyses are in focus (although other linguistic levels are supported, too). It excels in storing and querying structured data, supporting multiple hierarchies of embedding over the same text. Its powerful query language is built around sequence and embedding as the primary structuring operations. It implements the EMdF database model and the MQL query language. |
|
IMS Corpus Workbench (CWB) |
Excellent corpus query system (my personal favourite) for SunOS 4.1.x, Solaris 2.x/Linux; powerful (full regular expression searches). Fast (indexed) concordancer with both command-line (including batch mode) & X-windows interface. Free for educational use. [Query Syntax & Examples here] |
Concordance (R.J.C. Watt) |
concordancer for Windows; has a facility for publishing concordances to the web; supports non-European character sets (inc. Chinese, Japanese & Korean). Currently [18:10 11-Feb-2016] not available. |
Web-based Concordancers
BNCweb (CQP edition) |
The most powerful and user-friendly free interface to the British National Corpus (XML World Edition): a browser-based tool for exploring the BNC. Incorporates genre categories as set out in David Lee’s BNC Index and access to the audio recordings for more than 5 million words of spoken data. For more information on how to work with audio data, see the Searching Audio Data guide (also available directly from within BNCweb). There is a manual/textbook that accompanies this tool: Hoffmann, Sebastian, Evert, Stefan, Smith, Nicholas, Lee, David & Ylva Berglund Prytz. (2008). Corpus Linguistics with BNCweb: A Practical Guide. Frankfurt am Main: Peter Lang. (Publisher’s site is here.) |
BYU-BNC |
allows word-, phrase- or part-of-speech-based searches of the British National Corpus (BNC) with genre restrictions; allows wildcards and "fuzzy matches". (Formerly called VIEW: Variation In English Words And Phrases) |
web-based suite of tools for data-driven self-learning (mainly for vocabulary). The online tools allow any reader with an Internet connection to transform any text of interest into a self-teaching text linked to speech, dictionary, concordance, and self-test resources. You paste a text/corpus into one of the tools provided and get results via your browser. Tools include a concordancer, a phrase (n-gram) extractor, VocabProfile (tells you how many words in the text come from the following four frequency levels: (1) the list of the most frequent 1000 word families, (2) the second 1000, (3) the Academic Word List, and (4) words that do not appear on the other lists), a vocab-level-based cloze passage generator and a traditional nth-word cloze builder. |
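The four-level vocabulary profile described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the site's own implementation: the three tiny word sets are made-up placeholders for the real lists, the band labels are my own, and the sketch matches surface forms only, whereas the real lists are organised by word families.

```python
import re
from collections import Counter

# Toy placeholder lists; the real lists are far larger.
FIRST_1000 = {"the", "we", "a", "of", "and"}
SECOND_1000 = {"improve", "stretch"}
ACADEMIC = {"analyse", "data", "method"}

def vocab_profile(text, k1=FIRST_1000, k2=SECOND_1000, awl=ACADEMIC):
    """Count tokens per band: K1, K2, AWL, or Off-list (checked in that order)."""
    bands = Counter({"K1": 0, "K2": 0, "AWL": 0, "Off-list": 0})
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in k1:
            bands["K1"] += 1
        elif tok in k2:
            bands["K2"] += 1
        elif tok in awl:
            bands["AWL"] += 1
        else:
            bands["Off-list"] += 1
    return dict(bands)

print(vocab_profile("We analyse the data and improve the concordancer"))
```

Dividing each count by the total number of tokens gives the familiar percentage breakdown.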
|
Just The Word (Sharp) |
Simplest and most pedagogically accessible tool for ESL/EFL learners based on the British National Corpus (BNC). Enter a word and get back a bunch of collocations & colligations, sorted into similarity groups. (Based on an 80-million-word subset of the BNC.) |
Phrases in English (PIE) |
PIE incorporates a database of all 1-6-grams (phrases 1 to 6 "words" long) with part-of-speech (POS) codes occurring three or more times in the 100-million-word British National Corpus (BNC). You can explore English phraseology either through lists of forms and their frequencies or by searching for specific forms or collocations, e.g. 2-grams of the pattern "ADJ work", to find the most frequent adjectives describing work. PIE also offers a phrase pattern discovery tool, "phrase-frames": sets of variants of an n-gram identical except for one word (wildcard symbol *), e.g., "the * of the", with variants such as "the end of the", "the rest of the", "the top of the", "the nature of the". Over the next year PIE will add: (i) clicking on an n-gram in the query results to see concordances from the BNC; (ii) POS-grams and POS-frames for studying the relative productivity of phrase structures; (iii) filtering by text type (domain, genre, target audience) for contrastive studies; (iv) querying by regular expression (currently only wildcards are supported). |
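The phrase-frame idea described above (n-grams that are identical except for one word) can be illustrated with a short Python sketch. This is not PIE's implementation: the function names and the `min_freq` parameter are assumptions for the example, and it works on a plain token list without POS codes.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_frames(tokens, n=4, min_freq=1):
    """Map each frame (an n-gram with one slot replaced by '*') to its fillers."""
    counts = Counter(ngrams(tokens, n))
    frames = defaultdict(Counter)
    for gram, freq in counts.items():
        if freq < min_freq:
            continue  # skip n-grams below the frequency threshold
        for slot in range(n):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame][gram[slot]] += freq
    # A frame is only interesting if more than one word can fill the slot.
    return {f: fillers for f, fillers in frames.items() if len(fillers) > 1}

text = "the end of the road and the rest of the day and the top of the hill"
frames = phrase_frames(text.split(), n=4)
print(sorted(frames[("the", "*", "of", "the")]))
```

Setting `min_freq=3` would mirror the "three or more times" threshold mentioned above, though on a toy sample like this one nothing would survive it.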
A web-based search tool that can be loaded directly with corpora created using SACODEYL Annotator. |
|
Sketch Engine (Lexical Computing) |
Sketch Engine is a corpus manager and text analysis software. It is a paid service with a 30-day free trial. |
SKELL (Sketch Engine for Language Learning) |
Searches more than one billion words of English from news, scientific papers, Wikipedia articles, fiction books, web pages, blogs. Three functions: (1) Examples [concordance]: search for a word or a phrase and get the most presentable sentences for it. (2) Word sketch [collocations & colligations]: a list of words which occur frequently together with the searched word. (3) Similar words (not only synonyms): words used in similar contexts, visualized as a word cloud. Also available for Russian, Czech, German, Italian, and Estonian. |
Turbo Lingo (Danko Sipka) |
free web-browser-based concordancer. You can get concordances and frequency lists of entire Web pages (by entering a URL), or by pasting a text into the input box. Also features "1x1phonotactics" and "1x1 lex. combinatorics". |
* The above represent just a personal selection. There are many more out there. Kennedy (1998: 258-267) lists and describes quite a number of them.
* See also: Using the Web as a corpus
If you found this web site useful, or found an outdated link, don’t forget to let me know.