Corpora for language learners

A text corpus can be extremely valuable to language students and teachers. Traditionally, their main source of information about language, apart from the coursebook, has been a dictionary. Unfortunately, print dictionaries can only contain a limited amount of information. Online dictionaries are no longer limited by space but are still limited by the publisher’s budget. As a result, a dictionary may not contain the word or phrase the user needs to check and, more importantly, it cannot show all the contexts in which a word is used. This is exactly where corpora become invaluable.

A large corpus contains so many occurrences of a word that a wide variety of contexts is included. The user can check not only the usual topics and types of text but also the prepositions and collocations that typically accompany the word. Corpora help us find out how words and phrases are used by real users of the language.
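To illustrate what "collocations typically related to the word" means in practice, here is a minimal sketch that counts the words appearing near a chosen word in a handful of sentences. The function names, the window size and the sample sentences are all assumptions made for this example; this is not how Sketch Engine computes collocations.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(sentences, node, window=2):
    """Count words occurring within `window` tokens of `node` (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            # Words to the left and to the right of the node word.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample sentences standing in for a corpus.
sample = [
    "We had a productive meeting on Monday.",
    "The annual meeting was held in Brno.",
    "She missed the annual meeting again.",
]

top = collocates(sample, "meeting")
print(top.most_common(3))
```

Raw counts like these favour very frequent words such as "the". With only three sentences the result says little, which is why a large corpus and a statistical association score are needed to surface typical collocations.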

All information is generated automatically by analyzing billions of words of authentic text produced by real users of the language.

Examples

  • Use the word sketch to find out which adjectives are most typically used with the word meeting and to display example sentences.
  • Use the thesaurus to find alternatives to the word wonderful.
  • Use the concordance to check whether your own phrase is actually found in authentic texts.
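
The idea behind the concordance check can be sketched in a few lines: search a collection of sentences for a phrase and show each hit with its surrounding context. The function name, the context width and the sample sentences below are assumptions for illustration only; a real concordance runs over billions of words, not three sentences.

```python
import re

def concordance(sentences, phrase, width=30):
    """Return keyword-in-context lines for every occurrence of `phrase` (hypothetical helper)."""
    # Match the phrase as whole words, ignoring case.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            # Right-align the left context so the hits line up in a column.
            lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample sentences standing in for a corpus.
corpus = [
    "We need to arrange a meeting with the supplier.",
    "Can you arrange a meeting for next week?",
    "They made a decision quickly.",
]

hits = concordance(corpus, "arrange a meeting")
for line in hits:
    print(line)
print(len(hits), "occurrence(s) found")
```

A phrase that returns no hits in a sufficiently large corpus, for example "do a decision" here, is probably not one that users of the language actually produce.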

Corpora

These are the recommended corpora for language learning, but feel free to use different ones or even build your own.

TenTen corpora – large corpora, the most versatile for general use, available in many languages.

SKELL corpora – specialized learner-friendly corpora created from the TenTen corpora by keeping only sentences suitable for language teaching (see also GDEX).

SKELL

Sketch Engine for Language Learning (SKELL) is a simple interface for searching SKELL corpora. It uses Sketch Engine technology but only includes functionality relevant to language learning. More information: SKELL – examples and collocations for learners of English.

The interface can be used free of charge at skell.sketchengine.eu.

THOMAS, James Edward. Discovering English with Sketch Engine (DESkE), 2015.

Adam Kilgarriff, F. Charalabopoulou, M. Gavrilidou, J.B. Johannessen, S. Khalil, S.J. Kokkinakis, R. Lew, S. Sharoff, R. Vadlapudi and E. Volodina. Corpus-based vocabulary lists for language learners for nine languages. In Language Resources and Evaluation, volume 48, issue 1, March 2014, pp. 121–163.