Multicultural London English corpus

The Multicultural London English corpus is a spoken English corpus made up of transcripts collected in London. It represents Multicultural London English, a sociolect made up of new varieties of English emerging in London from the late 20th century. It contains transcripts of informal, conversation-like interviews with one or two speakers and a fieldworker, along with some self-recordings. The transcripts come from two ESRC-funded projects: Linguistic Innovators and Multicultural London English.

The corpus consists of 2.4 million words, divided into separate subcorpora based on the nationality of the speakers.

For more details about the speakers and the research projects from which these transcripts derive, see the bibliography.

Part-of-speech tagset

The Multicultural London English corpus was processed using TreeTagger with the Penn TreeBank tagset.
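
As a rough illustration of what part-of-speech tagging makes possible, the sketch below parses a few lines in the tab-separated token, tag, lemma layout that TreeTagger can emit, and counts lemmas by word class. The sample lines are invented for illustration and are not taken from the corpus, the function names are hypothetical, and the exact tags in the corpus may differ from the standard Penn Treebank tags used here.

```python
from collections import Counter

# Invented sample in a tab-separated layout: token, tag, lemma.
# Illustrative only; these lines are NOT taken from the corpus.
SAMPLE = "\n".join([
    "the\tDT\tthe",
    "new\tJJ\tnew",
    "houses\tNNS\thouse",
    "near\tIN\tnear",
    "the\tDT\tthe",
    "old\tJJ\told",
    "house\tNN\thouse",
])

def parse_tagged(text):
    """Yield (token, tag, lemma) triples from tab-separated lines."""
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:  # skip blank or malformed lines
            yield tuple(parts)

def lemma_counts(triples, tag_prefix):
    """Count lemmas whose tag starts with tag_prefix (e.g. 'NN' for nouns)."""
    return Counter(
        lemma for _, tag, lemma in triples if tag.startswith(tag_prefix)
    )

nouns = lemma_counts(parse_tagged(SAMPLE), "NN")
adjectives = lemma_counts(parse_tagged(SAMPLE), "JJ")
print(nouns.most_common())       # [('house', 2)]
print(sorted(adjectives))        # ['new', 'old']
```

Matching on a tag prefix groups related tags together (NN and NNS both count as nouns), which is how a frequency list of nouns, verbs or adjectives could be derived from tagged text.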

Tools to work with the Multicultural London English corpus

A complete set of tools is available to work with this spoken English corpus to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
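
To make two of the outputs listed above more concrete, here is a minimal sketch of an n-gram frequency list and a keyword-in-context concordance. It is not how Sketch Engine implements these tools; the function names are hypothetical, the tokenization is deliberately naive, and the sample sentence is invented rather than drawn from the corpus.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; real corpus tools tokenize more carefully."""
    return text.lower().split()

def ngram_frequencies(tokens, n):
    """Return a Counter of n-grams (as tuples) over a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with up to `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sentence, for illustration only (not corpus data).
tokens = tokenize("we went to the market and then we went to the park")

bigrams = ngram_frequencies(tokens, 2)
print(bigrams.most_common(3))

for line in concordance(tokens, "went"):
    print(line)
```

In this sketch the bigram list surfaces repeated multi-word units such as "went to", while the concordance shows each occurrence of the search word with its immediate left and right context.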

Bibliography

Cheshire, J., Kerswill, P., Fox, S. and Torgersen, E. (2011). Contact, the feature pool and the speech community: The emergence of Multicultural London English. Journal of Sociolinguistics 15, pp. 151–196.

Search the Multicultural London English corpus

Sketch Engine offers a range of tools to work with this Multicultural London English corpus.
