Multicultural London English corpus
The Multicultural London English corpus is a spoken English corpus made up of transcripts collected in London. It represents Multicultural London English, a sociolect comprising new varieties of English that emerged in London in the late 20th century. It contains transcripts of informal, conversation-like interviews with one or two speakers and a fieldworker, along with some self-recordings. The transcripts come from two ESRC-funded projects: Linguistic Innovators and Multicultural London English.
The corpus consists of 2.4 million words, divided into separate subcorpora based on the nationality of the speakers.
For more details about the speakers and the research projects from which these transcripts derive, see the bibliography.
The Multicultural London English corpus was processed using TreeTagger with the Penn Treebank tagset.
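As a rough illustration of what part-of-speech tagged text looks like once it has been processed, the sketch below reads a few tagged tokens and counts the tags. It assumes a three-column, tab-separated layout (token, tag, lemma), which is a common way to store tagger output; the corpus's actual storage format is not described here, and the sample tokens are invented for the example rather than taken from the corpus.

```python
from collections import Counter

# Invented sample, NOT taken from the corpus: one token per line,
# with assumed columns token <TAB> tag <TAB> lemma.
SAMPLE = (
    "the\tDT\tthe\n"
    "buses\tNNS\tbus\n"
    "in\tIN\tin\n"
    "the\tDT\tthe\n"
    "city\tNN\tcity\n"
)

def parse_tagged(text):
    """Return a list of (token, tag, lemma) tuples, skipping malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # ignore blank or incomplete lines
        rows.append(tuple(parts))
    return rows

def tag_frequencies(rows):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for _, tag, _ in rows)

rows = parse_tagged(SAMPLE)
freqs = tag_frequencies(rows)
for tag, count in freqs.most_common():
    print(tag, count)
```

Tags such as DT (determiner), NN (singular noun), NNS (plural noun) and IN (preposition) belong to the Penn Treebank tagset; they are what make it possible to build word lists restricted to nouns, verbs or adjectives.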
Tools to work with the Multicultural London English corpus
A complete set of tools is available to work with this spoken English corpus to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- keywords – terminology extraction of one-word units
- text type analysis – statistics of metadata in the corpus
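To make a few of the outputs listed above more concrete, here is a minimal Python sketch of a frequency word list, an n-gram frequency list and a keyword-in-context concordance. It is not how Sketch Engine computes these; the function names, the naive whitespace tokenizer and the sample sentence are all assumptions made for illustration, and the sample is invented rather than drawn from the corpus.

```python
from collections import Counter

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()

def word_list(tokens):
    """Words ordered by descending frequency."""
    return Counter(tokens).most_common()

def ngrams(tokens, n):
    """Frequency count of every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def concordance(tokens, keyword, window=3):
    """Keyword-in-context lines: (left context, keyword, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented sample sentence, not taken from the corpus.
sample = "i was like yeah and he was like no and i was gone"
tokens = tokenize(sample)

print(word_list(tokens)[:3])
print(ngrams(tokens, 2).most_common(3))
for left, kw, right in concordance(tokens, "like", window=2):
    print(f"{left:>10} [{kw}] {right}")
```

On a real corpus the same ideas apply at scale, with proper tokenization and with part-of-speech tags used to filter the lists by word class.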
Bibliography
Cheshire, J., Kerswill, P., Fox, S. and Torgersen, E. (2011). Contact, the feature pool and the speech community: The emergence of Multicultural London English. Journal of Sociolinguistics 15, pp. 151–196.
Search the Multicultural London English corpus
Sketch Engine offers a range of tools to work with the Multicultural London English corpus.