London English corpus

The London English corpus contains transcripts of audio recordings made for two ESRC-funded projects: Linguistic Innovators (2004–2007) and Multicultural London English: The Emergence, Acquisition and Diffusion of a New Variety (2007–2010). It consists of 2.4 million words, divided into two subcorpora: Linguistic Innovators and Multicultural London English.

The Linguistic Innovators subcorpus contains transcripts of two sets of recordings, made in inner London (Hackney) and outer London (Havering). In each location, there are 8 speakers aged 70 and above, and about 50 speakers aged 16–19. The Multicultural London English corpus contains transcripts of recordings of 127 speakers aged 4, 8, 12, 16–19, 20–30 and 40–50, made in inner London (Hackney, Haringey and Islington). All recordings were informal, conversation-like interviews with 1 or 2 speakers and a fieldworker, plus some self-recordings from 16–19-year-olds. Transcripts from speakers aged 70 and above in inner London represent traditional London English; transcripts from other speakers in inner London represent the new London dialect known as Multicultural London English.

For further details about the speakers, the two projects and Multicultural London English see the bibliography.

Part-of-speech tagset

The Multicultural London English corpus was processed using TreeTagger with the Penn TreeBank tagset.
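As a rough illustration of what part-of-speech tagged text can look like, the sketch below parses a three-column, tab-separated layout (token, tag, lemma), which is one output format TreeTagger can produce. The sample lines and the function name `parse_tagged` are hypothetical and are not taken from the corpus; the corpus files themselves may be stored differently.

```python
from collections import Counter

# Hypothetical sample in a token<TAB>tag<TAB>lemma layout; not corpus data.
SAMPLE = "the\tDT\tthe\ndogs\tNNS\tdog\nare\tVBP\tbe\nbig\tJJ\tbig\n"


def parse_tagged(text):
    """Return a list of (token, tag, lemma) tuples from tab-separated lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            # Skip blank or malformed lines instead of guessing their content.
            continue
        rows.append(tuple(parts))
    return rows


rows = parse_tagged(SAMPLE)

# Count how often each tag occurs in the sample.
tag_counts = Counter(tag for _, tag, _ in rows)

for token, tag, lemma in rows:
    print(f"{token:<6}{tag:<5}{lemma}")
```

Here `DT`, `NNS`, `VBP` and `JJ` are Penn Treebank tags for determiner, plural noun, non-third-person present-tense verb and adjective respectively.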

To cite the projects:

Kerswill, P., Cheshire, J., Fox, S., & Torgersen, E. (2004–2007). Linguistic innovators: The English of adolescents in London. ESRC Research Project, RES-000-23-0680.

Kerswill, P., Cheshire, J., Fox, S., & Torgersen, E. (2007–2010). Multicultural London English: The emergence, acquisition and diffusion of a new variety. ESRC Research Project, RES-062-23-0814.

Tools to work with the London English corpus

A complete set of tools is available to work with this spoken English corpus to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • keywords – terminology extraction of one-word units
  • text type analysis – statistics of metadata in the corpus
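To make three of the outputs listed above more concrete (word lists, n-grams and concordance), here is a minimal sketch on an invented sentence. It only illustrates the general ideas and is not how the corpus tools are implemented; the sample text and all function names are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(tokens):
    """Frequency list: (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, node, width=2):
    """Keyword-in-context lines: (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample sentence; not taken from the corpus.
sample = "we was going down the road and we was talking to my mate"
tokens = tokenize(sample)

print(word_list(tokens)[:2])
print(ngrams(tokens, 2)[("we", "was")])
for left, node, right in concordance(tokens, "was"):
    print(f"{left:>8} [{node}] {right}")
```

A real corpus query would also draw on part-of-speech tags and speaker metadata, which this sketch leaves out.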

Bibliography

Cheshire, J., Kerswill, P., Fox, S., & Torgersen, E. (2011). Contact, the feature pool and the speech community: The emergence of Multicultural London English. Journal of Sociolinguistics, 15, 151–196.

Cheshire, J., Fox, S., Kerswill, P., & Torgersen, E. (forthcoming, 2024). Multicultural London English. In K. Bolton (Ed.), Wiley Blackwell Encyclopedia of World Englishes. London: Wiley Blackwell.

Fox, S., & Torgersen, E. (2018). Language change and innovation in London: Multicultural London English. In N. Braber & S. Jensen (Eds.), Sociolinguistics in England (pp. 189–213). Basingstoke: Palgrave Macmillan.

Search the London English corpus

Sketch Engine offers a range of tools to work with this London English corpus.
