Multicultural London English corpus
The London English corpus is a spoken English corpus made up of transcripts collected in London. The corpus represents Multicultural London English, a sociolect of English comprising new varieties that have emerged in London since the late 20th century. It contains transcripts of informal, conversation-like interviews with one or two speakers and a fieldworker, along with some self-recordings. The transcripts come from two ESRC-funded projects: Linguistic Innovators and Multicultural London English.
The corpus consists of 2.4 million words, divided into separate subcorpora based on the nationality of the speakers.
For more details about the speakers and the research projects from which these transcripts derive, see the bibliography.
The London English corpus was processed using TreeTagger with the Penn Treebank tagset.
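To illustrate what tagged output of this kind can look like in practice, here is a minimal Python sketch. It assumes the common TreeTagger layout of one token per line with tab-separated word, part-of-speech tag, and lemma; the actual export format of this corpus may differ. The sample lines, the `Token` type, and the `parse_tagged_lines` helper are invented for illustration and are not taken from the corpus or from any TreeTagger API.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One tagged token (hypothetical helper type)."""
    word: str
    tag: str
    lemma: str


def parse_tagged_lines(lines: Iterable[str]) -> List[Token]:
    """Parse tab-separated word/tag/lemma lines into Token tuples.

    Assumes three tab-separated fields per token line. Blank lines and
    lines with any other number of fields (for example markup) are skipped.
    """
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens


# Invented sample data, not taken from the corpus.
SAMPLE = [
    "the\tDT\tthe",
    "bus\tNN\tbus",
    "stopped\tVBD\tstop",
    "",
    "in\tIN\tin",
    "Hackney\tNNP\tHackney",
]

tokens = parse_tagged_lines(SAMPLE)
tag_counts = Counter(token.tag for token in tokens)

for tag, count in sorted(tag_counts.items()):
    print(tag, count)
```

Counting tags this way gives a quick profile of a transcript; the same loop could be run over each subcorpus file separately to compare tag distributions.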