COMPAS: Corpus of news articles related to immigration
The COMPAS Corpus is an English corpus made up of daily newspaper articles about immigration. In total, 132,242 articles about immigrants, migrants, asylum seekers, and refugees were collected from the UK’s national newspapers, covering 2006 to 2013.
The corpus was extended in 2016 with texts from the periods 1985–2005 and 2014–2015. The extended version consists of 260 million words from 354,661 articles.
Access to the corpus is restricted. Please contact Dr William L Allen (Centre on Migration, Policy, and Society at the University of Oxford) at firstname.lastname@example.org, who can grant you access to this corpus. Then forward his answer to our support email [support(a)sketchengine.eu] so that we can set up the access for you.
COMPAS corpus in detail
The UK national press can be divided into three main categories: tabloids, midmarkets, and broadsheets. The corpus includes the following newspapers: Daily Mail, Daily Mirror, Daily Star, Daily Star Sunday, Financial Times, Mail on Sunday, Sunday Express, Sunday Mirror, The Daily Telegraph, The Express, The Guardian, The Independent, The Independent on Sunday, The Observer, The People, The Sun, The Sunday Telegraph, The Sunday Times, The Times.
The documents in the corpus contain the following meta fields:
- date – Publication date in the form yyyy-mm-dd
- publication – Name of the publication from which the text was taken
- title – Title of the article
- month – Month in which the content was posted
- language – English (this is the case for all articles)
- year – Year in which the content was posted
- quarter – Quarter of the year in which the content was posted, represented by q1, q2, q3, or q4
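The month, year, and quarter fields can all be derived from the date field. The following is a minimal Python sketch of that relationship, not the procedure actually used to build the corpus. It assumes that quarters are calendar quarters (January to March is q1) and that month is stored as a plain number; the corpus may represent the month differently. The function name `derive_fields` is hypothetical.

```python
from datetime import date


def derive_fields(date_str: str) -> dict:
    """Derive year, month, and quarter from a yyyy-mm-dd date string.

    Assumptions (not confirmed by the corpus documentation):
    quarters are calendar quarters, and month is a plain number.
    """
    d = date.fromisoformat(date_str)   # parses the yyyy-mm-dd form
    quarter = (d.month - 1) // 3 + 1   # months 1-3 -> 1, 4-6 -> 2, ...
    return {
        "date": d.isoformat(),
        "year": str(d.year),
        "month": str(d.month),
        "quarter": f"q{quarter}",
    }


if __name__ == "__main__":
    print(derive_fields("2010-05-17"))
```

Under these assumptions, an article dated 2010-05-17 would fall in year 2010, month 5, and quarter q2.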
The COMPAS corpus was lemmatized and PoS-tagged by TreeTagger using the English Penn Treebank tagset.