English Corpus for SkELL is a text corpus built specifically for the English SkELL interface, available at skell.sketchengine.eu. The corpus does not contain whole documents, only individual sentences sorted by their text quality. The quality score of each sentence was computed by the GDEX system.
The corpus is made up of Wikipedia articles, selected parts of the English Web 2013 corpus and the Timestamped web corpus, and English websites crawled by the WebBootCat tool. These sources provide a good example of how English is used in everyday, standard, formal and professional contexts. Together they amount to over 1 billion words in more than 57 million sentences.
no. of words
English Web 2013
Timestamped web corpus
British National Corpus
The corpus is accessible to all users with a subscription plan and to site licence members, but not to trial users.
Tools to work with the English SkELL corpus
A complete set of tools is available for working with this English corpus for SkELL. The tools generate:
word sketch – English collocations categorized by grammatical relations
thesaurus – synonyms and similar words for every word
keywords – terminology extraction of one-word and multi-word units
word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
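To illustrate what a frequency-organized word list looks like, here is a minimal sketch in Python. It is not how Sketch Engine builds its word lists. The sample data, the part-of-speech labels and the `word_list` helper are all hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical, hand-tagged sample of (word, part_of_speech) pairs.
# A real corpus would supply these from its own tagging pipeline.
tagged_tokens = [
    ("corpus", "noun"), ("contain", "verb"), ("sentence", "noun"),
    ("good", "adjective"), ("sentence", "noun"), ("sort", "verb"),
    ("corpus", "noun"), ("sentence", "noun"), ("good", "adjective"),
]


def word_list(tokens, pos):
    """Return (word, count) pairs for one part of speech, most frequent first."""
    counts = Counter(word for word, tag in tokens if tag == pos)
    return counts.most_common()


noun_list = word_list(tagged_tokens, "noun")
print(noun_list)  # [('sentence', 3), ('corpus', 2)]
```

Calling `word_list(tagged_tokens, "verb")` or `word_list(tagged_tokens, "adjective")` gives the corresponding list for another part of speech.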