How to download a corpus

User corpora

Sketch Engine can be used to build a text corpus, have it POS-tagged and lemmatized, and download the corpus in plain text or vertical file format. Only user corpora, i.e. corpora created by users, can be downloaded from Sketch Engine.

How to download
  • To download a corpus, open the corpus by clicking its name.
  • Click on Manage corpus in the left menu.
  • Click on Download corpus in the left menu.

File formats for corpus download
  • Plain text file: the plain text version of the corpus, without POS tags or lemmas but with all structures and structural attributes.
  • Vertical file: the corpus in vertical format, with POS tags, lemmas, structures and structural attributes. Choose this format to preserve as much information as possible.
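A vertical file has one token per line, with its positional attributes (such as the word form, POS tag and lemma) separated by tabs, and structures such as documents and sentences written as XML-like tags on lines of their own. The Python sketch below reads such a file. It assumes the attribute order word, tag, lemma (the hypothetical `DEFAULT_ATTRS`; the real order depends on the corpus configuration) and treats `<g/>` as a glue tag meaning "no space between the neighbouring tokens". The names `parse_vertical` and `detokenize` are illustrative only and are not part of any Sketch Engine tool.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

# ASSUMPTION: order of the tab-separated columns; check your corpus settings.
DEFAULT_ATTRS = ("word", "tag", "lemma")

# Matches structure lines such as <doc id="d1">, </s> or <g/>.
STRUCT_RE = re.compile(r'^<(/?)([A-Za-z_][\w.-]*)((?:\s+[^>]*?)?)\s*(/?)>$')
# Extracts name="value" pairs from a structure's attribute text.
ATTR_RE = re.compile(r'([\w.-]+)="([^"]*)"')

Event = Tuple[str, Optional[str], Dict[str, str]]


def parse_vertical(lines: Iterable[str],
                   attrs: Sequence[str] = DEFAULT_ATTRS) -> Iterator[Event]:
    """Yield ("open" | "close" | "empty", name, attributes) for structures
    and ("token", None, attributes) for token lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        m = STRUCT_RE.match(line)
        if m:
            slash, name, attr_text, self_close = m.groups()
            kind = "close" if slash else ("empty" if self_close else "open")
            yield (kind, name, dict(ATTR_RE.findall(attr_text)))
        else:
            # Token line: pair each column with its attribute name
            # (columns beyond len(attrs) are ignored).
            yield ("token", None, dict(zip(attrs, line.split("\t"))))


def detokenize(events: Iterable[Event]) -> str:
    """Join word forms with spaces, omitting the space after a <g/> tag."""
    out: List[str] = []
    glue = False
    for kind, name, attributes in events:
        if kind == "empty" and name == "g":
            glue = True
        elif kind == "token":
            if out and not glue:
                out.append(" ")
            out.append(attributes["word"])
            glue = False
    return "".join(out)


# Tiny in-memory sample standing in for a downloaded vertical file.
sample = """<doc id="d1">
<s>
The\tDT\tthe
dogs\tNNS\tdog
<g/>
.\tSENT\t.
</s>
</doc>
""".splitlines()

events = list(parse_vertical(sample))
print([a["lemma"] for kind, _, a in events if kind == "token"])  # ['the', 'dog', '.']
print(detokenize(events))  # The dogs.
```

To process a downloaded file, pass an open file object instead of the sample list, for example `parse_vertical(open(path, encoding="utf-8"))`; the UTF-8 encoding is an assumption. Attribute values containing `>` are not handled by this simple regular expression.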

Preloaded corpora

Preloaded corpora in Sketch Engine cannot be downloaded. However, word embeddings computed from these corpora, intended for language modelling and similar applications, are available for download from our word embeddings page.
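Downloaded embeddings can be explored without extra libraries. The Python sketch below assumes the widely used word2vec/fastText text format: a header line giving the vocabulary size and vector dimension, followed by one word per line and its vector components separated by spaces. Check the actual file format on the word embeddings page before relying on it. `load_vec` and `most_similar` are hypothetical helper names.

```python
import io
import math
from typing import Dict, List, Sequence, TextIO, Tuple


def load_vec(f: TextIO) -> Dict[str, List[float]]:
    """Read embeddings in the (assumed) word2vec/fastText text format."""
    _n_words, dim = (int(x) for x in f.readline().split()[:2])
    vectors: Dict[str, List[float]] = {}
    for line in f:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not have exactly `dim` components
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors: Dict[str, List[float]], word: str,
                 k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words most similar to `word` by cosine similarity."""
    target = vectors[word]
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Tiny in-memory example standing in for a downloaded embeddings file.
sample = io.StringIO("3 2\ncat 1.0 0.0\ndog 0.9 0.1\ncar 0.0 1.0\n")
vecs = load_vec(sample)
print(most_similar(vecs, "cat", k=1))  # nearest neighbour of "cat" is "dog"
```

The linear scan in `most_similar` compares the query with every word, which is fine for a quick check but slow for large vocabularies; a matrix library would be the usual choice there.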

Users can also download word lists, n-gram lists and other language data generated from these corpora.

On-demand corpus building

If you would like a tailor-made corpus built to your specifications, please contact us.