How to download a corpus

User corpora, i.e. corpora built by the user, can be downloaded.

Preloaded corpora cannot be downloaded from the interface. They can be provided for a fee or under a licence. Please contact us.

User corpora

Sketch Engine can be used to build a text corpus, have it POS-tagged and lemmatized, and then download it in plain text or vertical format. Only user corpora can be downloaded from Sketch Engine.

How to download
  • Select the corpus if you have not done so.
  • Go to the corpus dashboard.
  • Click on MANAGE CORPUS.
  • Click on DOWNLOAD.
File formats for corpus download
  • plain text file – the plain text version without POS tags or lemmas, but including all structures and structural attributes
  • vertical file – the corpus in vertical format with POS tags, lemmas, structures and structural attributes. This format is best for preserving as much information as possible.
  • TMX – this format is only available with parallel corpora
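
As an illustration of working with a downloaded vertical file, the following Python sketch reads sentences from vertical text. It rests on assumptions that may not match your corpus: one token per line with tab-separated columns in the order word, POS tag, lemma; structures written as tag lines such as <doc>, <s> and </s>; and sentences marked by an <s> structure. The names Token, read_sentences and SAMPLE are hypothetical and used only for this example. Check the column order and structure names in your own download before relying on it.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: str


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per <s>...</s> structure.

    Assumes columns are word, tag, lemma (an assumption; the actual
    column order depends on how the corpus was configured).
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Heuristic: a structure line looks like a tag and has no tabs.
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        cols += [""] * (3 - len(cols))  # pad if tag or lemma is missing
        sentence.append(Token(cols[0], cols[1], cols[2]))
    if sentence:
        # Tokens left over when the input has no closing </s>.
        yield sentence


# Tiny made-up sample, not taken from any real corpus.
SAMPLE = (
    '<doc id="example">\n'
    "<s>\n"
    "Dogs\tNNS\tdog\n"
    "bark\tVBP\tbark\n"
    ".\t.\t.\n"
    "</s>\n"
    "</doc>\n"
)

sentences = list(read_sentences(SAMPLE.splitlines()))
for sent in sentences:
    print(" ".join(tok.lemma for tok in sent))
```

To process a real download, pass an open file object instead of SAMPLE.splitlines(), for example open("corpus.vert", encoding="utf-8"), where the file name is a placeholder.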

Preloaded corpora

Preloaded corpora in Sketch Engine cannot be downloaded, but word embeddings computed from these corpora for language modelling and similar applications are available for download from our word embeddings page.

Users can also download word lists, n-gram lists and other language data generated from these corpora.

On-demand corpus building

If you would like a tailor-made corpus built to your specifications, please contact us.