What is corpus compilation?

Each user corpus has to be compiled before it can be used. Compilation applies several tools that process the corpus data so that the complete Sketch Engine functionality is available. This includes the computation of Word Sketches, the thesaurus, n-grams and trends.

Some other tools may be applied to specific languages or special-use corpora.

Recompile a corpus

A corpus has to be recompiled each time new data are added or new functionality is to be made available, for example, when a new word sketch grammar or term grammar should be applied to the data.

When texts are added to the corpus or deleted from it, the change will not be reflected in search results until the corpus is recompiled.
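The recompilation rule can be summarised as a small decision function. The sketch below is purely illustrative: `CorpusState` and `needs_compilation` are hypothetical names invented for this example, not part of Sketch Engine or its API. It only models the conditions described in this section.

```python
from dataclasses import dataclass


@dataclass
class CorpusState:
    """Hypothetical snapshot of a user corpus (not a Sketch Engine structure)."""
    compiled: bool          # has the corpus been compiled at least once?
    texts_changed: bool     # texts added or deleted since the last compilation
    grammar_changed: bool   # new word sketch grammar or term grammar selected


def needs_compilation(state: CorpusState) -> bool:
    """Return True when the corpus should be (re-)compiled."""
    if not state.compiled:
        # A corpus cannot be used before its first compilation.
        return True
    # Changes to data or grammars only take effect after recompiling.
    return state.texts_changed or state.grammar_changed


# Example: texts were deleted after the last compilation.
state = CorpusState(compiled=True, texts_changed=True, grammar_changed=False)
print(needs_compilation(state))
```

In this model, a corpus that is already compiled and unchanged does not need recompiling; any of the listed changes does.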

How to (re-)compile a corpus

A user corpus created from the web (WebBootCaT) is compiled automatically. Sometimes, however, it might be necessary to start compilation manually.

A corpus created by uploading files has to be compiled manually.

Follow these steps:

  • select the corpus
  • go to the dashboard
  • click MANAGE CORPUS
  • click Compile
  • (optional, for advanced users) use Expert settings to set the compilation parameters; use the tooltips in the interface to learn about the settings
  • click COMPILE

  • you might also be invited to upgrade your corpus at this point to bring in new functionality; upgrading will trigger compilation automatically
  • click Compile to start compiling the corpus