What is corpus compilation?

Each user corpus has to be compiled before it can be used. A corpus can be compiled again with new settings at any time and as many times as required.

Compilation runs several tools over the corpus data so that the complete Sketch Engine functionality is available. This includes computing Word Sketches, the thesaurus, n-grams and trends. Compilation also processes the metadata in the corpus, including any new metadata the user has added with the document annotation tool.

Some other tools may be applied to specific languages or special-use corpora.

Recompile a corpus

A corpus has to be recompiled each time new data are added or new functionality is to be made available, for example, when a new word sketch grammar or term grammar should be applied to the data.

When new texts are added to the corpus or deleted from the corpus, the change will not be reflected in search results unless the corpus is recompiled.
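The rule above can be pictured with a small toy model. This is only an illustrative sketch, not Sketch Engine code: the class ToyCorpus and all of its fields and methods are hypothetical names invented for this example. It assumes that searches only ever see a snapshot taken at the last compilation, which is why added or deleted texts and a changed grammar stay invisible until the corpus is recompiled.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass
class ToyCorpus:
    """Hypothetical stand-in for a user corpus; not a Sketch Engine class."""

    texts: Set[str] = field(default_factory=set)
    grammar: str = "default"
    # Snapshot taken at the last compilation; searches only see this.
    compiled_texts: FrozenSet[str] = frozenset()
    compiled_grammar: Optional[str] = None

    def compile(self) -> None:
        """Freeze the current texts and grammar into the searchable snapshot."""
        self.compiled_texts = frozenset(self.texts)
        self.compiled_grammar = self.grammar

    def needs_recompile(self) -> bool:
        """True if texts or grammar changed since the last compilation."""
        return (
            self.compiled_grammar != self.grammar
            or self.compiled_texts != self.texts
        )

    def searchable_texts(self) -> FrozenSet[str]:
        """Texts that a search would currently cover."""
        return self.compiled_texts


corpus = ToyCorpus(texts={"a.txt"})
corpus.compile()

corpus.texts.add("b.txt")                   # new text added after compilation
stale = corpus.needs_recompile()            # the snapshot is now out of date
before = sorted(corpus.searchable_texts())  # still only the old text

corpus.compile()                            # recompile to pick up the change
after = sorted(corpus.searchable_texts())
```

In this sketch, the added text only becomes searchable after the second call to compile(), mirroring the behaviour described above. Changing the grammar field would flag the corpus as needing recompilation in the same way.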

How to (re-)compile a corpus

A user corpus created from the web (WebBootCaT) is compiled automatically. Sometimes, however, it might be necessary to start compilation manually.

A corpus created by uploading files has to be compiled manually.

Follow these steps:

  • select the corpus
  • go to the dashboard
  • click Compile
  • (optional, for advanced users) use Expert settings to set the compilation parameters; use the tooltips in the interface to learn about the settings
  • click COMPILE

  • you might also be invited to upgrade your corpus at this point to bring in new functionality; upgrading will trigger compilation automatically
  • click Compile to start compiling the corpus