
Manage a corpus

To reach the corpus management screen, follow these steps:

  • select the corpus if it is not selected already
  • go to the corpus dashboard

Preloaded corpora

Only the Subcorpora option is active. No other actions are allowed.

User corpora

All of the following actions are allowed.

Files and folders

This option displays the folders and documents included in the corpus.

Folder operations

Folders can be deleted. Deleting a folder also deletes all files inside it. Compile the corpus after the deletion for the changes to take effect.

File operations

Use the local menu of each document to

  • view the content of the document
  • change the document encoding
  • add, edit or delete metadata
  • download the document
  • delete the document

When multiple documents are selected, these operations can be applied to all of them at once:

  • add, edit or delete metadata
  • delete the selected documents

Adding new data

To add more data to the corpus, go to the MANAGE CORPUS screen and select the Make bigger option.

This starts the built-in corpus building tool, which lets you add data to your corpus using one or more of these options:

  • uploading files
  • downloading content from the web
  • copying and pasting text

Adding data with any of these methods follows the same procedure as building a new corpus with that method.

see Create a corpus from the web
see Create a corpus by uploading files

Sharing a corpus

By default, a user corpus is accessible only to the user who built it.

The user can, however, grant access to the corpus to individually selected users or to all members of a multiuser account (site licence). The user must be a member of the site licence or its administrator to use this option.

A corpus can be shared as:

  • read only – searching and analysis are allowed
  • upload – same as read only but the user can also upload new data to the corpus
  • full access – same as upload but the user can also delete data from the corpus or change the corpus configuration

Downloading a corpus

Only user corpora can be downloaded. The available formats are:

  • plain text
  • vertical file with tags and lemmas
  • TMX (parallel corpora only)

Preloaded corpora cannot be downloaded from the interface. To obtain a preloaded corpus, contact us for a quotation of the licensing fee.
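A vertical file stores one token per line, with the token's attributes (for example word, tag and lemma) in tab-separated columns, and structure tags such as <doc> and <s> on lines of their own. The Python sketch below reads such a file. It assumes exactly three columns in the order word, tag, lemma; this is an assumption for illustration, since the actual columns depend on the corpus and its tagset. The Token type and the read_vertical function are hypothetical helpers, not part of Sketch Engine.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: str


def read_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield tokens from a vertical file, skipping structure tags.

    Assumption: every token line has three tab-separated columns
    in the order word, tag, lemma. Real downloads may differ.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Structure tags such as <doc id="1">, </s> or <g/> contain no tab.
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        yield Token(fields[0], fields[1], fields[2])


# A tiny hand-made sample in vertical format (not taken from a real corpus).
sample = """<doc id="1">
<s>
Cats\tNNS\tcat
sleep\tVVP\tsleep
.\tSENT\t.
</s>
</doc>
"""

tokens = list(read_vertical(io.StringIO(sample)))
lemmas = [t.lemma for t in tokens]
print(lemmas)
```

Because structure tags are simply skipped, this sketch discards document and sentence boundaries; a fuller reader would track them if they matter for the analysis.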

Compiling the corpus

When a new corpus is built, or when data are added to or removed from an existing corpus, the corpus must be compiled before the changes take effect.

see Compiling a corpus

Deleting a corpus

Only user corpora can be deleted. A deleted corpus cannot be recovered.

Subcorpora

Both preloaded corpora and user corpora can be divided into smaller parts called subcorpora. Use this option to view the existing subcorpora, delete them or create new ones.

see Create a subcorpus

Configuration

This option opens the corpus configuration file, which holds advanced settings that control how Sketch Engine handles the corpus.

Logs

Logs (log files) record information about how the corpus was processed, for example during compilation or while content was being downloaded from the web.

They are mainly useful to experts troubleshooting technical problems with the corpus.

Renaming a corpus

On the Manage corpus screen, click the pencil icon next to the corpus name.

More options

Learn more about enhancing your corpus by visiting the Fine-tune your corpus page.