Manage a corpus

You can only manage your own corpora. To access the corpus management options:

  • click Home
  • click My corpora
  • click the wrench icon next to the corpus name

The screen will then look similar to this:

[Screenshot: Manage corpus]

(1) options at the top

The line at the top gives you 4 options:

  • expand your corpus by uploading files
  • expand your corpus by downloading texts from the internet
  • compile the corpus
  • search the corpus

Expanding your corpus

You can add more texts to your corpus using any of the available methods. You can combine the methods, i.e. part of the corpus can come from uploaded files, part from the internet and part from the translation memory.

Add new file
lets you upload more files

Add data from web (WebBootCaT)
lets you use WebBootCaT to find and download more relevant texts from the internet

Compile corpus

See (3) Compile corpus below.

Search corpus

This is the equivalent of clicking Search in the left menu. It gives you access to a standard search to create a concordance.

(2) Show corpus files

This takes you back to the screen shown in the screenshot, which lists the files in the corpus. Each line is one occasion of adding texts to the corpus. For example, each upload (even of multiple files) is one line, and each use of WebBootCaT is one line.

(3) Compile corpus

A corpus needs to be compiled (=processed) each time new texts are added or when the user wants to use a new sketch grammar.

The settings let you select the XML tags that should be used as structures in your corpus. You also need to specify the structure used for references, which will enclose the data from each file that you uploaded. Its name must be different from any other structure name already used in your files. By default this is doc.

  • You can also tick the check box to use the program “onion”, which automatically removes duplicate content from your corpus. If you opt to use onion, you can specify which structure the program will consider when removing duplicates (for example, the document, paragraph or sentence level).
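To illustrate what choosing a structure for duplicate removal means, here is a minimal Python sketch. It is not onion and makes no claim about how onion detects duplicates: it only drops exact repeats of a chosen structure, keeping the first copy. The function name remove_duplicates is hypothetical, and the input is assumed to be corpus text with one token per line and structure tags such as <p> and </p> on lines of their own.

```python
import re

def remove_duplicates(vertical_lines, structure="p"):
    """Drop exact repeats of <structure>...</structure> blocks, keeping the first.

    Illustrative only: a stand-in for the idea of structure-level
    de-duplication, not the algorithm used by onion.
    """
    open_tag = re.compile(rf"^<{re.escape(structure)}[\s>]")
    close_tag = f"</{structure}>"
    seen = set()
    output = []
    block = None  # lines of the structure currently being read, or None

    for line in vertical_lines:
        if block is None:
            if open_tag.match(line):
                block = [line]
            else:
                output.append(line)
        else:
            block.append(line)
            if line.strip() == close_tag:
                # Compare token lines only; lines starting with "<" are
                # treated as tags (a simplification) and ignored.
                key = tuple(l for l in block if not l.startswith("<"))
                if key not in seen:
                    seen.add(key)
                    output.extend(block)
                block = None

    if block is not None:
        # Unclosed structure at end of input: keep it unchanged.
        output.extend(block)
    return output
```

Passing structure="doc" instead of structure="p" compares whole documents, so a repeated paragraph inside two otherwise different documents is kept. Choosing a smaller structure removes more material.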

(4) Configure corpus

(5) Set sketch grammar

You can select the sketch grammar from a list of preloaded grammars or write your own sketch grammar (see Writing a sketch grammar).

(6) Set subcorpora

Set subcorpus definitions

You can define subcorpora of your corpora (see an example of a subcorpus definition file).

(7) Download corpus

Download the corpus as text or in vertical format. Vertical format is useful if you want to retain any of the structures for uploading back into Sketch Engine.
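As a rough illustration of vertical format, the sketch below writes plain texts with one token per line and structures as tags on their own lines. It rests on several assumptions: the function name to_vertical is hypothetical, the id attribute on doc is chosen for the example, the tokenization is a naive split into words and punctuation marks, and blank lines are taken as paragraph breaks. A real corpus download may use a different tokenization and carry extra tab-separated columns, such as lemmas or tags.

```python
import re
from xml.sax.saxutils import quoteattr

def to_vertical(documents):
    """Render {name: text} documents in a simple vertical layout.

    Each document is wrapped in <doc>, each paragraph in <p>, and every
    token is written on a line of its own.
    """
    lines = []
    for name, text in documents.items():
        # quoteattr adds the surrounding quotes and escapes special characters
        lines.append(f"<doc id={quoteattr(name)}>")
        # Blank lines are assumed to separate paragraphs
        for paragraph in re.split(r"\n\s*\n", text.strip()):
            if not paragraph.strip():
                continue
            lines.append("<p>")
            # Naive tokenization: runs of word characters, or single
            # punctuation marks
            lines.extend(re.findall(r"\w+|[^\w\s]", paragraph))
            lines.append("</p>")
        lines.append("</doc>")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(to_vertical({"a.txt": "Hello, world!\n\nBye."}), end="")
```

Because the structure tags survive as separate lines, a file in this shape keeps its document and paragraph boundaries when it is uploaded again, which plain text does not.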

User corpora

Corpora created by users can be fully accessed and downloaded by the user who created the corpus and by the users with whom the corpus is shared.

Preloaded corpora

Preloaded corpora can be searched but cannot be downloaded.

(8) Share corpus

User corpora are not public. The user can, however, grant permission to other users to access their corpora. This has to be done for each user corpus separately.

Access privileges

Sharing a corpus grants access to individual users, to groups of users or to everyone in the site licence.

User groups

The user group function is a practical solution when access has to be granted repeatedly to the same group of users. To create or edit a group, use the user group item in the main left menu.

Site licences

If you are a site licence administrator, you can share the corpus with all members of the site licence with one click.

Permission options
  • read only (they can view but not change),
  • upload files (they can view and add new data) or
  • full (they will have full access and can change the configuration or recompile the corpus as well as add data to it; however, they cannot remove the original data or edit metadata)
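The three permission levels can be summarised as a small lookup table. The Python sketch below is only a model of the list above, not part of any real interface: the names Permission, ALLOWED_ACTIONS and can, and the action labels, are all hypothetical.

```python
from enum import Enum

class Permission(Enum):
    READ_ONLY = "read only"
    UPLOAD_FILES = "upload files"
    FULL = "full"

# Actions each level allows, as described in the list above.
# Removing the original data and editing metadata appear in no set,
# because no shared permission level grants them.
ALLOWED_ACTIONS = {
    Permission.READ_ONLY: {"view"},
    Permission.UPLOAD_FILES: {"view", "add_data"},
    Permission.FULL: {"view", "add_data", "configure", "recompile"},
}

def can(permission, action):
    """Return True if the given permission level allows the action."""
    return action in ALLOWED_ACTIONS[permission]

if __name__ == "__main__":
    print(can(Permission.UPLOAD_FILES, "add_data"))
    print(can(Permission.FULL, "remove_original_data"))
```

Each level includes everything the level before it allows, so the choice is about how much more than viewing the other user should be able to do.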

(9) View logs

View the results of compiling the corpus or of running WebBootCaT.