(1) Options at the top
The line at the top gives you 4 options:
- expand your corpus by uploading files
- expand your corpus by downloading texts from the internet
- compile the corpus
- search the corpus
Expanding your corpus
You can add more texts to your corpus using any of the available methods. The methods can be combined, i.e. part of the corpus can come from uploaded files, part from the internet and part from the translation memory.
Add new file
Lets you upload more files.
Add data from web (WebBootCaT)
Lets you use WebBootCaT to find and download more relevant texts from the internet.
Compile corpus
See (3) below.
Search corpus
This is the equivalent of clicking Search in the left menu. It gives you access to a standard search to create a concordance.
(2) Show corpus files
Takes you back to the screen shown in the screenshot, which lists the files in the corpus. Each line represents one occasion on which texts were added to the corpus. For example, each upload (even of multiple files) is one line, and each use of WebBootCaT is one line.
(3) Compile corpus
A corpus needs to be compiled (=processed) each time new texts are added or when the user wants to use a new sketch grammar.
The settings let you select the XML tags that should be used as structures in your corpus. You also need to specify the structure used for references, which will enclose the data from each file that you uploaded. Its name must be different from any other structure name already used in your files. By default this is doc.
You can also tick the check box to use the program “onion”, which automatically removes duplicate content from your corpus. If you opt to use onion, you can specify which structure the program considers when removing duplicates (for example, the document, paragraph or sentence level).
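To make the idea of removing duplicates "at the level of a structure" more concrete, here is a minimal Python sketch. It is not meant to reproduce onion's own algorithm: it only drops exact repeats of a chosen structure and keeps the first copy. The function name dedupe_structures and the sample data are hypothetical, and the input is assumed to be a list of lines with each structure tag on its own line.

```python
import hashlib


def dedupe_structures(lines, structure="p"):
    """Keep the first copy of each <structure>...</structure> block.

    Simplified illustration only: exact-match comparison of the block
    content, not the method used by onion itself.
    """
    open_tag = "<" + structure
    close_tag = "</" + structure + ">"
    seen = set()   # digests of block contents already emitted
    out = []
    block = None   # lines of the block currently being collected

    for line in lines:
        if block is None:
            # A block starts at "<p>" or "<p attr=...>".
            if line == open_tag + ">" or line.startswith(open_tag + " "):
                block = [line]
            else:
                out.append(line)
        else:
            block.append(line)
            if line == close_tag:
                # Compare only the content between the tags.
                content = "\n".join(block[1:-1])
                digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
                if digest not in seen:
                    seen.add(digest)
                    out.extend(block)
                block = None

    if block is not None:
        out.extend(block)  # unclosed block: keep it unchanged
    return out
```

Calling dedupe_structures(lines, structure="doc") would compare whole documents instead of paragraphs, which mirrors the choice of structure offered in the compile settings.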
(4) Configure corpus
Configure the corpus either through the interface, which offers a few options, or by first selecting Expert mode, where you can manually edit the corpus configuration file (see Corpus Configuration File: Overview and Corpus Configuration File: All Features).
(5) Set sketch grammar
You can select the sketch grammar from a list of preloaded grammars or write your own sketch grammar (see Writing a sketch grammar).
(6) Set subcorpora
Set subcorpus definitions
You can define subcorpora of your corpus (see an example of a Subcorpus definition file).
(7) Download corpus
Download the corpus as text or in vertical format. Vertical format is useful if you want to retain any of the structures for uploading back into Sketch Engine.
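As a rough illustration of why vertical format retains structures: it stores one token per line (with any further attributes separated by tabs) and keeps structures as tags on their own lines. The Python sketch below reads such lines and groups the word forms by document. The function name read_vertical and the sample data are hypothetical, and the document structure is assumed to be called doc, the default mentioned above.

```python
def read_vertical(lines):
    """Group the first column (word form) of each token line by <doc>."""
    docs = []
    current = None  # token list of the document being read

    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line == "<doc>" or line.startswith("<doc "):
            # A new document starts: open a fresh token list.
            current = []
            docs.append(current)
        elif line.startswith("<") and line.endswith(">"):
            # Any other structure tag (e.g. <s>, </s>, </doc>) is skipped.
            continue
        elif current is not None:
            # Token line: the word form is the first tab-separated column.
            current.append(line.split("\t")[0])
    return docs


sample = [
    '<doc id="a">',
    "<s>",
    "The\tDT\tthe",
    "cat\tNN\tcat",
    "</s>",
    "</doc>",
    '<doc id="b">',
    "Hi\tUH\thi",
    "</doc>",
]
docs = read_vertical(sample)
```

A plain-text download would keep only the words, whereas the tags in the vertical file preserve where each document and sentence begins and ends.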
(8) Share corpus
User corpora are not public. The user can, however, grant permission to other users to access their corpora. This has to be done for each user corpus separately.
Access privileges
Sharing a corpus grants access to individual users, to groups of users or to everyone in the site licence.
User groups
The user group function is a practical solution when access has to be granted repeatedly to the same group of users. To create or edit a group, use the user group item in the main left menu.
Site licences
If you are a site licence administrator, you can share the corpus with all members of the site licence with one click.
Permission options
- read only (they can view but not change),
- upload files (they can view and add new data) or
- full (they have full access and can change the configuration or recompile the corpus as well as add data to it; however, they cannot remove the original data or edit metadata)
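The three permission levels can be summarised as a small lookup table. The Python sketch below is only a model of the list above, not part of Sketch Engine: the names PERMISSIONS and is_allowed and the action labels are hypothetical.

```python
# Actions each permission level allows, as described in the list above.
PERMISSIONS = {
    "read only": {"view"},
    "upload files": {"view", "add data"},
    "full": {"view", "add data", "configure", "recompile"},
}


def is_allowed(level, action):
    """Return True if a user with the given level may perform the action."""
    return action in PERMISSIONS[level]
```

Actions such as removing the original data or editing metadata appear in none of the sets, so is_allowed returns False for them at every level.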
(9) View logs
View the results of compiling the corpus or of running WebBootCaT.