Manage a corpus
To reach the corpus management screen, follow these steps:
- select the corpus if it is not selected already
- go to the corpus dashboard
- click MANAGE CORPUS
For preloaded corpora, only the Subcorpora option is active. No other actions are allowed.
For user corpora, all of the following actions are allowed.
This option displays the list of file folders and documents included in the corpus.
Folders can be deleted. Deleting a folder also deletes all files inside it. Compile the corpus after the deletion for the changes to take effect.
Use the local menu ⋯ of each document to:
- view the content of the document
- change the document encoding
- add, edit or delete metadata
- download the document
- delete the document
When multiple documents are selected, these operations can be performed on all the selected files:
- add, edit or delete metadata
- delete the selected documents
Adding new data
To add more data to the corpus, go to the MANAGE CORPUS screen and select the Make bigger option.
This will start the built-in corpus building tool, which lets you add more data to your corpus using one or more of these options:
- uploading files
- downloading content from the web
- copying and pasting text
The procedure for adding data with these methods is the same as for building a new corpus with them.
By default, a user corpus is only accessible to the user who built it. No one else has access.
The user can, however, grant access to the corpus to individually selected users or to all members of a multiuser account (site licence). The user must be a member of the site licence or its administrator to use this option.
A corpus can be shared as:
- read only – searching and analysis are allowed
- upload – same as read only but the user can also upload new data to the corpus
- full access – same as upload but the user can also delete data from the corpus or change the corpus configuration
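The three sharing levels are cumulative: each one includes everything the previous level allows. The following is a minimal sketch of that hierarchy, not Sketch Engine code. The names `ShareLevel`, `REQUIRED_LEVEL` and `is_allowed`, as well as the action labels, are hypothetical and only illustrate the rule described above.

```python
from enum import IntEnum


class ShareLevel(IntEnum):
    """Hypothetical model of the sharing levels; a higher value includes the lower ones."""
    READ_ONLY = 1    # searching and analysis
    UPLOAD = 2       # read only + uploading new data
    FULL_ACCESS = 3  # upload + deleting data and changing the configuration


# Minimum level needed for each action (action names are illustrative only).
REQUIRED_LEVEL = {
    "search": ShareLevel.READ_ONLY,
    "upload": ShareLevel.UPLOAD,
    "delete_data": ShareLevel.FULL_ACCESS,
    "configure": ShareLevel.FULL_ACCESS,
}


def is_allowed(level: ShareLevel, action: str) -> bool:
    """Return True if a user with the given sharing level may perform the action."""
    return level >= REQUIRED_LEVEL[action]


print(is_allowed(ShareLevel.UPLOAD, "upload"))       # True
print(is_allowed(ShareLevel.UPLOAD, "delete_data"))  # False
```

Modelling the levels as ordered integers keeps the check to a single comparison, which mirrors the "same as ... but also" wording of the list.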
Only user corpora can be downloaded. The available formats are:
- plain text
- vertical file with tags and lemmas
- TMX (parallel corpora only)
Preloaded corpora cannot be downloaded from the interface. To download a preloaded corpus, contact us for a quotation of the licensing fee.
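The download rules can be summarised as a small decision function. This is only an illustrative sketch under the rules stated above. The function `available_formats` and its parameters are hypothetical and do not correspond to any Sketch Engine API.

```python
def available_formats(is_user_corpus: bool, is_parallel: bool) -> list[str]:
    """Return the download formats offered in the interface (hypothetical helper)."""
    if not is_user_corpus:
        # Preloaded corpora cannot be downloaded from the interface.
        return []
    formats = ["plain text", "vertical file with tags and lemmas"]
    if is_parallel:
        # TMX is offered for parallel corpora only.
        formats.append("TMX")
    return formats


print(available_formats(is_user_corpus=True, is_parallel=False))
print(available_formats(is_user_corpus=True, is_parallel=True))
print(available_formats(is_user_corpus=False, is_parallel=True))
```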
When a new corpus is built, or when data are added to or removed from an existing corpus, the corpus has to be compiled for the changes to take effect.
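One way to picture this is that searches work on the last compiled snapshot of the corpus, not on the current set of documents. The sketch below is a simplified mental model under that assumption, not a description of how Sketch Engine is implemented. The class `CorpusState` and all its members are hypothetical.

```python
class CorpusState:
    """Hypothetical model: changes become visible only after compile()."""

    def __init__(self) -> None:
        self.documents: set[str] = set()                      # current files
        self.compiled_documents: frozenset[str] = frozenset()  # last compiled snapshot

    def add(self, name: str) -> None:
        self.documents.add(name)

    def remove(self, name: str) -> None:
        self.documents.discard(name)

    def compile(self) -> None:
        # Take a snapshot of the current documents.
        self.compiled_documents = frozenset(self.documents)

    @property
    def needs_compiling(self) -> bool:
        # True when documents were added or removed since the last compilation.
        return self.documents != self.compiled_documents

    def searchable(self) -> list[str]:
        # Only the compiled snapshot is available for searching.
        return sorted(self.compiled_documents)


corpus = CorpusState()
corpus.add("report.txt")
print(corpus.needs_compiling, corpus.searchable())  # True []
corpus.compile()
print(corpus.needs_compiling, corpus.searchable())  # False ['report.txt']
```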
Only user corpora can be deleted. A deleted corpus cannot be recovered.
Both preloaded corpora and user corpora can be divided into smaller parts called subcorpora. Use this option to see the existing subcorpora, delete them or build new ones.
This opens the corpus configuration file which holds advanced settings about how the corpus should be handled by Sketch Engine.
Logs (or log files) record information about the processing of the corpus such as compilation or the procedure of downloading content from the web.
They are mainly useful for experts when solving technical problems with the corpus.
Rename a corpus
On the Manage corpus screen, click the pencil icon next to the corpus name.