When you create a corpus in the Sketch Engine interface (see the help page on creating a corpus), the second step asks you to specify a configuration template, which gives Sketch Engine the information it needs to interpret your data. Depending on the language of your corpus (selected at the first step), various preloaded configuration templates are offered. These are existing templates that you can use without having to write your own. They rely on the preprocessors available for the language, which can tokenise, part-of-speech tag and lemmatise your data and produce a word sketch. To help guide your choice, icons (V, T, L and S) under each preloaded configuration template indicate which of these outputs the template can produce:
- V – vertical file; whether there is an associated tool (POS tagger or tokeniser) that can convert the uploaded documents to the vertical file format required by Sketch Engine
- T – tags
- L – lemmata
- S – word sketches
These icons are coloured green, yellow or red to indicate how fully the output incorporates each feature: green means fully available, yellow partially available and red unavailable.
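
To make the V, T and L icons more concrete, the following minimal Python sketch writes a few tokens in a vertical layout: one token per line, with tab-separated columns, and XML-like tags marking structures such as documents and sentences. The `to_vertical` helper, the column order (word, tag, lemma), the `doc` and `s` structure names and the sample tags are all assumptions made for this illustration. The actual columns and tagset depend on the configuration template you choose, and this is not the tool Sketch Engine itself runs.

```python
from typing import Iterable, List, Tuple

# Hypothetical token representation for this sketch: (word, tag, lemma).
Token = Tuple[str, str, str]


def to_vertical(sentences: Iterable[List[Token]], doc_id: str = "example") -> str:
    """Render sentences as vertical text: one token per line, tab-separated columns."""
    lines = [f'<doc id="{doc_id}">']  # assumed document structure
    for sentence in sentences:
        lines.append("<s>")  # assumed sentence structure
        for word, tag, lemma in sentence:
            # Assumed column order: word form, part-of-speech tag, lemma.
            lines.append("\t".join((word, tag, lemma)))
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


# Sample data with illustrative tags; a real tagger's tagset may differ.
sample = [[("Dogs", "NNS", "dog"), ("bark", "VBP", "bark"), (".", ".", ".")]]
vertical = to_vertical(sample)
print(vertical, end="")
```

In terms of the icons, a template with only V available would fill just the first column, while T and L correspond to the tag and lemma columns being filled in as well.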