Token annotation

For most languages, Sketch Engine tags and lemmatizes tokens in user corpora automatically. If no tagging or lemmatization tools are available for a language, the user can supply tags or lemmas manually. Automatically generated tags and lemmas can also be corrected manually.

Procedure in a nutshell

(If your corpus is in Sketch Engine, first download it in vertical format.)

  • Open the corpus in a plain text editor or annotation software.
  • Add metadata to tokens and save the file.
  • Upload it to Sketch Engine, where the metadata will be processed into token attributes automatically.
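
For larger files, the second step can be scripted instead of done by hand. The following is a minimal sketch, not an official Sketch Engine tool. It assumes that the vertical file has one token per line with tab-separated columns, and that lines starting with "<" are structure tags to be left untouched. The function name add_column is hypothetical, and the lowercased word form is only a stand-in for a real lemma.

```python
def add_column(vertical_text, make_value):
    """Append one attribute column to every token line of a vertical text.

    Assumption: columns are tab-separated and any line starting with "<"
    is a structure tag. A token that is itself "<" would need extra care.
    """
    out = []
    for line in vertical_text.splitlines():
        # Leave empty lines and structure tags unchanged.
        if not line.strip() or line.startswith("<"):
            out.append(line)
            continue
        columns = line.split("\t")
        # make_value receives the existing columns and returns the new value.
        columns.append(make_value(columns))
        out.append("\t".join(columns))
    return "\n".join(out) + "\n"


sample = "<s>\nGolden\ngate\nbridge\n</s>\n"
# Stand-in annotation: use the lowercased word form as the new column.
annotated = add_column(sample, lambda cols: cols[0].lower())
print(annotated, end="")
```

The result can be saved as a plain text file and uploaded as described in the last step.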


The example below shows tokens annotated with these attributes, one column per attribute in the order listed:

  • token ID
  • word
  • lowercase lemma
  • part of speech
  • dependency ID

0 Golden golden adjective 1
1 gate gate noun 2
2 bridge bridge noun -
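
To check an annotated file before uploading it, the token lines can be read back into records. This is a minimal sketch under stated assumptions: the field names (token_id, word, lemma, pos, dep_id) are hypothetical, "-" is read as "no value", and columns are split on any whitespace to match the example as printed here. For a real vertical file with tab-separated columns, splitting on "\t" would be safer.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    token_id: int
    word: str
    lemma: str
    pos: str
    dep_id: Optional[int]  # None when the column holds "-"


def parse_token(line: str) -> Token:
    # Raises ValueError if the line does not have exactly five columns.
    token_id, word, lemma, pos, dep = line.split()
    return Token(int(token_id), word, lemma, pos,
                 None if dep == "-" else int(dep))


lines = [
    "0 Golden golden adjective 1",
    "1 gate gate noun 2",
    "2 bridge bridge noun -",
]
tokens = [parse_token(line) for line in lines]
for token in tokens:
    print(token)
```

Because parse_token unpacks exactly five values, a line with a missing or extra column fails immediately, which makes it easy to spot tokens that were annotated incompletely.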

Structure annotation and token annotation

If the corpus contains metadata for structures, each structure will be listed on a separate line together with the related metadata like this:

<named_entity type="geography" subtype="bridge">
0 Golden adjective 1
1 gate noun 2
2 bridge noun -