Token annotation

For most languages, Sketch Engine tags and lemmatizes tokens in user corpora automatically. If no tagger or lemmatizer is available for a language, the user can supply tags and lemmas manually. Automatically generated tags and lemmas can also be corrected manually.

Procedure in a nutshell

(If your corpus is in Sketch Engine, first download it in vertical format.)

  • Open the corpus in a plain text editor or annotation software.
  • Add metadata to tokens and save the file.
  • Upload it to Sketch Engine, where the metadata will be processed into token attributes automatically.
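
For larger files, the annotation step can be scripted instead of done by hand. The following is a minimal sketch, not an official Sketch Engine tool. It assumes a vertical file with one word per line, a hypothetical lookup table `MANUAL` holding the hand-made annotations, a hypothetical `<s>` sentence structure, and a made-up fallback (lowercased word, tag `unknown`) for words missing from the table.

```python
# Hypothetical table of manual annotations: word -> (lemma, part of speech)
MANUAL = {
    "Golden": ("golden", "adjective"),
    "gate": ("gate", "noun"),
    "bridge": ("bridge", "noun"),
}

def annotate(lines, lookup):
    """Append lemma and part-of-speech columns to each token line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines that look like tags are not tokens,
        # so they are copied unchanged.
        if not line or (line.startswith("<") and line.endswith(">")):
            out.append(line)
            continue
        word = line.split("\t")[0]
        # Fallback for unknown words is an illustrative choice only.
        lemma, pos = lookup.get(word, (word.lower(), "unknown"))
        out.append("\t".join([word, lemma, pos]))
    return out

annotated = annotate(["<s>", "Golden", "gate", "bridge", "</s>"], MANUAL)
print("\n".join(annotated))
```

The result can be saved as a plain text file and uploaded as described in the last step.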

Example

In this example, each token is on its own line and is annotated with these tab-separated attributes:

  • word
  • lowercase lemma
  • part of speech

Golden	golden	adjective
gate	gate	noun
bridge	bridge	noun
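
When annotations are typed by hand, it is easy to drop a column. The sketch below checks that every token line has the expected number of tab-separated columns before the file is uploaded. The function name `find_bad_lines` and the rule for skipping tag-like lines are assumptions made for illustration, not part of Sketch Engine.

```python
def find_bad_lines(lines, expected_columns=3):
    """Return 1-based numbers of token lines with the wrong column count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and lines that look like structure tags.
        if not line or (line.startswith("<") and line.endswith(">")):
            continue
        if len(line.split("\t")) != expected_columns:
            bad.append(number)
    return bad

sample = [
    "Golden\tgolden\tadjective",
    "gate\tgate",  # part of speech is missing here
    "bridge\tbridge\tnoun",
]
bad_lines = find_bad_lines(sample)
print(bad_lines)  # prints [2]
```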

Structure annotation and token annotation

If the corpus contains metadata for structures, each structure tag is placed on a separate line, with the related metadata given as attributes of the opening tag, like this:

<named_entity type="geography" subtype="bridge">
Golden	golden	adjective
gate	gate	noun
bridge	bridge	noun
</named_entity>
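
To show how the two kinds of annotation fit together, here is a rough sketch that reads such a file and pairs each token with the structures that enclose it. It is an illustration under simplifying assumptions: tags are well nested, attribute values are double-quoted, and every token line has exactly three columns. The name `parse_vertical` is hypothetical.

```python
import re

# Matches attribute pairs such as type="geography" in an opening tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_vertical(text):
    """Yield (word, lemma, pos, structures) for each token line."""
    open_structs = []  # stack of (name, attributes) for open structures
    for line in text.splitlines():
        if line.startswith("</") and line.endswith(">"):
            open_structs.pop()  # closing tag ends the innermost structure
        elif line.startswith("<") and line.endswith("/>"):
            continue  # self-closing tag encloses no tokens
        elif line.startswith("<") and line.endswith(">"):
            name = re.match(r"<(\w+)", line).group(1)
            open_structs.append((name, dict(ATTR_RE.findall(line))))
        elif line:
            word, lemma, pos = line.split("\t")
            yield word, lemma, pos, list(open_structs)

sample = (
    '<named_entity type="geography" subtype="bridge">\n'
    "Golden\tgolden\tadjective\n"
    "gate\tgate\tnoun\n"
    "bridge\tbridge\tnoun\n"
    "</named_entity>\n"
)
rows = list(parse_vertical(sample))
for row in rows:
    print(row)
```

Each token keeps its own attributes, while the structure attributes (`type`, `subtype`) apply to every token between the opening and closing tag.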