Token annotation
For most languages, Sketch Engine tags and lemmatizes tokens in user corpora automatically. If no tagger or lemmatizer is available for a language, the user can supply tags or lemmas manually. Automatically generated tags and lemmas can also be corrected manually.
Procedure in a nutshell
(If your corpus is in Sketch Engine, first download it in vertical format.)
- Open the corpus in a plain text editor or annotation software.
- Add the annotation (tags, lemmas or other attributes) to the tokens and save the file.
- Upload the file to Sketch Engine, where the annotation is processed into token attributes automatically.
Example
Each token is on its own line and is annotated with these attributes, one column per attribute, in this order:
- token ID
- word
- lowercase lemma
- part of speech
- dependency ID
0 Golden golden adjective 1
1 gate gate noun 2
2 bridge bridge noun -
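The example above can also be generated with a script instead of being typed by hand. The following is a minimal sketch in Python, not a Sketch Engine tool: the function name to_vertical, the tokens list, and the choice of a tab as the column separator are assumptions made for illustration. A hyphen is written where an attribute has no value, as in the last line of the example.

```python
# Hypothetical token data matching the example above:
# (token ID, word, lowercase lemma, part of speech, dependency ID)
tokens = [
    (0, "Golden", "golden", "adjective", 1),
    (1, "gate", "gate", "noun", 2),
    (2, "bridge", "bridge", "noun", None),  # no dependency ID
]


def to_vertical(tokens, sep="\t", missing="-"):
    """Return one line per token, with attributes joined by `sep`.

    The tab separator is an assumption; a missing value (None) is
    written as `missing`.
    """
    lines = []
    for token in tokens:
        fields = [missing if value is None else str(value) for value in token]
        lines.append(sep.join(fields))
    return "\n".join(lines) + "\n"


vertical = to_vertical(tokens)
print(vertical, end="")
```

The resulting string can be saved to a plain text file and uploaded as described in the procedure.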
Structure annotation and token annotation
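If the corpus also contains metadata for structures, the opening and closing tag of each structure is placed on a separate line, and the related metadata is given as attributes of the opening tag. The tokens that belong to the structure are listed between the two tags (the lemma column is omitted in this example):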
<named_entity type="geography" subtype="bridge">
0 Golden adjective 1
1 gate noun 2
2 bridge noun -
</named_entity>
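To check an annotated file before uploading it, the mix of structure lines and token lines can be read back with a short script. This is a rough sketch under stated assumptions, not the parser Sketch Engine uses: parse_vertical and ATTR_RE are hypothetical names, columns are assumed to be tab-separated, every structure is assumed to have a matching closing tag, and any line starting with "<" is treated as a structure tag.

```python
import re

# Matches name="value" pairs inside an opening structure tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

# The example above, with tabs assumed as the column separator.
sample = "\n".join([
    '<named_entity type="geography" subtype="bridge">',
    "0\tGolden\tadjective\t1",
    "1\tgate\tnoun\t2",
    "2\tbridge\tnoun\t-",
    "</named_entity>",
])


def parse_vertical(text):
    """Return (token_fields, enclosing_structures) for every token line."""
    open_structures = []  # stack of (name, attributes) currently open
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("</"):
            # Closing tag: leave the innermost open structure.
            if open_structures:
                open_structures.pop()
        elif line.startswith("<"):
            # Opening tag: record its name and attributes.
            name = line[1:].split(None, 1)[0].rstrip(">")
            open_structures.append((name, dict(ATTR_RE.findall(line))))
        else:
            # Token line: split into columns, remember enclosing structures.
            rows.append((line.split("\t"), list(open_structures)))
    return rows


for fields, structures in parse_vertical(sample):
    print(fields, [name for name, _ in structures])
```

Each token is returned with the list of structures it sits inside, so a token with an unexpected number of columns or a token outside any structure is easy to spot.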