For most languages, Sketch Engine tags and lemmatizes tokens in user corpora automatically. If no tagger or lemmatizer is available for a language, the user can supply tags or lemmas manually. Automatically generated tags and lemmas can also be corrected manually.
Procedure in a nutshell
(If your corpus is already in Sketch Engine, first download it in vertical format.)
- Open the corpus in a plain text editor or annotation software.
- Add metadata to tokens and save the file.
- Upload the file to Sketch Engine, where the metadata is processed into token attributes automatically.
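Adding or correcting metadata (the second step above) can also be scripted instead of done by hand. The following minimal Python sketch assumes a vertical file with three tab-separated columns (word form, lemma, part of speech) and treats any line starting with `<` as a structure tag. The function name `correct_pos` and the correction mapping are hypothetical, for illustration only; they are not part of Sketch Engine.

```python
def correct_pos(vertical_text: str, corrections: dict[str, str]) -> str:
    """Return vertical_text with the POS column replaced for selected word forms.

    Assumes (hypothetically) three tab-separated columns: word, lemma, POS.
    Structure lines such as <doc> or </named_entity> and blank lines are
    copied unchanged.
    """
    out_lines = []
    for line in vertical_text.splitlines():
        if line.startswith("<") or not line.strip():
            out_lines.append(line)  # structure tag or blank line: keep as is
            continue
        columns = line.split("\t")
        word = columns[0]
        if word in corrections and len(columns) >= 3:
            columns[2] = corrections[word]  # overwrite the POS column
        out_lines.append("\t".join(columns))
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    sample = "Golden\tgolden\tnoun\ngate\tgate\tnoun\nbridge\tbridge\tnoun\n"
    print(correct_pos(sample, {"Golden": "adjective"}), end="")
```

The function returns the corrected text rather than writing it, so saving the file before upload is a separate step, for example with `open(path, "w", encoding="utf-8")`.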
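The following example shows tokens annotated with these attributes:
- lowercase lemma
- part of speech
Each token is on a separate line, with the word form in the first column followed by the attributes in tab-separated columns:
Golden	golden	adjective
gate	gate	noun
bridge	bridge	noun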
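To illustrate how each line of the example maps to attributes, here is a minimal Python sketch that splits token lines on tabs. The attribute names `word`, `lemma` and `pos` and the function `parse_tokens` are assumptions for illustration; the attribute names used in a real corpus depend on its configuration.

```python
ATTRIBUTE_NAMES = ("word", "lemma", "pos")  # hypothetical names for the three columns


def parse_tokens(vertical_text: str) -> list[dict[str, str]]:
    """Turn tab-separated token lines into dicts keyed by attribute name.

    Lines starting with "<" (structure tags) and blank lines are skipped.
    """
    tokens = []
    for line in vertical_text.splitlines():
        if not line.strip() or line.startswith("<"):
            continue
        values = line.split("\t")
        tokens.append(dict(zip(ATTRIBUTE_NAMES, values)))
    return tokens


if __name__ == "__main__":
    example = "Golden\tgolden\tadjective\ngate\tgate\tnoun\nbridge\tbridge\tnoun\n"
    for token in parse_tokens(example):
        print(token)
```

Run on the example, the first token becomes `{'word': 'Golden', 'lemma': 'golden', 'pos': 'adjective'}`: the word form keeps its capital letter, while the lemma is lowercase.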
Structure annotation and token annotation
If the corpus contains metadata for structures, such as a named entity spanning several tokens, each structure tag is placed on a separate line, with the related metadata given as attributes of the opening tag:
<named_entity type="geography" subtype="bridge">
Golden	golden	adjective
gate	gate	noun
bridge	bridge	noun
</named_entity>
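When many structures have to be added, the tags can be generated rather than typed. This minimal Python sketch encloses token lines in an opening and closing structure tag, each on its own line, with the metadata as attributes of the opening tag. The function name `wrap_in_structure` is hypothetical, and escaping attribute values with `html.escape` is a precaution assumed here rather than a documented Sketch Engine requirement.

```python
from html import escape


def wrap_in_structure(name: str, attributes: dict[str, str], token_lines: list[str]) -> str:
    """Return token_lines enclosed in an opening and a closing structure tag.

    Each tag goes on its own line; attribute values are escaped so that
    characters such as quotes or ampersands cannot break the tag.
    """
    attrs = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attributes.items())
    opening = f"<{name} {attrs}>" if attrs else f"<{name}>"
    return "\n".join([opening, *token_lines, f"</{name}>"]) + "\n"


if __name__ == "__main__":
    tokens = ["Golden\tgolden\tadjective", "gate\tgate\tnoun", "bridge\tbridge\tnoun"]
    print(wrap_in_structure("named_entity", {"type": "geography", "subtype": "bridge"}, tokens), end="")
```

With the arguments in the usage example, the function reproduces the `named_entity` block shown above.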