Annotating your corpus

To annotate a corpus means to add information (metadata) about the text. This information can relate to structures (documents, paragraphs, sentences etc.) or individual tokens.

structures (metadata)

who needs this: 99.9 % of users who need to annotate
annotated segment: text of any length between one token and the whole corpus
used for: year of publication; source (website, book, newspaper); author name; register (formal, informal); type of named entity (politician, actor…); and an endless list of other options
automatic vs. manual: manual, possibly helped by the built-in annotation tool

tokens (lemmas, tags etc.)

who needs this: only 0.1 % of users who need to annotate
annotated segment: exactly one token
used for: part of speech tags; lemmas (or other information that always relates to one token and never to a sequence of tokens)
automatic vs. manual: automatic, using taggers and lemmatizers in Sketch Engine; manual only necessary if Sketch Engine does not have automatic tools OR if the automatic tags and lemmas require customisation
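
One way to picture the difference between structure annotation and token annotation is a vertical, one-token-per-line text layout, where structures are written as tags carrying attributes and token annotations are tab-separated columns next to each word. The sketch below assumes such a layout. The sample text, the column order (word, tag, lemma) and the function name parse_vertical are illustrative assumptions, not something this page specifies.

```python
import re

# Invented sample: structures are tags with attributes (metadata),
# tokens are lines with tab-separated columns: word, tag, lemma.
SAMPLE = """<doc author="Jane Doe" year="2019" register="formal">
<s>
The\tDT\tthe
cats\tNNS\tcat
sleep\tVBP\tsleep
</s>
</doc>
"""

OPEN_TAG = re.compile(r'<(\w+)([^>]*)>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_vertical(text):
    """Split the sample layout into structure metadata and token annotations."""
    structures = []  # list of (structure name, {attribute: value})
    tokens = []      # list of (word, tag, lemma)
    for line in text.splitlines():
        line = line.strip("\n")
        if not line.strip() or line.startswith("</"):
            continue  # skip blank lines and closing tags
        match = OPEN_TAG.fullmatch(line.strip())
        if match:
            # Structure annotation: applies to everything up to the closing tag.
            structures.append((match.group(1), dict(ATTR.findall(match.group(2)))))
        else:
            # Token annotation: applies to exactly one token.
            word, tag, lemma = line.split("\t")
            tokens.append((word, tag, lemma))
    return structures, tokens


structures, tokens = parse_vertical(SAMPLE)
for name, attrs in structures:
    print("structure:", name, attrs)
for word, tag, lemma in tokens:
    print("token:", word, tag, lemma)
```

In this sketch the author, year and register belong to the whole document, while each tag and lemma belongs to a single token, which mirrors the two kinds of annotation described above.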

Annotation tool

The built-in annotation tool allows adding metadata to documents easily.

[Figure: metadata annotation]