Shallow tagging | Sketch Engine

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

Some languages still do not have a part-of-speech tagging tool, or cannot be tagged with an existing tagger.

For such cases, we developed a simple part-of-speech notation called shallow tagging, which is based on regular expressions and the frequency properties of tokens. Once a corpus is tagged with this simple tagset, it can be processed with the Universal Sketch Grammar prepared by Siva Reddy, Adam Kilgarriff and Pavel Rychlý.

Tagset legend for shallow tagging

An example of using a tag in the CQL concordance search box: the query [tag="FREQ"] finds tokens tagged FREQ, i.e. the 200 most frequent words in the language.

FREQ	frequent words (the 200 most frequent words in the language)
CONTENT	other words
CRD	numerals
PUN	punctuation
OTHER	other
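
The legend above can be illustrated with a minimal Python sketch of a tagger that combines regular expressions with token frequencies. This is not the Sketch Engine implementation: the three patterns, the case-folding, the order of the checks and the function names frequent_words and shallow_tag are all assumptions made for illustration. Only the tag names and the 200-word cut-off come from the legend.

```python
import re
from collections import Counter

# Assumed patterns: the source does not specify the actual regular expressions.
NUMERAL_RE = re.compile(r"\d+(?:[.,]\d+)*")             # 2, 1,000, 3.14
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")    # letters, inner ' or -
PUNCT_RE = re.compile(r"[^\w\s]+")                      # runs of non-word symbols


def frequent_words(tokens, top_n=200):
    """Return the set of the top_n most frequent word tokens, case-folded."""
    counts = Counter(t.lower() for t in tokens if WORD_RE.fullmatch(t))
    return {word for word, _ in counts.most_common(top_n)}


def shallow_tag(token, frequent):
    """Assign one of the five shallow tags to a single token."""
    if NUMERAL_RE.fullmatch(token):
        return "CRD"
    if PUNCT_RE.fullmatch(token):
        return "PUN"
    if WORD_RE.fullmatch(token):
        return "FREQ" if token.lower() in frequent else "CONTENT"
    return "OTHER"  # mixed tokens such as "abc123"


# Toy corpus; top_n is lowered to 1 only because the sample is tiny.
tokens = "The cat saw the dog , and the bird saw 2 cats .".split()
frequent = frequent_words(tokens, top_n=1)
tagged = [(token, shallow_tag(token, frequent)) for token in tokens]
```

On a real corpus the frequency list would be computed over the whole corpus with the default top_n=200, and the resulting tags written to the tag attribute of each token so that queries such as [tag="FREQ"] can match them.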
