A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

However, some languages still have no part-of-speech tagging tool, or cannot be tagged with an existing tagger.

For such languages, we developed a simple part-of-speech notation called shallow tagging, which is based on regular expressions and the frequency properties of tokens. Once a corpus is tagged with this simple tagset, it can be processed with the Universal Sketch Grammar prepared by Siva Reddy, Adam Kilgarriff, and Pavel Rychlý.

Part-of-speech tagsets used in Sketch Engine

Tagset legend for shallow tagging

An example of a tag in the CQL concordance search box: [tag="FREQ"] finds the 200 most frequent words in the language.

FREQ frequent words (the 200 most frequent words in the language)
CONTENT other words
CRD numerals
PUN punctuation
OTHER other
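
To illustrate how a tagger built on regular expressions and token frequencies might assign the tags in the legend above, here is a minimal Python sketch. It is not the Sketch Engine implementation: the function name `shallow_tag`, the three regular expressions, and the decision to count frequencies case-insensitively over the input tokens are all assumptions made for the example. The actual rules used by the shallow tagger are not described here.

```python
import re
from collections import Counter

# Assumed patterns for illustration only; the real shallow tagger's
# regular expressions are not given in the text.
NUMERAL_RE = re.compile(r"\d+(?:[.,]\d+)*")               # 42, 3.14, 1,000
PUNCT_RE = re.compile(r"[^\w\s]+")                        # , . ! ?! ...
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")      # letters, with inner - or '


def shallow_tag(tokens, top_n=200):
    """Return (token, tag) pairs using the five shallow tags.

    top_n is the number of most frequent words tagged FREQ
    (200 in the legend); frequencies are taken from `tokens` itself.
    """
    # Count only word-like tokens, case-insensitively.
    counts = Counter(t.lower() for t in tokens if WORD_RE.fullmatch(t))
    frequent = {word for word, _ in counts.most_common(top_n)}

    tagged = []
    for tok in tokens:
        if NUMERAL_RE.fullmatch(tok):
            tag = "CRD"
        elif PUNCT_RE.fullmatch(tok):
            tag = "PUN"
        elif WORD_RE.fullmatch(tok):
            tag = "FREQ" if tok.lower() in frequent else "CONTENT"
        else:
            tag = "OTHER"  # mixed or unrecognised tokens, e.g. "x9"
        tagged.append((tok, tag))
    return tagged
```

Calling `shallow_tag(tokens)` on a tokenized corpus returns a list of (token, tag) pairs. In this sketch the frequency list is computed from the same tokens that are being tagged, so `top_n` only becomes meaningful on a corpus with well over 200 distinct words.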