A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
However, some languages still have no part-of-speech tagging tool, or cannot be tagged with any existing tagger.
For such languages, we developed a simple part-of-speech notation called shallow tagging, which is based on regular expressions and the frequency properties of tokens. Once a corpus is tagged with this simple tagset, it can be processed with the Universal Sketch Grammar prepared by Siva Reddy, Adam Kilgarriff and Pavel Rychlý.
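To make the idea of tagging by regular expressions and token frequency more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the actual shallow tagset or its implementation: the tag names (`NUM`, `PUNCT`, `FREQ`, `WORD`), the patterns, and the `top_n` cutoff are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical regex rules and tag names, chosen for illustration only;
# the real shallow tagset is not specified here.
PATTERNS = [
    (re.compile(r"\d+([.,]\d+)*"), "NUM"),   # digits, optionally with . or , groups
    (re.compile(r"[^\w\s]+"), "PUNCT"),      # one or more punctuation characters
]


def regex_tag(token):
    """Return the tag of the first pattern matching the whole token, else None."""
    for pattern, tag in PATTERNS:
        if pattern.fullmatch(token):
            return tag
    return None


def shallow_tag(tokens, top_n=1):
    """Tag tokens by regex first, then by corpus frequency.

    Tokens matching no regex are tagged FREQ if they are among the
    top_n most frequent such tokens (case-insensitive), else WORD.
    """
    # Count only tokens that no regex rule claims.
    counts = Counter(t.lower() for t in tokens if regex_tag(t) is None)
    frequent = {word for word, _ in counts.most_common(top_n)}

    tagged = []
    for token in tokens:
        tag = regex_tag(token)
        if tag is None:
            tag = "FREQ" if token.lower() in frequent else "WORD"
        tagged.append((token, tag))
    return tagged


tokens = "the cat saw the dog , the dog saw 3 cats .".split()
for token, tag in shallow_tag(tokens):
    print(f"{token}\t{tag}")
```

The frequency rule stands in for the observation that the most frequent tokens of a corpus tend to be function words, which is the kind of signal a language-independent tagset can use when no trained tagger is available. A real tagset would likely need more patterns and a cutoff tuned to the corpus size.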