A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels that indicate the part of speech, and sometimes other grammatical categories (case, tense, etc.), of each token in a text corpus.
Danish ePOS part-of-speech tagset
The Danish ePOS part-of-speech tagset is used to mark morphological categories in Danish corpora annotated by taggers that follow the tagset, e.g. TreeTagger with a model trained on the ePAROLE corpus.
The subclassifications of each part of speech have a fixed position within the tag. For example, in the case of common nouns, the number marker is always found at position 3, definiteness at position 4, case at position 5, and gender at position 6. Example:
NC:siuc:--:---- represents (in order of position, not counting the colons) a common noun (NC) that is singular (s), indefinite (i), unmarked for case (u), and of common gender (c). For the full tagset specification, see Jørg Asmussen: Design of The ePOS Tagger, Technical Report, DSL, 2015.
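The positional layout described above for common nouns can be sketched in a few lines of Python. This is only an illustration, not part of the ePOS tagger: the function name decode_common_noun and both lookup tables are hypothetical, and the sketch assumes that colons are separators that do not count as positions. Only the value codes that appear in the example tag are included; all other codes are left undecoded rather than guessed, since the full inventory is defined in the tagset specification.

```python
# Positions (1-based, colons not counted) of the common-noun categories,
# as stated in the text above.
COMMON_NOUN_FIELDS = {3: "number", 4: "definiteness", 5: "case", 6: "gender"}

# Only the value codes attested in the example tag NC:siuc:--:----.
# The complete set of codes is in the tagset specification and is
# deliberately not guessed here.
KNOWN_VALUES = {
    "number": {"s": "singular"},
    "definiteness": {"i": "indefinite"},
    "case": {"u": "unmarked"},
    "gender": {"c": "common"},
}


def decode_common_noun(tag):
    """Decode positions 3-6 of a common-noun tag such as 'NC:siuc:--:----'."""
    # Assumption: colons only separate groups and are not counted as positions.
    chars = tag.replace(":", "")
    if not chars.startswith("NC") or len(chars) < 6:
        raise ValueError(f"not a common-noun tag: {tag!r}")
    decoded = {}
    for position, field in COMMON_NOUN_FIELDS.items():
        code = chars[position - 1]
        # Unknown codes are returned unchanged instead of being interpreted.
        decoded[field] = KNOWN_VALUES[field].get(code, code)
    return decoded


if __name__ == "__main__":
    print(decode_common_noun("NC:siuc:--:----"))
```

Because every category sits at a fixed position, decoding needs only an index lookup per category; tags for other parts of speech would need their own position tables taken from the specification.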