A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

Danish ePOS part-of-speech tagset

The Danish ePOS part-of-speech tagset is used to mark morphological categories in Danish corpora annotated by taggers that use this tagset, e.g. TreeTagger with a model trained on the ePAROLE corpus.

Subclassifications of a particular part of speech have a fixed position within the tag. For example, in common nouns the number marker is always at position 3, definiteness at position 4, case at position 5 and gender at position 6 (counting the characters of the tag and skipping the colons). Example: NC:siuc:--:---- represents, position by position, a noun, common, singular, indefinite, unmarked case, common gender. See the full tagset specification in Jørg Asmussen: Design of The ePOS Tagger, Technical Report, DSL, 2015.

The basic structure of an ePOS tag is:

CLASS:nominal:verbal:additional
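A minimal Python sketch of this structure follows. It assumes the field widths implied by the marker tables further down (4 nominal, 2 verbal and 4 additional positions) and that CLASS consists of the class letter followed by the subclass letter (or "-"), as in NC or D-. The names EposTag and split_tag are hypothetical and not part of any tagger API.

```python
from typing import NamedTuple


class EposTag(NamedTuple):
    """The parts of an ePOS tag (hypothetical helper type)."""
    pos_class: str   # first character, e.g. "N" (noun)
    subclass: str    # second character, e.g. "C" (common); "-" if none
    nominal: str     # 4 markers: number, definiteness, case, gender
    verbal: str      # 2 markers: tense, voice
    additional: str  # 4 markers: degree, person, reflexiveness, possessor


def split_tag(tag: str) -> EposTag:
    """Split a tag such as 'NC:siuc:--:----' into its parts."""
    head, nominal, verbal, additional = tag.split(":")  # ValueError if not 4 parts
    if len(head) != 2 or len(nominal) != 4 or len(verbal) != 2 or len(additional) != 4:
        raise ValueError(f"unexpected ePOS tag layout: {tag!r}")
    return EposTag(head[0], head[1], nominal, verbal, additional)


print(split_tag("NC:siuc:--:----"))
# EposTag(pos_class='N', subclass='C', nominal='siuc', verbal='--', additional='----')
```

With this split, "position 3" in the common-noun example above is simply the first character of the nominal field.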

An example of a tag query in the CQL concordance search box: [tag="NC:sigc:.*"] finds all common nouns that are singular, indefinite, in the genitive case and of common gender, e.g. verdens, finnes. (Note: make sure that you use straight double quotation marks.)
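Queries like this can be assembled from the nominal positions described above. The following Python sketch is hypothetical (nominal_tag_query and matches are not Sketch Engine functions): it fills unspecified positions with the regex wildcard "." and, for an offline check, assumes that the CQL regular expression must match the whole tag value, which is how the example query above is written.

```python
import re

# Order of the nominal positions (number, definiteness, case, gender).
NOMINAL_FIELDS = ("number", "definiteness", "case", "gender")


def nominal_tag_query(pos: str, **markers: str) -> str:
    """Build a CQL query such as [tag="NC:sigc:.*"] (hypothetical helper).

    pos     -- class and subclass, e.g. "NC"
    markers -- nominal markers by field name; omitted fields match any marker
    """
    unknown = set(markers) - set(NOMINAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown nominal fields: {sorted(unknown)}")
    nominal = "".join(markers.get(field, ".") for field in NOMINAL_FIELDS)
    return f'[tag="{pos}:{nominal}:.*"]'  # straight double quotes, as required


def matches(query: str, tag: str) -> bool:
    """Offline check, assuming the regex must match the entire tag."""
    pattern = re.search(r'tag="([^"]*)"', query).group(1)
    return re.fullmatch(pattern, tag) is not None


query = nominal_tag_query("NC", number="s", definiteness="i", case="g", gender="c")
print(query)                                # [tag="NC:sigc:.*"]
print(matches(query, "NC:sigc:--:----"))    # True
print(matches(query, "NC:diuc:--:----"))    # False
print(nominal_tag_query("NC", case="g"))    # [tag="NC:..g.:.*"]
```

Leaving fields out, as in the last call, widens the query to all genitive common nouns regardless of number, definiteness or gender.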

Class and subclass

% marks a position that is filled with a marker from the inflectional part of the tag.

A hyphen (-) means that the position is not defined.

POS                     Subcategory                   Tag example
V  Verb                 I  infinitive                 VI:----:-%:----
                        F  finite                     VF:----:%%:----
                        M  imperative                 VM:----:--:----
                        G  gerund                     VG:%%%%:--:----
                        P  participle                 VP:%%%%:%-:----
                        T  past participle            VT:siu#:%-:----
                        D  adverbial participle       VD:----:%-:----
A  Adjective            C  common                     AC:%%%%:--:%---
                        D  adverbial                  AD:----:--:%---
L  Numeral              C  cardinal                   LC:--%-:--:----
                        O  ordinal                    LO:--%%:--:----
N  Noun                 C  common                     NC:%%%%:--:----
                        P  proper                     NP:%%%%:--:----
P  Pronoun              C  reciprocal                 PC:%-%-:--:----
                        M  demonstrative              PM:%-%%:--:----
                        I  indefinite                 PI:%-%%:--:----
                        O  possessive                 PO:%--%:--:-%%%
                        P  personal                   PP:%-%%:--:-%%-
                        R  relative                   PR:%-%%:--:----
D  Adverb                                             D-:----:--:%---
I  Interjection                                       I-:----:--:----
T  Preposition                                        T-:----:--:----
C  Conjunction          C  coordinating               CC:----:--:----
                        S  subordinating              CS:----:--:----
U  Unique               I  infinitive marker          UI:----:--:----
                        S  som/der                    US:----:--:----
E  Lexical element      W  word formation             EW:----:--:----
M  Inflectional ending  N  attached to a noun         MN:%%%%:--:----
                        V  attached to a verb         MV:----:%%:----
                        A  attached to an adjective   MA:%%%%:--:%---
X  Residual             S  symbol                     XS:----:--:----
                        F  foreign                    XF:----:--:----
                        Y  tagging error              XY:----:--:----
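The templates in the table can be used to sanity-check tags. This Python sketch copies only a few rows of the table into a dictionary; the name fits_template is hypothetical. It requires "-", ":" and the class letters to match the template exactly, and accepts any marker in a "%" slot, so it does not check that the marker is one of the values listed for that position.

```python
# A few class templates copied from the table above (extend as needed).
# '%' = slot for an inflectional marker, '-' = position not defined.
TEMPLATES = {
    "VI": "VI:----:-%:----",
    "VF": "VF:----:%%:----",
    "AC": "AC:%%%%:--:%---",
    "NC": "NC:%%%%:--:----",
    "PP": "PP:%-%%:--:-%%-",
    "T-": "T-:----:--:----",
}


def fits_template(tag: str) -> bool:
    """Check a tag against the template of its class and subclass (hypothetical helper)."""
    template = TEMPLATES.get(tag[:2])
    if template is None or len(tag) != len(template):
        return False
    for expected, actual in zip(template, tag):
        if expected == "%":
            if actual == ":":  # a marker slot must not hold a separator
                return False
        elif actual != expected:  # '-', ':' and class letters must match exactly
            return False
    return True


print(fits_template("NC:siuc:--:----"))  # True
print(fits_template("NC:siuc:sa:----"))  # False: nouns have no verbal markers
```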

Inflectional part of the tag

Nominal markers

Position  Category                  Value                                 Tag
1         Number (NUM)              singular                              s
                                    plural                                p
2         Definiteness (DEF)        indefinite                            i
                                    definite                              d
3         Case (CAS)                unmarked                              u
                                    genitive                              g
                                    fossilized                            f
                                    nominative (personal pronouns only)   n
                                    accusative (identical with unmarked)  u
4         Gender (GEN)              common                                c
                                    neuter                                n
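Because the same letter can mean different things at different positions (n is nominative at position 3 but neuter at position 4), a decoder has to look markers up by position. A minimal Python sketch, with the hypothetical name decode_nominal, that reads the four-character nominal part of a tag:

```python
# Nominal positions in order, each with its marker values from the table above.
NOMINAL = (
    ("number", {"s": "singular", "p": "plural"}),
    ("definiteness", {"i": "indefinite", "d": "definite"}),
    ("case", {"u": "unmarked", "g": "genitive", "f": "fossilized", "n": "nominative"}),
    ("gender", {"c": "common", "n": "neuter"}),
)


def decode_nominal(part: str) -> dict[str, str]:
    """Decode a nominal part such as 'siuc'; '-' positions are skipped."""
    if len(part) != 4:
        raise ValueError(f"nominal part must have 4 positions: {part!r}")
    result = {}
    for marker, (category, values) in zip(part, NOMINAL):
        if marker == "-":
            continue  # category not defined for this word class
        result[category] = values[marker]  # KeyError for an unknown marker
    return result


print(decode_nominal("siuc"))
# {'number': 'singular', 'definiteness': 'indefinite', 'case': 'unmarked', 'gender': 'common'}
```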

Verbal markers

Position  Category                  Value                                 Tag
1         Tense (TMP)               present                               s
                                    past                                  t
2         Voice (VOC)               active                                a
                                    passive                               p
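The verbal part has only two positions, and some verb subclasses leave one of them undefined (the infinitive template VI:----:-%:---- marks voice but not tense). A small Python sketch, with the hypothetical name decode_verbal, that reports only the defined categories:

```python
TENSE = {"s": "present", "t": "past"}
VOICE = {"a": "active", "p": "passive"}


def decode_verbal(part: str) -> dict[str, str]:
    """Decode a 2-character verbal part such as 'sp'; '-' means not defined."""
    if len(part) != 2:
        raise ValueError(f"verbal part must have 2 positions: {part!r}")
    tense, voice = part
    result = {}
    if tense != "-":
        result["tense"] = TENSE[tense]  # KeyError for an unknown marker
    if voice != "-":
        result["voice"] = VOICE[voice]
    return result


print(decode_verbal("sp"))  # {'tense': 'present', 'voice': 'passive'}
print(decode_verbal("-a"))  # {'voice': 'active'}
```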

Additional markers

Position  Category                  Value                                 Tag
1         Degree (DEG)              positive                              p
          (adjectives and some      comparative                           c
          adverbs)                  superlative                           s
                                    absolute superlative                  a
2         Person (PER)              first                                 1
          (personal and possessive  second                                2
          pronouns)                 third                                 3
3         Reflexiveness (RFL)       yes                                   y
          (personal and possessive  no                                    n
          pronouns)
4         Possessor (POS)           singular                              s
          (possessive pronouns)     plural                                p
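The additional part works the same way but is read from the last field of the full tag. This Python sketch (decode_additional is a hypothetical name) takes a complete tag, so an adverb tag such as D-:----:--:c--- yields only a degree, while a tag whose additional part follows the possessive template -%%% yields person, reflexiveness and possessor.

```python
# Additional positions in order, each with its marker values from the table above.
ADDITIONAL = (
    ("degree", {"p": "positive", "c": "comparative", "s": "superlative",
                "a": "absolute superlative"}),
    ("person", {"1": "first", "2": "second", "3": "third"}),
    ("reflexive", {"y": "yes", "n": "no"}),
    ("possessor", {"s": "singular", "p": "plural"}),
)


def decode_additional(tag: str) -> dict[str, str]:
    """Decode the fourth (additional) part of a full ePOS tag."""
    parts = tag.split(":")
    if len(parts) != 4 or len(parts[3]) != 4:
        raise ValueError(f"unexpected ePOS tag layout: {tag!r}")
    return {
        category: values[marker]  # KeyError for an unknown marker
        for marker, (category, values) in zip(parts[3], ADDITIONAL)
        if marker != "-"  # '-' = not defined for this class
    }


print(decode_additional("D-:----:--:c---"))  # {'degree': 'comparative'}
```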

Source: Jørg Asmussen: Design of The ePOS Tagger, Technical Report, DSL, 2015.

Danish text corpora in Sketch Engine

Sketch Engine offers dozens of Danish language corpora.
