A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
This Czech part-of-speech tagset is used in Czech corpora and in Slovak corpora annotated by the Majka morphological analyzer.
An example of a tag query in the CQL concordance search box: [tag="k1.*nP.*"] finds all plural nouns, e.g. lidé, roky (note: make sure you use straight double quotation marks).
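Queries copied from a word processor or a web page often contain typographic quotes, including the Czech „…“ pair. The following minimal Python sketch converts them to straight quotes before the query is pasted into the search box. The function name `normalize_cql_quotes` is hypothetical and is not part of any Sketch Engine tool.

```python
# Hypothetical helper: map typographic double quotes to the straight
# double quote (") that CQL expects around attribute values.
TYPOGRAPHIC_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark, used to open quotes in Czech
})

def normalize_cql_quotes(query: str) -> str:
    """Return the query with curly double quotes replaced by straight ones."""
    return query.translate(TYPOGRAPHIC_QUOTES)

print(normalize_cql_quotes("[tag=\u201ek1.*nP.*\u201c]"))  # [tag="k1.*nP.*"]
```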
The whole tag is composed of attribute-value pairs. The attribute is represented by a single lowercase letter (n for number) and its value by a single capital letter or digit (P for plural). Each tag starts with the two characters representing the part of speech, e.g. k1 means noun, k2 means adjective, etc. This Czech PoS tagset is called “attributive” because the tag consists of attribute-value pairs, e.g. gF means gender (g) feminine (F). The order of attributes is canonical, as follows:
kegncpamdxytzw~
This means that gender (g) precedes number (n), which precedes case (c), etc. For example, the query "k2.*gFnSc7.*" finds all feminine singular adjectives in the instrumental case. The form "k2.*nSgFc7.*" is incorrect because it puts number before gender.
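Both rules above can be checked mechanically: a tag is a sequence of two-character attribute-value pairs, and the attributes follow the canonical order. Below is a minimal Python sketch of such a check. The function names and the sample tag `k2eAgFnSc7d1` are illustrative assumptions built from the tables on this page, not output copied from Majka.

```python
# Canonical attribute order as listed in this tagset description.
CANONICAL_ORDER = "kegncpamdxytzw~"

def parse_tag(tag: str) -> list[tuple[str, str]]:
    """Split an attributive tag into (attribute, value) pairs, two characters each."""
    if len(tag) % 2:
        raise ValueError(f"tag has odd length: {tag!r}")
    return [(tag[i], tag[i + 1]) for i in range(0, len(tag), 2)]

def is_canonical(tag: str) -> bool:
    """True if every attribute occurs at most once and in canonical order.

    Raises ValueError for attributes missing from CANONICAL_ORDER
    (e.g. h or r, which appear only in the "Further tag features" table).
    """
    ranks = []
    for attr, _ in parse_tag(tag):
        if attr not in CANONICAL_ORDER:
            raise ValueError(f"attribute {attr!r} is not in the canonical order")
        ranks.append(CANONICAL_ORDER.index(attr))
    # Strictly increasing ranks: right order and no repeated attribute.
    return all(a < b for a, b in zip(ranks, ranks[1:]))

print(parse_tag("k2eAgFnSc7d1"))
# [('k', '2'), ('e', 'A'), ('g', 'F'), ('n', 'S'), ('c', '7'), ('d', '1')]
print(is_canonical("k2eAgFnSc7d1"))  # True
print(is_canonical("k2eAnSgFc7d1"))  # False: number before gender
```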
See the complete POS tagset summary in PDF (some parts of it are outdated).
Czech part-of-speech tagset overview
This part-of-speech tagset is also used for Slovak corpora processed by the Majka morphological analyzer (tagger).
Common attributes
| Part of speech (k) | |
| k1 | noun |
| k2 | adjective |
| k3 | pronoun |
| k4 | numeral |
| k5 | verb |
| k6 | adverb |
| k7 | preposition |
| k8 | conjunction |
| k9 | particle |
| k0 | interjection |
| kA | abbreviation |
| kI | punctuation |
Example to find all verbs: [tag="k5.*"]
Negation (adjectives, verbs, adverbs)
| Negation (e) | |
| eA | Affirmation |
| eN | Negation |
Example to find all negated feminine verb forms: [tag="k5eNgF.*"]
Gender (nouns, adjectives, pronouns, numerals)
| Gender (g) | Description | Example |
| gM | Animate masculine | |
| gI | Inanimate masculine | |
| gN | Neuter | |
| gF | Feminine | |
| gR | Family (surname)* | Havlovi |
Example to find all neuter nouns: [tag="k1gN.*"]; to find all masculine nouns (animate or inanimate): [tag="k1g(M|I).*"]
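Every example ends in a trailing `.*`, which suggests that the regular expression must match the whole tag and not just a part of it. Under that assumption, the matching can be emulated in Python with `re.fullmatch`. The sample tags below are hypothetical, assembled from the tables on this page rather than taken from a corpus, and Python's regex dialect is only assumed to agree with CQL for simple patterns like these.

```python
import re

# Hypothetical tags built from the tables on this page (not corpus output).
SAMPLE_TAGS = ["k1gMnPc1", "k1gInSc4", "k1gNnSc1", "k1gFnPc7", "k2eAgMnSc1d1"]

def cql_tag_matches(pattern: str, tag: str) -> bool:
    """Emulate [tag="pattern"]: the regular expression must cover the whole tag."""
    return re.fullmatch(pattern, tag) is not None

def filter_tags(pattern: str, tags: list[str]) -> list[str]:
    """Return the tags that the pattern matches in full."""
    return [t for t in tags if cql_tag_matches(pattern, t)]

print(filter_tags("k1gN.*", SAMPLE_TAGS))      # ['k1gNnSc1']
print(filter_tags("k1g(M|I).*", SAMPLE_TAGS))  # ['k1gMnPc1', 'k1gInSc4']
print(filter_tags("k1gN", SAMPLE_TAGS))        # [] because there is no trailing .*
```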
Person (pronouns, verbs)
| Person (p) | |
| p1 | First |
| p2 | Second |
| p3 | Third |
Example to find all third-person pronouns: [tag="k3p3.*"]
Number (nouns, adjectives, pronouns, numerals)
| Number (n) | |
| nS | Singular |
| nP | Plural |
Example to find all plural numerals: [tag="k4.*nP.*"]
Case (nouns, adjectives, pronouns, numerals, prepositions)
| Case (c) | |
| c1 | nominative |
| c2 | genitive |
| c3 | dative |
| c4 | accusative |
| c5 | vocative |
| c6 | locative |
| c7 | instrumental |
Example to find all plural adjectives in the instrumental case: [tag="k2.*nPc7.*"]
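A query that combines several attributes, like the one above, has to list them in canonical order. The following minimal Python sketch shows a hypothetical helper `build_tag_query` that assembles such a query from attribute-value constraints. It assumes the canonical order given at the top of this page, and it escapes values so that punctuation values such as `x.` match literally. Alternations like `g(M|I)` still have to be written by hand.

```python
import re

# Canonical attribute order as listed on this page.
CANONICAL_ORDER = "kegncpamdxytzw~"

def build_tag_query(**constraints: str) -> str:
    """Build a CQL query such as [tag="k2.*nP.*c7.*"] from attribute=value pairs.

    Pairs are emitted in canonical order with ".*" between them, so any
    attributes that are not constrained may occur in the gaps.
    """
    unknown = set(constraints) - set(CANONICAL_ORDER)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    ordered = sorted(constraints.items(), key=lambda kv: CANONICAL_ORDER.index(kv[0]))
    regex = ".*".join(attr + re.escape(value) for attr, value in ordered) + ".*"
    # The part of speech (k) always opens a tag; without it, allow any prefix.
    if "k" not in constraints:
        regex = ".*" + regex
    return f'[tag="{regex}"]'

print(build_tag_query(k="2", n="P", c="7"))  # [tag="k2.*nP.*c7.*"]
print(build_tag_query(k="1", n="P"))         # [tag="k1.*nP.*"]
```

Because the helper puts `.*` between every pair, it produces `k2.*nP.*c7.*` rather than `k2.*nPc7.*`. Both find the same tags, since no attribute lies between number and case in the canonical order.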
Degree (adjectives, adverbs)
| Degree (d) | |
| d1 | Positive |
| d2 | Comparative |
| d3 | Superlative |
Example to find all comparative adjectives: [tag="k2.*d2.*"]
Stylistic flag (nouns, adjectives, pronouns, numerals, verbs, adverbs, prepositions, conjunctions, particles)
| Stylistic flag (w) | |
| wA | Archaism |
| wB | Poeticism |
| wC | Only in corpora |
| wE | Expressive |
| wH | Conversational |
| wK | Bookish |
| wO | Regional |
| wR | Rare |
| wZ | Obsolete |
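A tag can be read back into words by looking up each pair in the tables above. The following minimal Python sketch defines a hypothetical `describe_tag` function that covers only the common attributes listed so far, with labels paraphrased from those tables. Attributes specific to a part of speech, such as x, y, a, m and t, are passed through untouched, because the same code means different things for different parts of speech. For example, xC is a cardinal numeral but also a coordinate conjunction.

```python
# Labels paraphrased from the "Common attributes" tables above.
LABELS = {
    "k": {"1": "noun", "2": "adjective", "3": "pronoun", "4": "numeral",
          "5": "verb", "6": "adverb", "7": "preposition", "8": "conjunction",
          "9": "particle", "0": "interjection", "A": "abbreviation",
          "I": "punctuation"},
    "e": {"A": "affirmation", "N": "negation"},
    "g": {"M": "animate masculine", "I": "inanimate masculine", "N": "neuter",
          "F": "feminine", "R": "family (surname)"},
    "p": {"1": "first person", "2": "second person", "3": "third person"},
    "n": {"S": "singular", "P": "plural"},
    "c": {"1": "nominative", "2": "genitive", "3": "dative", "4": "accusative",
          "5": "vocative", "6": "locative", "7": "instrumental"},
    "d": {"1": "positive", "2": "comparative", "3": "superlative"},
    "w": {"A": "archaism", "B": "poeticism", "C": "only in corpora",
          "E": "expressive", "H": "conversational", "K": "bookish",
          "O": "regional", "R": "rare", "Z": "obsolete"},
}

def describe_tag(tag: str) -> list[str]:
    """Translate each attribute-value pair into a readable label.

    Pairs not covered by LABELS are returned unchanged (e.g. "xP").
    """
    labels = []
    for i in range(0, len(tag) - 1, 2):
        attr, value = tag[i], tag[i + 1]
        labels.append(LABELS.get(attr, {}).get(value, attr + value))
    return labels

print(describe_tag("k2eAgFnSc7d1"))
# ['adjective', 'affirmation', 'feminine', 'singular', 'instrumental', 'positive']
```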
noun (k1) subclassification
For example: [tag="k1xP.*"]
| Type (x) | Description | Example |
| xP | special paradigm | půl, čtvrt |
pronoun (k3) subclassification
| Type (x) | |
| xP | personal |
| xO | possessive |
| xD | demonstrative |
| xT | delimitative |
| Type (y) | |
| yF | reflexive |
| yQ | interrogative |
| yR | relative |
| yN | negative |
| yI | indeterminate |
numeral (k4) subclassification
| Type (x) | |
| xC | cardinal |
| xO | ordinal |
| xR | reproductive |
| Type (y) | |
| yN | Negative |
| yI | Indeterminate |
verb (k5) subclassification
| Aspect (a) | |
| aP | Perfective |
| aI | Imperfective |
| Type (m) | |
| mF | infinitive |
| mI | present indicative |
| mR | imperative |
| mA | past participle (active participle) |
| mN | passive participle (n/t-participle) |
| mS | present transgressive |
| mD | past transgressive |
| mB | future indicative |
adverb (k6) subclassification
| Type (x) | |
| xD | Demonstrative |
| xT | Delimitative |
| Type (y) | |
| yQ | Interrogative |
| yR | Relative |
| yN | Negative |
| yI | Indeterminate |
| Type (t) | |
| tS | Status |
| tD | Modal |
| tT | Expresses time |
| tA | Expresses respect |
| tC | Expresses reason |
| tL | Expresses place |
| tM | Expresses manner |
| tQ | Expresses extent |
conjunction (k8) subclassification
| Type (x) | |
| xC | Coordinate |
| xS | Subordinate |
punctuation (kI) subclassification
| punctuation list (x) | |
| x. | .?! |
| x, | ,:; |
| x” | “„“‚ ‘ |
| x( | ({[< |
| x) | )}]> |
| x~ | ~$%^&-_+=|/# etc. |
Further tag features
| Tag | Note |
| rD,rD | INF : ADJ-cí |
| rD,rD | INF : ADJ-ší |
| rD,rD,rD,rD | INF : ADJ-ý : SUBST-í : ADJ-n//-t |
| rD,rD,rD | INF : SUBST-í : ADJ-cí |
| rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-cí : SUBST-í : ADJ-ý : ADJ-n//-t |
| rD,rD,rD | INF : SUBST-í : ADJ-ší |
| rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ší : ADJ-ší : SUBST-í : ADJ-ý : ADJ-n//-t |
| rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-cí |
| rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-cí : SUBST-í : ADJ-ý : ADJ-n//-t |
| rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-ší |
| rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-ší : SUBST-í : ADJ-ý : ADJ-n//-t |
| rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí |
| rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí : ADJ-cí |
| rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší |
| rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší : ADJ-ší |
| rD,rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí |
| rD,rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší |
| rD,rD,rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší : ADJ-ší |
| rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí |
| rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší |
| _,hF | SUBST : FEMPOSS |
| _,hM | SUBST : MASKPOSS |
| _,_,hM,hF,_,hR | M : F : M possessive : F possessive : family : R possessive |
| wZ | Obsolete |
| wB | Poeticism |
| tQ | Expresses extent |
| tA | Expresses respect |
| tL | Expresses place |
| tT | Expresses time |
| tC | Expresses reason |
| tM | Expresses manner |
| tD | Modal adverb |
| tS | Status adverb |
| wR | Rare |
| hT | Represents thing |
| hP | Represents person |
| xC | Cardinal numeral |
| xO | Ordinal numeral |
| xR | Reproductive numeral |
| yQ | Interrogative |
| yR | Relative |
| xD | Demonstrative |
| yN | Negative |
| xT | Delimitative |
| yI | Indeterminate |
| xP | Personal pronoun |
| yF | Reflexive pronoun |
| xO | Possessive pronoun |
| xC | Coordinate conjunction |
| xS | Subordinate conjunction |
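| c1 | Preposition governing the first case (nominative) |
| c2 | Preposition governing the second case (genitive) |
| c3 | Preposition governing the third case (dative) |
| c4 | Preposition governing the fourth case (accusative) |
| c6 | Preposition governing the sixth case (locative) |
| c7 | Preposition governing the seventh case (instrumental) |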
| aP | Perfective |
| aI | Imperfective |
| aB | Biaspectual |
| wH | Conversational |
| wN | Dialectal |
Reference
JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Pavel ŠMERK. Czech Morphological Tagset Revisited. In: Horák, Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing 2011. Brno: Tribun EU, 2011, pp. 29-42. ISBN 978-80-263-0077-9.




