A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

This Czech part-of-speech tagset is used in Czech corpora annotated by the Majka morphological analyzer.

An example of a tag query in the CQL concordance search box: [tag="k1.*nP.*"] finds all plural nouns, e.g. lidé, roky. (Note: make sure that you use straight double quotation marks.)

Each tag is composed of attribute-value pairs: the attribute is represented by a single lowercase letter (e.g. n for number) and its value by a single capital letter or digit (e.g. P for plural). Each tag starts with the two characters representing the part of speech, e.g. k1 means noun, k2 means adjective, etc. This Czech PoS tagset is called “attributive” because each tag consists of attribute-value pairs, e.g. gF means gender (g) feminine (F). The canonical order of the attributes is as follows:

kegncpamdxytzw~

This means that gender (g) precedes number (n), which precedes case (c), etc. For example, the tag "k2.*gFnSc7.*" searches for all feminine singular adjectives in the instrumental case (the incorrect form would be "k2.*nSgFc7.*", where number comes before gender).
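The pair structure and the canonical order can be checked mechanically. The following is a minimal sketch, not part of Majka or Sketch Engine: the function names `parse_tag` and `is_canonical` are hypothetical, the sample tags are written by hand for illustration, and the sketch assumes every value is a single capital letter or digit (so the punctuation values and the `~` position are not handled).

```python
import re

# Canonical attribute order as given above; "~" is left out of this sketch.
CANONICAL_ORDER = "kegncpamdxytzw"

def parse_tag(tag):
    """Split a tag such as 'k2eAgFnSc7d1' into (attribute, value) pairs."""
    pairs = re.findall(r"([a-z])([A-Z0-9])", tag)
    # Reject tags with leftover characters that do not form a pair.
    if "".join(a + v for a, v in pairs) != tag:
        raise ValueError(f"not a sequence of attribute-value pairs: {tag!r}")
    return pairs

def is_canonical(tag):
    """Return True if the attributes appear in strictly canonical order.

    Raises ValueError for attributes missing from CANONICAL_ORDER.
    """
    positions = [CANONICAL_ORDER.index(a) for a, _ in parse_tag(tag)]
    return all(a < b for a, b in zip(positions, positions[1:]))

# Hand-written illustrative tags (assumptions, not corpus output).
print(is_canonical("k2eAgFnSc7d1"))  # gender before number
print(is_canonical("k2eAnSgFc7d1"))  # number before gender
```

A check like this can be useful before running a query, because a pattern that lists the attributes in the wrong order simply returns no hits.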

See the whole POS tagset summary in pdf.

Czech text corpora

Sketch Engine offers dozens of Czech corpora.

Czech part-of-speech tagset overview

Common attributes

Part of speech (k)
k1 noun
k2 adjective
k3 pronoun
k4 numeral
k5 verb
k6 adverb
k7 preposition
k8 conjunction
k9 particle
k0 interjection
kA abbreviation
kI punctuation

Example to find all verbs: [tag="k5.*"]
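The trailing `.*` matters because the pattern is compared with the whole tag rather than with a part of it. The sketch below imitates this in Python with `re.fullmatch`; that whole-value matching is how CQL behaves is an assumption of this example, the helper `matches` is hypothetical, and the tags are hand-written illustrations rather than corpus output.

```python
import re

# Illustrative tags, written by hand for this example (not corpus output).
tags = ["k5eAnSp3aImI", "k5eNgFnSaPmA", "k1gFnSc1"]

def matches(pattern, tag):
    # Assumption: the CQL regular expression has to match the whole tag.
    return re.fullmatch(pattern, tag) is not None

# With ".*" the pattern covers the rest of the tag after "k5".
verbs = [t for t in tags if matches(r"k5.*", t)]

# Without ".*" only a tag that is exactly "k5" would match.
bare = [t for t in tags if matches(r"k5", t)]

print(verbs)
print(bare)
```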

Negation (adjectives, verbs, adverbs)

Negation (e)
eA Affirmation
eN Negation

Example to find all feminine verbs in negative forms: [tag="k5eN.*gF.*"]

Gender (nouns, adjectives, pronouns, numerals)

Gender (g) Example
gM Animate masculine
gI Inanimate masculine
gN Neuter
gF Feminine
gR Family (surname)* Havlovi

Example to find all neuter nouns: [tag="k1gN.*"] or all masculine nouns [tag="k1g(M|I).*"]

Person (pronouns, verbs)

Person (p)
p1 First
p2 Second
p3 Third

Example to find all third-person pronouns: [tag="k3.*p3.*"]

Number (nouns, adjectives, pronouns, numerals)

Number (n)
nS Singular
nP Plural

Example to find all plural numerals: [tag="k4.*nP.*"]

Case (nouns, adjectives, pronouns, numerals, prepositions)

Case (c)
c1–7 First–Seventh

Example to find all instrumental adjectives in plural: [tag="k2.*nPc7.*"]
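Queries like the ones above can also be assembled programmatically, so that the attributes always end up in canonical order. This is a minimal sketch under stated assumptions: `build_query` is a hypothetical helper, not a Sketch Engine function, and it places `.*` after every attribute, which is slightly more permissive than the hand-written form `k2.*nPc7.*`.

```python
# Canonical attribute order from the tagset description ("~" is omitted).
CANONICAL_ORDER = "kegncpamdxytzw"

def build_query(**attributes):
    """Build a CQL tag query from attribute=value keyword arguments."""
    if "k" not in attributes:
        raise ValueError("the part of speech (k) is required")
    unknown = set(attributes) - set(CANONICAL_ORDER)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Sort by canonical position, then allow other attributes in between.
    ordered = sorted(attributes, key=CANONICAL_ORDER.index)
    pattern = ".*".join(f"{a}{attributes[a]}" for a in ordered) + ".*"
    return f'[tag="{pattern}"]'

# Instrumental plural adjectives; the argument order does not matter.
print(build_query(c=7, n="P", k=2))
```

A value may itself be a regular expression, e.g. `g="(M|I)"` for either masculine gender.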

Degree (adjectives, adverbs)

Degree (d)
d1 Positive
d2 Comparative
d3 Superlative

Example to find all comparative adjectives: [tag="k2.*d2.*"]

Stylistic flag (nouns, adjectives, pronouns, numerals, verbs, adverbs, prepositions, conjunctions, particles)

Stylistic flag (w)
wA Archaism
wB Poeticism
wC Only in corpora
wE Expressive
wH Conversational
wK Bookish
wO Regional
wR Rare
wZ Obsolete

noun (k1) subclassification

Example to find nouns with the special paradigm: [tag="k1.*xP.*"]

Description Example
x special paradigm
P půl, čtvrt

pronoun (k3) subclassification

Type (x)
xP personal
xO possessive
xD demonstrative
xT delimitative
Type (y)
yF reflexive
yQ interrogative
yR relative
yN negative
yI indeterminate

numeral (k4) subclassification

Type (x)
xC cardinal
xO ordinal
xR reproductive
Type (y)
yN Negative
yI Indeterminate

verb (k5) subclassification

Aspect (a)
aP Perfective
aI Imperfective
Type (m)
mF Infinitive
mI Present Indicative
mR Imperative
mA Active part. (past)
mN Passive part.
mS Adv. part. (present)
mD Adv. part. (past)
mB Future indicative

adverb (k6) subclassification

Type (x)
xD Demonstrative
xT Delimitative
Type (y)
yQ Interrogative
yR Relative
yN Negation
yI Indeterminate
Type (t)
tS Status
tD Modal
tT Expresses time
tA Expresses respect
tC Expresses reason
tL Expresses place
tM Expresses manner
tQ Expresses extent

conjunction (k8) subclassification

Type (x)
xC Coordinate
xS Subordinate

punctuation (kI) subclassification

punctuation list (x)
x. .?!
x, ,:;
x” “„“‚ ‘
x( ({[<
x) )}]>
x~ ~$%^&-_+=|/# etc.

Further tag features

Tag Note
wH 795
rD,rD INF : ADJ-cí
rD,rD INF : ADJ-ší
rD,rD,rD,rD INF : ADJ-ý : SUBST-í : ADJ-n//-t
rD,rD,rD INF : SUBST-í : ADJ-cí
rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-cí : SUBST-í : ADJ-ý : ADJ-n//-t
rD,rD,rD INF : SUBST-í : ADJ-ší
rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ší : ADJ-ší : SUBST-í : ADJ-ý : ADJ-n//-t
rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-cí
rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-cí : SUBST-í : ADJ-ý : ADJ-n//-t
rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-ší
rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-ší : SUBST-í : ADJ-ý : ADJ-n//-t
rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí
rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí : ADJ-cí
rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší
rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší : ADJ-ší
rD,rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí
rD,rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší
rD,rD,rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší : ADJ-ší
rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí
rD,rD,rD,rD,rD,rD,rD INF : SUBST-í : ADJ-ý : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší
_,hF SUBST : FEMPOSS
_,hM SUBST : MASKPOSS
_,_,hM,hF,_,hR M : F : Mpřivl : Fpřivl : rodina : Rpřivl
wZ Obsolete
wB Poeticism
tQ Expresses extent
tA Expresses respect
tL Expresses place
tT Expresses time
tC Expresses reason
tM Expresses manner
tD Modal adverb
tS Status adverb
wR Rare
hT Represents thing
hP Represents person
xC Cardinal numeral
xO Ordinal numeral
xR Reproductive numeral
yQ Interrogative
yR Relative
xD Demonstrative
yN Negative
xT Delimitative
yI Indeterminate
xP Personal pronoun
yF Reflexive pronoun
xO Possessive pronoun
xC Coordinate conjunction
xS Subordinate conjunction
c1 Preposition with first case
c2 Preposition with second case
c3 Preposition with third case
c4 Preposition with fourth case
c6 Preposition with sixth case
c7 Preposition with seventh case
aP Perfective
aI Imperfective
aB Biaspectual
wH Conversational
wN Dialectal


Reference

JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Pavel ŠMERK. Czech Morphological Tagset Revisited. In Horák, Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing 2011. Brno: Tribun EU, 2011, pp. 29-42 (14 pages). ISBN 978-80-263-0077-9.