A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

Russian corpora are tagged according to the Russian part of the multilingual MULTEXT-East specifications, Version 4.

These specifications follow the (draft) Version 4 of the multilingual MULTEXT-East specifications, which can be found at http://nl.ijs.si/ME.

The basic idea is that for each major category (Noun, Verb, Adjective, etc.) the specifications define a fixed set of attributes (Case, Number, Gender, Animacy, etc.), each with its own set of values (e.g. masculine, feminine, neuter). Each category-dependent attribute is assigned a position, and each of its values a one-letter code, so a complete morphosyntactic description of a word can be encoded as a single string, a MorphoSyntactic Description (MSD). For instance, the attribute-value specification Category = Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no corresponds to the MSD Ncmsan. If an attribute is not appropriate for a given combination of features or for a particular lexical item, its position is filled with a hyphen, e.g. Afpns-s (Adjective, qualificative, positive, neuter, singular, short form), where Case is undefined because the adjective is in the short form.
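The positional encoding described above can be sketched in a few lines of Python. This is only an illustration, not part of the MULTEXT-East tooling: the names `NOUN_ATTRIBUTES` and `encode_noun_msd` are hypothetical, the value codes are copied from the Noun table on this page, and dropping trailing hyphens is an assumption chosen to reproduce the Ncmsan example.

```python
# Positions 1..6 of the Russian Noun MSD, in order, with value -> code maps
# copied from the Noun table on this page.
NOUN_ATTRIBUTES = [
    ("Type", {"common": "c", "proper": "p"}),
    ("Gender", {"masculine": "m", "feminine": "f", "neuter": "n", "common": "c"}),
    ("Number", {"singular": "s", "plural": "p"}),
    ("Case", {"nominative": "n", "genitive": "g", "dative": "d",
              "accusative": "a", "vocative": "v", "locative": "l",
              "instrumental": "i"}),
    ("Animate", {"no": "n", "yes": "y"}),
    ("Case2", {"partitive": "p", "locative": "l"}),
]


def encode_noun_msd(features):
    """Build a noun MSD from an attribute -> value dict (hypothetical helper)."""
    codes = ["N"]  # position 0 is always the category code
    for attribute, value_codes in NOUN_ATTRIBUTES:
        value = features.get(attribute)
        # An attribute that is not specified is encoded as a hyphen.
        codes.append(value_codes[value] if value is not None else "-")
    # Assumption: trailing hyphens are dropped, as in the Ncmsan example.
    return "".join(codes).rstrip("-")


print(encode_noun_msd({
    "Type": "common",
    "Gender": "masculine",
    "Number": "singular",
    "Case": "accusative",
    "Animate": "no",
}))  # Ncmsan
```

The same approach would work for the other categories by swapping in the attribute list of the relevant table.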

For example, the tag Vmip3p-m-e- is interpreted, character by character, as follows (a hyphen marks an attribute that does not apply):

V   Category      Verb
m   Type          main
i   VForm         indicative
p   Tense         present
3   Person        third
p   Number        plural
-   Gender        (not applicable)
m   Voice         medial
-   Definiteness  (not applicable)
e   Aspect        perfective
-   Case          (not applicable)
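The character-by-character reading above can also be done programmatically. The following is a minimal sketch, assuming the position order and codes of the Verb table on this page; `VERB_ATTRIBUTES` and `decode_verb_msd` are hypothetical names introduced for illustration only.

```python
# Positions 1..10 of the Russian Verb MSD with code -> value maps,
# following the Verb table on this page.
VERB_ATTRIBUTES = [
    ("Type", {"m": "main", "a": "auxiliary"}),
    ("VForm", {"i": "indicative", "m": "imperative", "c": "conditional",
               "n": "infinitive", "p": "participle", "g": "gerund"}),
    ("Tense", {"p": "present", "f": "future", "s": "past"}),
    ("Person", {"1": "first", "2": "second", "3": "third"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Voice", {"a": "active", "p": "passive", "m": "medial"}),
    ("Definiteness", {"s": "short-art", "f": "full-art"}),
    ("Aspect", {"p": "progressive", "e": "perfective", "b": "biaspectual"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_verb_msd(msd):
    """Return the attribute -> value dict encoded by a verb MSD."""
    if not msd.startswith("V"):
        raise ValueError("not a verb MSD: " + msd)
    decoded = {"CATEGORY": "Verb"}
    # zip() stops at the shorter input, so MSDs without trailing
    # hyphens are handled as well.
    for (attribute, values), code in zip(VERB_ATTRIBUTES, msd[1:]):
        if code != "-":  # a hyphen means the attribute does not apply
            decoded[attribute] = values[code]
    return decoded


for attribute, value in decode_verb_msd("Vmip3p-m-e-").items():
    print(attribute, value)
```

Attributes encoded as hyphens (here Gender, Definiteness and Case) are simply left out of the result.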

An example of a tag in the CQL concordance search box: [tag="Vmip3p-m-"] finds examples like (сигареты) курятся, (книги) читаются or (фильмы) смотрятся. (Note: please make sure that you use straight double quotation marks.)
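Because every attribute has a fixed position, a tag query that constrains only some attributes can be assembled mechanically. The sketch below assumes that the value of `tag` in CQL is treated as a regular expression matched against the whole tag, so unconstrained positions are written as `.` and the open end as `.*`. The helpers `msd_pattern` and `cql_tag_query` are hypothetical and not part of Sketch Engine.

```python
def msd_pattern(category, constraints):
    """Build a regular expression over MSD tags.

    category    -- one-letter category code, e.g. "N"
    constraints -- dict mapping a position (the P column of the tables)
                   to the required one-letter code
    """
    if not constraints:
        return category + ".*"
    last = max(constraints)
    # Positions without a constraint match any single character.
    body = "".join(constraints.get(p, ".") for p in range(1, last + 1))
    return category + body + ".*"


def cql_tag_query(pattern):
    """Wrap a tag pattern in a CQL token query with straight quotes."""
    return '[tag="{}"]'.format(pattern)


# Nouns in the genitive plural: Number is position 3, Case is position 4.
print(cql_tag_query(msd_pattern("N", {3: "p", 4: "g"})))  # [tag="N..pg.*"]
```

The resulting string can be pasted into the CQL search box; the position numbers come from the P column of the table for the chosen category.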

Basic overview of Russian tagset

noun N.*
verb V.*
adjective A.*
pronoun P.*
adverb R.*
adposition S.*
conjunction C.*
numeral M.*
particle Q.*
interjection I.*
abbreviation Y.*
residual X.*
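Since the first character of every tag identifies the category, tags can be grouped by part of speech without decoding the remaining positions. A small sketch, assuming tags are available as plain strings; `CATEGORY_BY_CODE` and `category_of` are hypothetical names and the sample tags are only illustrative.

```python
from collections import Counter

# First character of the tag -> category, as in the overview above.
CATEGORY_BY_CODE = {
    "N": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "R": "adverb", "S": "adposition", "C": "conjunction", "M": "numeral",
    "Q": "particle", "I": "interjection", "Y": "abbreviation", "X": "residual",
}


def category_of(tag):
    """Return the category name for a tag, or "unknown" if unrecognised."""
    return CATEGORY_BY_CODE.get(tag[:1], "unknown")


tags = ["Ncmsan", "Vmip3p-m-e", "Afpns-s", "Ncmsan"]
counts = Counter(category_of(tag) for tag in tags)
print(counts["noun"], counts["verb"], counts["adjective"])  # 2 1 1
```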

Content

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Noun            N
1   Type            common          c
                    proper          p
2   Gender          masculine       m
                    feminine        f
                    neuter          n
                    common          c
3   Number          singular        s
                    plural          p
4   Case            nominative      n
                    genitive        g
                    dative          d
                    accusative      a
                    vocative        v
                    locative        l
                    instrumental    i
5   Animate         no              n
                    yes             y
6   Case2           partitive       p
                    locative        l

source: Russian Noun

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Verb            V
1   Type            main            m
                    auxiliary       a
2   VForm           indicative      i
                    imperative      m
                    conditional     c
                    infinitive      n
                    participle      p
                    gerund          g
3   Tense           present         p
                    future          f
                    past            s
4   Person          first           1
                    second          2
                    third           3
5   Number          singular        s
                    plural          p
6   Gender          masculine       m
                    feminine        f
                    neuter          n
7   Voice           active          a
                    passive         p
                    medial          m
8   Definiteness    short-art       s
                    full-art        f
9   Aspect          progressive     p
                    perfective      e
                    biaspectual     b
10  Case            nominative      n
                    genitive        g
                    dative          d
                    accusative      a
                    locative        l
                    instrumental    i

source: Russian Verb

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Adjective       A
1   Type            qualificative   f
                    possessive      s
2   Degree          positive        p
                    comparative     c
                    superlative     s
3   Gender          masculine       m
                    feminine        f
                    neuter          n
4   Number          singular        s
                    plural          p
5   Case            nominative      n
                    genitive        g
                    dative          d
                    accusative      a
                    locative        l
                    instrumental    i
6   Definiteness    short-art       s
                    full-art        f

source: Russian Adjective

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Pronoun         P
1   Type            personal        p
                    demonstrative   d
                    indefinite      i
                    possessive      s
                    interrogative   q
                    relative        r
                    reflexive       x
                    negative        z
                    nonspecific     n
2   Person          first           1
                    second          2
                    third           3
3   Gender          masculine       m
                    feminine        f
                    neuter          n
4   Number          singular        s
                    plural          p
5   Case            nominative      n
                    genitive        g
                    dative          d
                    accusative      a
                    vocative        v
                    locative        l
                    instrumental    i
6   Syntactic_Type  nominal         n
                    adjectival      a
                    adverbial       r
7   Animate         no              n
                    yes             y

source: Russian Pronoun

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Adverb          R
1   Degree          positive        p
                    comparative     c
                    superlative     s

source: Russian Adverb

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Adposition      S
1   Type            preposition     p
2   Formation       simple          s
                    compound        c
3   Case            genitive        g
                    dative          d
                    accusative      a
                    locative        l
                    instrumental    i

source: Russian Adposition

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Conjunction     C
1   Type            coordinating    c
                    subordinating   s
2   Formation       simple          s
                    compound        c
3   Coord_Type      sentence        p
                    words           w
4   Sub_Type        negative        z
                    positive        p

source: Russian Conjunction

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Numeral         M
1   Type            cardinal        c
                    ordinal         o
                    multiple        m
                    collect         l
2   Gender          masculine       m
                    feminine        f
                    neuter          n
3   Number          singular        s
                    plural          p
4   Case            nominative      n
                    genitive        g
                    dative          d
                    accusative      a
                    locative        l
                    instrumental    i
5   Form            digit           d
                    roman           r
                    letter          l
6   Animate         no              n
                    yes             y

source: Russian Numeral

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Particle        Q
1   Formation       simple          s
                    compound        c

source: Russian Particle

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Interjection    I
1   Formation       simple          s
                    compound        c

source: Russian Interjection

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Abbreviation    Y
1   Syntactic_Type  nominal         n
                    adverbial       r
2   Gender          masculine       m
                    feminine        f
                    neuter          n
3   Number          singular        s
                    plural          p
                    paucal          c
4   Case            nominative      n
                    genitive        g
                    dative          d
                    accusative      a
                    locative        l
                    instrumental    i

source: Russian Abbreviation

P   Attribute (en)  Value (en)      Code (en)
0   CATEGORY        Residual        X

source: Russian Residual


(This page was taken from the MULTEXT-East home page.)

Russian text corpora in Sketch Engine

Sketch Engine offers dozens of Russian language corpora.
