Russian part-of-speech tagset

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

This page describes the Russian tagset from version 4 of the multilingual MULTEXT-East specifications, which is used in Russian corpora tagged by RFTagger.

These specifications follow the (draft) Version 4 of the multilingual MULTEXT-East specifications, which can be found at http://nl.ijs.si/ME.

The basic idea is that for each major category (Noun, Verb, Adjective, etc.) the specifications define a fixed set of attributes (Case, Number, Gender, Animacy, etc.), each with its own set of values (e.g. masculine, feminine, neuter for Gender). Each attribute is assigned a category-dependent position, and each of its values a one-letter code, so that a complete morphosyntactic description of a word can be encoded as a single string, a MorphoSyntactic Description (MSD). For instance, the attribute-value specification Category = Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no corresponds to the MSD Ncmsan. If an attribute is not appropriate for a given combination of features or for a particular lexical item, its position is filled with a hyphen, e.g. Afpns-s (Adjective, qualificative, positive, neuter, singular, short form), where Case is undefined because the adjective is in the short form.
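The positional encoding described above can be illustrated with a short Python sketch. It is only an illustration, not part of the MULTEXT-East tooling: the names NOUN_ATTRIBUTES and decode_noun_msd are hypothetical, and the attribute table is transcribed from the Noun table on this page.

```python
# Positional attribute table for nouns, transcribed from the Noun table on this page.
# Index 0 of this list corresponds to position 1 of the MSD (position 0 is the category).
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter", "c": "common"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative", "a": "accusative",
              "v": "vocative", "l": "locative", "i": "instrumental"}),
    ("Animate", {"n": "no", "y": "yes"}),
    ("Case2", {"p": "partitive", "l": "locative"}),
]


def decode_noun_msd(msd):
    """Turn a noun MSD such as 'Ncmsan' into an attribute -> value dictionary."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun MSD: {msd!r}")
    codes = msd[1:]
    if len(codes) > len(NOUN_ATTRIBUTES):
        raise ValueError(f"too many positions in {msd!r}")
    decoded = {"CATEGORY": "Noun"}
    # zip stops at the shorter sequence, so omitted trailing positions are simply skipped.
    for (attribute, values), code in zip(NOUN_ATTRIBUTES, codes):
        if code == "-":  # a hyphen marks an attribute that does not apply
            continue
        if code not in values:
            raise ValueError(f"unknown code {code!r} for {attribute} in {msd!r}")
        decoded[attribute] = values[code]
    return decoded


print(decode_noun_msd("Ncmsan"))
```

Decoding Ncmsan this way gives back the attribute-value specification quoted in the paragraph above. The same approach should carry over to the other categories by swapping in their tables.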

Therefore, the tag Vmip3p-m-e- is to be interpreted, character by character, as follows:

V	Category	Verb
m	Type	main
i	VForm	indicative
p	Tense	present
3	Person	third
p	Number	plural
-	Gender	not applicable
m	Voice	media
-	Definiteness	not applicable
e	Aspect	perfective
-	Case	not applicable

An example of a tag in the CQL concordance search box: [tag="Vmip3p-m-e-"] finds examples like (сигареты) курятся, (книги) читаются or (фильмы) смотрятся. Note: make sure that you use straight double quotation marks.
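Because tag values in CQL are matched as regular expressions (the overview below uses patterns such as V.*), a query does not have to spell out every position. The following Python sketch builds such a query string from a few chosen verb attributes. The helper verb_tag_query and the table VERB_POSITIONS are hypothetical names introduced for illustration; the positions are transcribed from the Verb table on this page, and the sketch assumes that a dot matches any single code and that a trailing .* is acceptable for the remaining positions.

```python
# Position of each verb attribute within the MSD, transcribed from the Verb table.
VERB_POSITIONS = {
    "Type": 1, "VForm": 2, "Tense": 3, "Person": 4, "Number": 5,
    "Gender": 6, "Voice": 7, "Definiteness": 8, "Aspect": 9, "Case": 10,
}


def verb_tag_query(**constraints):
    """Build a CQL tag query for verbs; unconstrained positions become wildcards."""
    codes = ["."] * len(VERB_POSITIONS)  # "." matches any single code
    for attribute, code in constraints.items():
        if attribute not in VERB_POSITIONS:
            raise ValueError(f"unknown verb attribute: {attribute}")
        codes[VERB_POSITIONS[attribute] - 1] = code
    # Drop trailing wildcards and let ".*" cover whatever follows.
    pattern = "V" + "".join(codes).rstrip(".") + ".*"
    # Straight double quotation marks, as the CQL search box requires.
    return f'[tag="{pattern}"]'


print(verb_tag_query(Person="3", Number="p"))
```

With Person="3" and Number="p" the helper produces a pattern for all third person plural verb forms, regardless of type, tense or voice.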

Basic overview of Russian tagset

noun	N.*
verb	V.*
adjective	A.*
pronoun	P.*
adverb	R.*
adposition	S.*
conjunction	C.*
numeral	M.*
particle	Q.*
interjection	I.*
abbreviation	Y.*
residual	X.*
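
Since the first character of every tag identifies the category, tags exported from a corpus can be grouped by part of speech with a simple lookup. The Python sketch below assumes the tags are available as plain strings; CATEGORY_BY_CODE and category_of are hypothetical names, and the mapping is transcribed from the overview above.

```python
from collections import Counter

# First character of the tag -> category, transcribed from the overview table.
CATEGORY_BY_CODE = {
    "N": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "R": "adverb", "S": "adposition", "C": "conjunction", "M": "numeral",
    "Q": "particle", "I": "interjection", "Y": "abbreviation", "X": "residual",
}


def category_of(tag):
    """Return the category name for a tag, or 'unknown' if the code is not listed."""
    return CATEGORY_BY_CODE.get(tag[:1], "unknown")


# Example tags taken from this page.
tags = ["Ncmsan", "Vmip3p-m-e-", "Afpns-s"]
print(Counter(category_of(tag) for tag in tags))
```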

Content

Noun

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Noun	N
1	Type	common	c
		proper	p
2	Gender	masculine	m
		feminine	f
		neuter	n
		common	c
3	Number	singular	s
		plural	p
4	Case	nominative	n
		genitive	g
		dative	d
		accusative	a
		vocative	v
		locative	l
		instrumental	i
5	Animate	no	n
		yes	y
6	Case2	partitive	p
		locative	l

source: Russian Noun

Verb

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Verb	V
1	Type	main	m
		auxiliary	a
2	VForm	indicative	i
		imperative	m
		conditional	c
		infinitive	n
		participle	p
		gerund	g
3	Tense	present	p
		future	f
		past	s
4	Person	first	1
		second	2
		third	3
5	Number	singular	s
		plural	p
6	Gender	masculine	m
		feminine	f
		neuter	n
7	Voice	active	a
		passive	p
		media	m
8	Definiteness	short-art	s
		full-art	f
9	Aspect	progressive	p
		perfective	e
		biaspectual	b
10	Case	nominative	n
		genitive	g
		dative	d
		accusative	a
		locative	l
		instrumental	i

source: Russian Verb

Adjective

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Adjective	A
1	Type	qualificative	f
		possessive	s
2	Degree	positive	p
		comparative	c
		superlative	s
3	Gender	masculine	m
		feminine	f
		neuter	n
4	Number	singular	s
		plural	p
5	Case	nominative	n
		genitive	g
		dative	d
		accusative	a
		locative	l
		instrumental	i
6	Definiteness	short-art	s
		full-art	f

source: Russian Adjective

Pronoun

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Pronoun	P
1	Type	personal	p
		demonstrative	d
		indefinite	i
		possessive	s
		interrogative	q
		relative	r
		reflexive	x
		negative	z
		nonspecific	n
2	Person	first	1
		second	2
		third	3
3	Gender	masculine	m
		feminine	f
		neuter	n
4	Number	singular	s
		plural	p
5	Case	nominative	n
		genitive	g
		dative	d
		accusative	a
		vocative	v
		locative	l
		instrumental	i
6	Syntactic_Type	nominal	n
		adjectival	a
		adverbial	r
7	Animate	no	n
		yes	y

source: Russian Pronoun

Adverb

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Adverb	R
1	Degree	positive	p
		comparative	c
		superlative	s

source: Russian Adverb

Adposition

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Adposition	S
1	Type	preposition	p
2	Formation	simple	s
		compound	c
3	Case	genitive	g
		dative	d
		accusative	a
		locative	l
		instrumental	i

source: Russian Adposition

Conjunction

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Conjunction	C
1	Type	coordinating	c
		subordinating	s
2	Formation	simple	s
		compound	c
3	Coord_Type	sentence	p
		words	w
4	Sub_Type	negative	z
		positive	p

source: Russian Conjunction

Numeral

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Numeral	M
1	Type	cardinal	c
		ordinal	o
		multiple	m
		collect	l
2	Gender	masculine	m
		feminine	f
		neuter	n
3	Number	singular	s
		plural	p
4	Case	nominative	n
		genitive	g
		dative	d
		accusative	a
		locative	l
		instrumental	i
5	Form	digit	d
		roman	r
		letter	l
6	Animate	no	n
		yes	y

source: Russian Numeral

Particle

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Particle	Q
1	Formation	simple	s
		compound	c

source: Russian Particle

Interjection

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Interjection	I
1	Formation	simple	s
		compound	c

source: Russian Interjection

Abbreviation

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Abbreviation	Y
1	Syntactic_Type	nominal	n
		adverbial	r
2	Gender	masculine	m
		feminine	f
		neuter	n
3	Number	singular	s
		plural	p
		paucal	c
4	Case	nominative	n
		genitive	g
		dative	d
		accusative	a
		locative	l
		instrumental	i

source: Russian Abbreviation

Residual

P	Attribute (en)	Value (en)	Code (en)
0	CATEGORY	Residual	X

source: Russian Residual

Appendix A Index of Categories
Appendix B Index of Attributes
Appendix C Index of Values
Appendix D Lexical MSDs

(This page was taken from MULTEXT-East Home Page)

Russian text corpora in Sketch Engine

Sketch Engine offers dozens of Russian language corpora.
