A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels that indicate the part of speech, and sometimes other grammatical categories (case, tense, etc.), of each token in a text corpus.
The Russian part of the multilingual MULTEXT-East specifications, version 4, is the tagset used in Russian corpora.
These specifications follow the (draft) Version 4 of the multilingual MULTEXT-East specifications, which can be found at http://nl.ijs.si/ME.
The basic idea is that for each major category (Noun, Verb, Adjective, etc.) the specifications define a fixed set of attributes (Case, Number, Gender, Animacy, etc.), each with its own set of values (e.g. masculine, feminine, neuter). Each attribute is assigned a fixed position within its category, and each of its values a one-letter code, so a complete morphosyntactic description of a word can be encoded as a short string called a MorphoSyntactic Description (MSD). For instance, the attribute-value specification Category = Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no corresponds to the MSD Ncmsan. If an attribute is not appropriate for a given combination of features or for a particular lexical item, its position is filled with a hyphen, e.g. Afpns-s (Adjective, qualificative, positive, neuter, singular, short form), where Case is undefined because the adjective is in the short form.
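The positional encoding can be illustrated with a minimal Python sketch. It is not part of the specifications. The lookup tables are deliberately partial and cover only the codes needed for the two MSDs mentioned above. The function name `decode_msd` is hypothetical, and the attribute name `Form` for the last adjective position is an assumption made for this example, so the official attribute names and value lists should be taken from the MULTEXT-East specifications.

```python
# Partial, illustrative tables: one (attribute, {code: value}) pair per position.
# Only the codes needed for "Ncmsan" and "Afpns-s" are listed.
CATEGORIES = {"N": "Noun", "A": "Adjective"}

ATTRIBUTES = {
    "N": [
        ("Type", {"c": "common"}),
        ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
        ("Number", {"s": "singular"}),
        ("Case", {"a": "accusative"}),
        ("Animate", {"n": "no"}),
    ],
    "A": [
        ("Type", {"f": "qualificative"}),
        ("Degree", {"p": "positive"}),
        ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
        ("Number", {"s": "singular"}),
        ("Case", {}),               # no case codes needed for the example
        ("Form", {"s": "short"}),   # attribute name assumed for this sketch
    ],
}


def decode_msd(msd):
    """Expand an MSD string into a dict of attribute names and values."""
    if not msd or msd[0] not in CATEGORIES:
        raise ValueError(f"unknown or empty MSD: {msd!r}")
    category = msd[0]
    features = {"Category": CATEGORIES[category]}
    # Each character after the first fills the attribute at the same position.
    for (name, values), code in zip(ATTRIBUTES[category], msd[1:]):
        if code == "-":
            continue  # hyphen: attribute not applicable, so it is left out
        # Codes missing from the partial tables are kept as the raw letter.
        features[name] = values.get(code, code)
    return features


print(decode_msd("Ncmsan"))
print(decode_msd("Afpns-s"))
```

In this sketch the hyphen in Afpns-s means the decoded result has no Case entry at all, which mirrors the rule that an inapplicable attribute is marked by a hyphen in its position.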