A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense, etc.) of each token in a text corpus.

This part-of-speech tagset for Indian languages such as Bengali, Hindi, Kannada and Telugu was created within the Indian Language Machine Translation (ILMT) project, which covers various Indian languages.

An example of a tag query in the CQL concordance search box: [tag="NN.*|NST"] finds all nouns, including nouns denoting spatial or temporal expressions (NST), e.g. ಮೇಲೆ, ಬಗ್ಗೆ. (Note: make sure that you use straight double quotation marks.)
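To see which tags the pattern in the query above selects, the regular expression can be tried out offline. The following is a minimal Python sketch, assuming that CQL matches the regular expression against the whole attribute value (hence `re.fullmatch`). The helper name `matching_tags` and the sample tag list are made up for this illustration.

```python
import re

# Sample tags taken from the tagset table; the selection is arbitrary.
SAMPLE_TAGS = ["NN", "NNP", "NST", "PRP", "VM", "JJ", "NEG"]

def matching_tags(pattern, tags):
    """Return the tags that the pattern matches in full (hypothetical helper)."""
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]

nouns = matching_tags("NN.*|NST", SAMPLE_TAGS)
print(nouns)  # ['NN', 'NNP', 'NST']
```

The `NN.*` alternative covers NN and NNP, while NST has to be listed separately because it does not start with NN.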


PoS Tag Description Note/Example
CC Conjunction (co-ordinating and subordinating) bole (Bangla)
CL Classifier  
DEM Demonstrative  
ECH Echo word  
INJ Interjection  
INTF Intensifier  
JJ Adjective  
NEG Negation  
NN Noun  
NNP Proper noun
NST Noun denoting spatial or temporal expressions  
PRP Pronoun  
PSP Postposition  
QC Cardinal number  
QF Quantifier bahut, tho.DA, kam (Hindi)
QO Ordinal number  
RB Adverb *Only manner adverbs
RDP Reduplication  
RP Particle bhI, to, hI, jI, hA.N, na
SYM Special symbol  
UNK Unknown  
UT Quotative ani (Telugu), endru (Tamil), bole/mAne (Bangla), mhaNaje (Marathi), mAne (Hindi)
VAUX Verb Auxiliary  
VM Verb Main
WQ Question Word
*C (XC) Compound, where X is a variable standing for the type of the compound of which the current word is a member

Source: crawled from Wayback Machine at http://ltrc.iiit.ac.in/tr031/posguidelines.pdf

Hindi part-of-speech tagset scheme in detail

Each PoS tag is composed of the main PoS tag written in capital letters (e.g. NN – noun) and five further categories separated by a dot providing detailed information about the particular token. Unused categories are replaced with a dot (e.g. NNP.unk.… – proper noun unknown).

For example, a noun tag NN.n.m.sg.3.d consists of the following categories and their values.

category value description
main PoS tag NN noun
coarse PoS tag n noun
gender m masculine
number sg singular
person 3 the third person
case d direct
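The breakdown in the table above can be reproduced programmatically. The sketch below is only an illustration: the function name `parse_tag` is hypothetical, and it assumes a fully specified tag in which every category carries an explicit value. Tags in which unused categories are replaced with dots are rejected rather than guessed at.

```python
# Category names in the order in which they appear in a full tag.
CATEGORIES = ["main", "coarse", "gender", "number", "person", "case"]

def parse_tag(tag):
    """Split a fully specified tag into a dict (hypothetical helper).

    Assumes that every category has an explicit value; anything else
    raises ValueError instead of being interpreted.
    """
    parts = tag.split(".")
    if len(parts) != len(CATEGORIES) or "" in parts:
        raise ValueError(f"not a fully specified tag: {tag!r}")
    return dict(zip(CATEGORIES, parts))

parsed = parse_tag("NN.n.m.sg.3.d")
print(parsed)
```

For the example tag, `parsed["gender"]` is `"m"` and `parsed["case"]` is `"d"`, matching the rows of the table.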

To find all possible main PoS tags, see the list above.

The list of coarse POS tags follows:

value description
adj adjective
adv adverb
avy avyaya – indeclinable and some functional words, e.g. या
n noun
num numeral
pn pronoun
psp postposition
punc punctuation
unk unknown
v verb

The list of values of the gender category:

value description
any any gender
f feminine
m masculine
n neuter
punc punctuation
. not applicable

The list of values of the number category:

value description
any any number
pl plural
sg singular
. not applicable

The list of values of the person category:

value description
any any person
1 the first person
2 the second person
2h the second person honorific
3 the third person
. not applicable

The list of values of the case category:

value description
any any case
d direct
o oblique
. not applicable
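The category values listed above can be combined into a regular expression for the tag attribute, for instance to look for feminine plural nouns. The following Python sketch shows one possible way to assemble such a pattern. The function name `tag_pattern` is hypothetical, and the sketch assumes that every slot of the tag holds a value without a dot, so tags in which an unused category is written as a dot would not be matched. Whether a given corpus uses this detailed tag format should be checked against the corpus itself.

```python
import re

# Category names in the order in which they appear in a full tag.
CATEGORIES = ["main", "coarse", "gender", "number", "person", "case"]

def tag_pattern(**constraints):
    """Build a regex over full tags (hypothetical helper).

    Constrained categories must equal the given value; every other
    slot accepts any dot-free value (an assumption of this sketch).
    """
    unknown = set(constraints) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    slots = [
        re.escape(constraints[name]) if name in constraints else "[^.]*"
        for name in CATEGORIES
    ]
    # Slots are joined with an escaped dot so that it matches literally.
    return r"\.".join(slots)

pattern = tag_pattern(main="NN", gender="f", number="pl")
query = f'[tag="{pattern}"]'
print(query)
```

The printed string has the shape of the CQL query shown earlier and uses straight double quotation marks. Escaping the separator dots keeps them from acting as regex wildcards.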

Source: https://bitbucket.org/sivareddyg/hindi-part-of-speech-tagger/src/master/README.md

Corpora of Indian languages

Sketch Engine offers dozens of corpora of Indian languages.