A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense, etc.) of each token in a text corpus.
This part-of-speech tagset for Indian languages such as Bengali, Hindi, Kannada and Telugu was created within the Indian Language Machine Translation (ILMT) project, which covers various Indian languages.
An example of a tag query in the CQL concordance search box: [tag="NN.*|NST"] finds all nouns, including nouns denoting spatial or temporal expressions, e.g. ಮೇಲೆ, ಬಗ್ಗೆ (note: make sure that you use straight double quotation marks).
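To see which tags from the table below a pattern such as `NN.*|NST` selects, the pattern can be tried out offline. The following is a minimal sketch in Python, assuming that the CQL regular expression is matched against the whole tag value (hence `fullmatch`); Python's `re` module is only a stand-in for the corpus query engine, and `matching_tags` and the sample list are illustrative names, not part of any corpus tool.

```python
import re

# The same regular expression as in the CQL query [tag="NN.*|NST"].
# Assumption: the query engine matches the pattern against the whole tag,
# which is emulated here with fullmatch().
PATTERN = re.compile(r"NN.*|NST")


def matching_tags(tags):
    """Return the tags that the pattern selects, in their original order."""
    return [tag for tag in tags if PATTERN.fullmatch(tag)]


# A few main PoS tags taken from the tagset table.
sample = ["NN", "NNP", "NST", "VM", "PSP", "JJ"]
print(matching_tags(sample))  # ['NN', 'NNP', 'NST']
```

The `NN.*` alternative covers NN and every tag that starts with NN (such as NNP), while NST has to be listed separately because it does not start with NN.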
Tagset
| PoS Tag | Description | Note/Example |
|---|---|---|
| CC | Conjunction (co-ordinating and subordinating) | bole (Bangla) |
| CL | Classifier | |
| DEM | Demonstrative | |
| ECH | Echo word | |
| INJ | Interjection | |
| INTF | Intensifier | |
| JJ | Adjective | |
| NEG | Negation | |
| NN | Noun | |
| NNP | Proper noun | |
| NST | Noun denoting spatial or temporal expressions | |
| PRP | Pronoun | |
| PSP | Postposition | |
| QC | Cardinal number | |
| QF | Quantifier | bahut, tho.DA, kam (Hindi) |
| QO | Ordinal number | |
| RB | Adverb | *Only manner adverbs |
| RDP | Reduplication | |
| RP | Particle | bhI, to, hI, jI, hA.N, na |
| SYM | Special symbol | |
| UNK | Unknown | |
| UT | Quotative | ani (Telugu), endru (Tamil), bole/mAne (Bangla), mhaNaje (Marathi), mAne (Hindi) |
| VAUX | Verb Auxiliary | |
| VM | Verb Main | |
| WQ | Question Word | |
| *C (XC) | Compound | where X is a variable standing for the type of the compound of which the current word is a member |
Source: crawled from Wayback Machine at http://ltrc.iiit.ac.in/tr031/posguidelines.pdf
Hindi part-of-speech tagset scheme in detail
Each PoS tag is composed of the main PoS tag written in capital letters (e.g. NN – noun) followed by five further categories, separated by dots, that provide detailed information about the particular token. Unused categories are replaced with a dot (e.g. NNP.unk.… – proper noun unknown).
For example, a noun tag NN.n.m.sg.3.d consists of the following categories and their values.
| category | value | description |
|---|---|---|
| main PoS tag | NN | noun |
| coarse PoS tag | n | noun |
| gender | m | masculine |
| number | sg | singular |
| person | 3 | the third person |
| case | d | direct |
To find all possible main PoS tags, see the list above.
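The structure described above can be taken apart mechanically. The following is a minimal sketch, assuming that a tag always contains exactly five dots and that an unused category shows up as an empty field between two dots (so the example NNP.unk.… is read here as `NNP.unk....`, which is an interpretation, not something the source states). `HindiTag` and `parse_tag` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class HindiTag(NamedTuple):
    """The six parts of a detailed tag; None marks an unused category."""
    main: str
    coarse: Optional[str]
    gender: Optional[str]
    number: Optional[str]
    person: Optional[str]
    case: Optional[str]


def parse_tag(tag: str) -> HindiTag:
    """Split a tag such as 'NN.n.m.sg.3.d' into its categories."""
    fields = tag.split(".")
    if len(fields) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(fields)}: {tag!r}")
    main, *rest = fields
    # Assumption: an empty field means the category is unused.
    return HindiTag(main, *(value or None for value in rest))


noun = parse_tag("NN.n.m.sg.3.d")
print(noun.main, noun.gender, noun.case)  # NN m d
```

A named tuple keeps the fields in the order in which they appear in the tag, so the parsed result can still be compared with a plain tuple.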
The list of coarse POS tags follows:
| value | description |
|---|---|
| adj | adjective |
| adv | adverb |
| avy | avyaya – indeclinable and some functional words, e.g. या |
| n | noun |
| num | numeral |
| pn | pronoun |
| psp | postposition |
| punc | punctuation |
| unk | unknown |
| v | verb |
The list of values of the gender category:
| value | description |
|---|---|
| any | any gender |
| f | feminine |
| m | masculine |
| n | neuter |
| punc | punctuation |
| . | not applicable |
The list of values of the number category:
| value | description |
|---|---|
| any | any number |
| pl | plural |
| sg | singular |
| . | not applicable |
The list of values of the person category:
| value | description |
|---|---|
| any | any person |
| 1 | the first person |
| 2 | the second person |
| 2h | the second person honorific |
| 3 | the third person |
| . | not applicable |
The list of values of the case category:
| value | description |
|---|---|
| any | any case |
| d | direct |
| o | oblique |
| . | not applicable |
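The value lists above can also be used to sanity-check tags, for example before building a query. The following is a minimal sketch, assuming that a tag has six dot-separated fields and that an empty field stands for "not applicable"; `ALLOWED` and `invalid_categories` are hypothetical names, and the sets simply copy the values from the tables in this section.

```python
# Allowed values per category, copied from the tables above.
# The order of the keys follows the order of the fields in a tag.
ALLOWED = {
    "coarse": {"adj", "adv", "avy", "n", "num", "pn", "psp", "punc", "unk", "v"},
    "gender": {"any", "f", "m", "n", "punc"},
    "number": {"any", "pl", "sg"},
    "person": {"any", "1", "2", "2h", "3"},
    "case": {"any", "d", "o"},
}


def invalid_categories(tag):
    """Return the names of the categories whose value is not in the tables."""
    fields = tag.split(".")
    if len(fields) != 6:
        raise ValueError(f"expected 6 dot-separated fields: {tag!r}")
    problems = []
    # fields[0] is the main PoS tag; the remaining five follow ALLOWED's order.
    for (name, allowed), value in zip(ALLOWED.items(), fields[1:]):
        # Assumption: an empty field means "not applicable" and is accepted.
        if value and value not in allowed:
            problems.append(name)
    return problems


print(invalid_categories("NN.n.m.sg.3.d"))  # []
print(invalid_categories("NN.n.x.sg.9.d"))  # ['gender', 'person']
```

The main PoS tag is not checked here; it could be compared against the first table in the same way.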
Source: https://bitbucket.org/sivareddyg/hindi-part-of-speech-tagger/src/master/README.md