A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense, etc.) of each token in a text corpus.
This part-of-speech tagset for Indian languages such as Bengali, Hindi, Kannada and Telugu was created as part of the Indian Language Machine Translation (ILMT) project, which covers various Indian languages.
An example of a tag in the CQL concordance search box: [tag="NN.*|NST"] finds all nouns, e.g. ಮೇಲೆ, ಬಗ್ಗೆ (note: please make sure that you use straight double quotation marks)
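To see which tags a pattern such as NN.*|NST selects, the regular expression can be tried against the tag list outside the search box. The following is a minimal sketch in Python, not part of the corpus tooling: it assumes that CQL compares the pattern with the whole tag value (emulated here with `re.fullmatch`), and the names `TAGSET` and `matching_tags` are hypothetical helpers introduced only for this illustration.

```python
import re

# Tags copied from the tagset table on this page.
TAGSET = [
    "CC", "CL", "DEM", "ECH", "INJ", "INTF", "JJ", "NEG", "NN", "NNP",
    "NST", "PRP", "PSP", "QC", "QF", "QO", "RB", "RDP", "RP", "SYM",
    "UNK", "UT", "VAUX", "VM", "WQ",
]


def matching_tags(pattern, tags):
    """Return the tags that the pattern matches in full.

    Assumption: a CQL attribute pattern must match the entire value,
    so re.fullmatch is used instead of re.search.
    """
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]


if __name__ == "__main__":
    # The pattern from the example query [tag="NN.*|NST"].
    print(matching_tags("NN.*|NST", TAGSET))
```

Under that assumption the pattern selects NN, NNP and NST, while a tag such as NEG is left out because neither alternative matches it completely.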
Tagset
PoS Tag | Description | Note/Example |
---|---|---|
CC | Conjunction (co-ordinating and subordinating) | bole (Bangla) |
CL | Classifier | |
DEM | Demonstrative | |
ECH | Echo word | |
INJ | Interjection | |
INTF | Intensifier | |
JJ | Adjective | |
NEG | Negation | |
NN | Noun | |
NNP | Proper noun | |
NST | Noun denoting spatial or temporal expressions | |
PRP | Pronoun | |
PSP | Postposition | |
QC | Cardinal number | |
QF | Quantifier | bahut, tho.DA, kam (Hindi) |
QO | Ordinal number | |
RB | Adverb | Only manner adverbs |
RDP | Reduplication | |
RP | Particle | bhI, to, hI, jI, hA.N, na |
SYM | Special symbol | |
UNK | Unknown | |
UT | Quotative | ani (Telugu), endru (Tamil), bole/mAne (Bangla), mhaNaje (Marathi), mAne (Hindi) |
VAUX | Verb Auxiliary | |
VM | Verb Main | |
WQ | Question Word | |
Source: retrieved via the Wayback Machine from http://ltrc.iiit.ac.in/tr031/posguidelines.pdf
or