A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The Indian part-of-speech tagset was created within the Indian Language Machine Translation (ILMT) project, which covers several Indian languages.
An example of using a tag in the CQL concordance search box: [tag="N.*"] finds all nouns, e.g. ಮೇಲೆ, ಬಗ್ಗೆ (note: make sure that you use straight double quotation marks).
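The value inside `tag="…"` is a regular expression that is compared with each token's tag. Below is a minimal Python sketch of that behaviour. It assumes, as in Sketch Engine's CQL, that the pattern has to match the whole tag and not just a substring, which is why `N.*` needs the trailing `.*`. The tokens `tok1` to `tok5` and their tags are invented for illustration.

```python
import re

# Hypothetical tagged tokens (token, tag); the tags are illustrative only.
TOKENS = [
    ("tok1", "NC"),
    ("tok2", "VM"),
    ("tok3", "NST"),
    ("tok4", "NP"),
    ("tok5", "PU"),
]


def cql_tag_query(tokens, pattern):
    """Return tokens whose tag matches `pattern` as a whole, mimicking [tag="pattern"]."""
    regex = re.compile(pattern)
    # fullmatch: the pattern must cover the entire tag, not just a prefix or substring
    return [word for word, tag in tokens if regex.fullmatch(tag)]


print(cql_tag_query(TOKENS, "N.*"))  # ['tok1', 'tok3', 'tok4']
print(cql_tag_query(TOKENS, "N"))    # [] because no tag is exactly "N"
```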
Tagset
| Category | Subcategory | Part-of-speech tag |
| --- | --- | --- |
| NOUN | Common | NC.* |
| | Proper | NP.* |
| | Verbal | NV.* |
| | Spatio-temporal | NST |
| VERB | Main | VM.* |
| | Auxiliary | VA.* |
| PRONOUN | Pronominal | PPR.* |
| | Reflexive | PRF.* |
| | Reciprocal | PRC.* |
| | Relative | PRL.* |
| | Wh-pronoun | PWH.* |
| NOMINAL MODIFIER | Adjective | JJ.* |
| | Quantifier | JQ.* |
| DEMONSTRATIVE | Absolute | DAB.* |
| | Relative | DRL.* |
| | Wh | DWH.* |
| ADVERB | Manner | AMN.* |
| | Location | ALC.* |
| PARTICIPLE | Verbal (Adverbial) | LV.* |
| | Conditional | LC.* |
| PARTICLE | Coordinating | CCD.* |
| | Subordinating | CSB.* |
| | Classifier | CCL.* |
| | Interjection | CIN.* |
| | Others | CX.* |
| POSTPOSITION | | PP |
| PUNCTUATION | | PU |
| RESIDUAL | Foreign word | RDF |
| | Symbol | RDS |
| | Others | RDX |
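To find the category of a full tag, you can turn the table into data and try each row's pattern in turn. The Python sketch below does this using the same whole-tag matching assumed for CQL. `TAGSET` and `classify` are hypothetical names. Postposition (PP) and Punctuation (PU) are treated as categories with no subcategory.

```python
import re
from typing import Optional, Tuple

# (category, subcategory, tag pattern) rows transcribed from the tagset table.
# Postposition and Punctuation are assumed to have no subcategory (None).
TAGSET = [
    ("NOUN", "Common", "NC.*"),
    ("NOUN", "Proper", "NP.*"),
    ("NOUN", "Verbal", "NV.*"),
    ("NOUN", "Spatio-temporal", "NST"),
    ("VERB", "Main", "VM.*"),
    ("VERB", "Auxiliary", "VA.*"),
    ("PRONOUN", "Pronominal", "PPR.*"),
    ("PRONOUN", "Reflexive", "PRF.*"),
    ("PRONOUN", "Reciprocal", "PRC.*"),
    ("PRONOUN", "Relative", "PRL.*"),
    ("PRONOUN", "Wh-pronoun", "PWH.*"),
    ("NOMINAL MODIFIER", "Adjective", "JJ.*"),
    ("NOMINAL MODIFIER", "Quantifier", "JQ.*"),
    ("DEMONSTRATIVE", "Absolute", "DAB.*"),
    ("DEMONSTRATIVE", "Relative", "DRL.*"),
    ("DEMONSTRATIVE", "Wh", "DWH.*"),
    ("ADVERB", "Manner", "AMN.*"),
    ("ADVERB", "Location", "ALC.*"),
    ("PARTICIPLE", "Verbal (Adverbial)", "LV.*"),
    ("PARTICIPLE", "Conditional", "LC.*"),
    ("PARTICLE", "Coordinating", "CCD.*"),
    ("PARTICLE", "Subordinating", "CSB.*"),
    ("PARTICLE", "Classifier", "CCL.*"),
    ("PARTICLE", "Interjection", "CIN.*"),
    ("PARTICLE", "Others", "CX.*"),
    ("POSTPOSITION", None, "PP"),
    ("PUNCTUATION", None, "PU"),
    ("RESIDUAL", "Foreign word", "RDF"),
    ("RESIDUAL", "Symbol", "RDS"),
    ("RESIDUAL", "Others", "RDX"),
]


def classify(tag: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (category, subcategory) for a tag, or None if no row matches."""
    for category, subcategory, pattern in TAGSET:
        # Whole-value match, as assumed for [tag="..."] in CQL
        if re.fullmatch(pattern, tag):
            return category, subcategory
    return None


print(classify("NST"))  # ('NOUN', 'Spatio-temporal')
print(classify("PP"))   # ('POSTPOSITION', None)
print(classify("XYZ"))  # None
```

`PP` has no trailing `.*`, so a tag such as `PPR` is not mistaken for a postposition. It matches only the pronoun row `PPR.*`.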
Attributes and their tags
| Attribute (symbol) | Values (symbol) | | |
| --- | --- | --- | --- |
| NUMBER (NUM) | Singular (sg) | Plural (pl) | |
| PERSON (PER) | First (1) | Second (2) | Third (3) |
| TENSE (TNS) | Present (prs) | Past (pst) | Future (fut) |
| CASE MARKER (CSM) | Accusative (acc) | Genitive (gen) | Locative (loc) |
| ASPECT (ASP) | Simple (smp) | Progressive (prg) | Perfect (pft) |
| MOOD (MOOD) | Declarative (dcl) | Imperative (imp) | Habitual (hab) |
| FINITENESS (FIN) | Finite (fin) | Non-finite (nfn) | Infinite (ifn) |
| DISTRIBUTIVE (DSTB) | Yes (y) | No (n) | |
| DEFINITENESS | Yes (y) | No (n) | |
| EMPHATIC (EMPH) | Yes (y) | No (n) | |
| NEGATIVE (NEG) | Yes (y) | No (n) | |
| HONORIFICITY (HON) | Yes (y) | No (n) | |
| NUMERAL (NML) | Ordinal (ord) | Cardinal (crd) | Non-numeral (nnm) |
| REALIS | Realis (rls) | Irrealis (ils) | |
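The attribute table can also be stored as a lookup from value symbols to value names. The Python sketch below assumes the attribute values are already available as a dict keyed by attribute name. This page does not say how they are encoded inside a corpus tag. `ATTRIBUTE_VALUES`, `COMMON` and `decode` are hypothetical names. `loc` for Locative is assumed so that it does not clash with Genitive `gen`. The common values `0` and `x`, described below the table, are accepted for every attribute.

```python
# Value symbols transcribed from the attribute table, keyed by attribute name.
YES_NO = {"y": "Yes", "n": "No"}
ATTRIBUTE_VALUES = {
    "NUMBER": {"sg": "Singular", "pl": "Plural"},
    "PERSON": {"1": "First", "2": "Second", "3": "Third"},
    "TENSE": {"prs": "Present", "pst": "Past", "fut": "Future"},
    # 'loc' for Locative is assumed so that it does not clash with 'gen' (Genitive)
    "CASE MARKER": {"acc": "Accusative", "gen": "Genitive", "loc": "Locative"},
    "ASPECT": {"smp": "Simple", "prg": "Progressive", "pft": "Perfect"},
    "MOOD": {"dcl": "Declarative", "imp": "Imperative", "hab": "Habitual"},
    "FINITENESS": {"fin": "Finite", "nfn": "Non-finite", "ifn": "Infinite"},
    "DISTRIBUTIVE": YES_NO,
    "DEFINITENESS": YES_NO,
    "EMPHATIC": YES_NO,
    "NEGATIVE": YES_NO,
    "HONORIFICITY": YES_NO,
    "NUMERAL": {"ord": "Ordinal", "crd": "Cardinal", "nnm": "Non-numeral"},
    "REALIS": {"rls": "Realis", "ils": "Irrealis"},
}

# Values valid for every attribute
COMMON = {"0": "Not applicable", "x": "Undecided or doubtful"}


def decode(features):
    """Map {attribute name: value symbol} to {attribute name: value name}.

    Raises KeyError for an unknown attribute or value symbol.
    """
    decoded = {}
    for attr, sym in features.items():
        if sym in COMMON:
            decoded[attr] = COMMON[sym]
        else:
            decoded[attr] = ATTRIBUTE_VALUES[attr][sym]
    return decoded


print(decode({"NUMBER": "sg", "PERSON": "3", "TENSE": "pst"}))
# {'NUMBER': 'Singular', 'PERSON': 'Third', 'TENSE': 'Past'}
```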
Common values for all attributes:
- Not applicable (0)
– when no value is applicable to the category or the relevant morpho-syntactic feature is not available.
– when the attribute is binary-valued, i.e. its values are ‘yes’ and ‘no’ (as with Emphatic, Negative, Definiteness etc.), annotate the value as ‘yes’ only when the morphological attribute is present; otherwise, annotate it as ‘no’.
- Undecided or doubtful (x)
– when the annotator is not sure about a possible attribute, tag it as ‘x’ instead of guessing. Inherently ambiguous cases should first be resolved from the context; if they still remain ambiguous, annotate the attribute as ‘x’.
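The decision rules above can be sketched as two small Python helpers. The function names and the `applicable`/`certain` flags are hypothetical. They stand in for the annotator's judgement and are not part of any annotation tool.

```python
def attribute_value(value, applicable=True, certain=True):
    """Pick the symbol for a multi-valued attribute (e.g. TENSE, CASE MARKER)."""
    if not applicable or value is None:
        return "0"  # not applicable to the category, or feature not available
    if not certain:
        return "x"  # still ambiguous after considering the context
    return value


def binary_value(present, certain=True):
    """Pick the symbol for a yes/no attribute (e.g. EMPHATIC, NEGATIVE)."""
    if not certain:
        return "x"  # still ambiguous after considering the context
    # 'y' only when the morphological marker is present, otherwise 'n'
    return "y" if present else "n"


print([attribute_value("pst"), attribute_value(None), attribute_value("prs", certain=False)])
# ['pst', '0', 'x']
print([binary_value(True), binary_value(False), binary_value(True, certain=False)])
# ['y', 'n', 'x']
```

An absent binary feature is annotated `n`, not `0`, because the rule for binary attributes asks for ‘no’ whenever the marker is missing.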
Source: https://catalog.ldc.upenn.edu/docs/LDC2010T16/Annotation_Guidelines_for_Bangla.pdf