A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
Penn Treebank tagset
The Sanskrit Penn Treebank part-of-speech tagset is used in Sanskrit corpora annotated with TreeTagger, a tool developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The tagset contains modifications developed by Sketch Engine.
The following table shows the Sanskrit Penn Treebank part-of-speech tagset, including the Sketch Engine differences.
An example of a tag query in the CQL concordance search box: [xpos="JJ"] finds all adjectives, e.g. mahat, priya. Note: make sure to use straight double quotation marks.
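Queries of this shape can also be generated programmatically. The following is a minimal sketch, not part of Sketch Engine: the helper names `straighten_quotes` and `cql_for_tag` are hypothetical, and the attribute name `xpos` is taken from the example above. It builds a one-token CQL expression and replaces typographic double quotes with straight ones, since curly quotes are a common cause of failed queries.

```python
# Hypothetical helpers for illustration; not a Sketch Engine API.
CURLY_QUOTES = {"\u201c": '"', "\u201d": '"', "\u201e": '"'}


def straighten_quotes(text: str) -> str:
    """Replace typographic double quotes with straight double quotes."""
    for curly, straight in CURLY_QUOTES.items():
        text = text.replace(curly, straight)
    return text


def cql_for_tag(tag: str, attribute: str = "xpos") -> str:
    """Build a one-token CQL expression such as [xpos="JJ"]."""
    # Drop surrounding whitespace and any quotes the caller already added.
    tag = straighten_quotes(tag).strip().strip('"')
    return f'[{attribute}="{tag}"]'


print(cql_for_tag("JJ"))  # [xpos="JJ"]
```

The resulting string can be pasted into the CQL search box as it is.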
POS Tag | Description | Example |
CAD | adverb | ca, tatra, punaḥ, na |
CADP | preverbs | abhi, pra |
CCD | coordinating conjunction | ca (‘and’), tu (‘but’), vā (‘or’) |
CCM | particles for comparison | iva (‘like’), yathā (‘as, like’) |
CEM | emphatic particle | eva (‘indeed’), ha (‘really’), u (‘and’) |
CGDA | absolutive gerund | gam/gatvā (‘go/having gone’), kṛ, dṛś, śru |
CGDI | infinitive | gam/gantum (‘go/to go’), dṛś/dṛśe (‘see/to see’), śru, kṛ |
CNG | negation | na (‘not’), mā (‘no’) |
CQT | quotation particle | iti (‘thus’) |
CSB | subordinating conjunction | yat (‘that’), yadi (‘if’), yadā (‘when’) |
CX | other adverbs and indeclinables | itthā (‘so, thus’), tathā (‘such’), tatas (‘further, here and there’), hi (‘of course, for’) |
JJ | adjective | mahat, mahātman, priya |
JQ | quantifying adjective | bahu (‘many, much’), sarva (‘all, every, ...’) |
KDG | gerundive | gam/gantavya (‘to go/to be gone’), kṛ/kartavya (‘to do/to be done’) |
KDP | participle | gam/gacchat (‘to go/going’), dṛś/darśana (‘to see/seeing’) |
NC | common noun | deva, loka, agni |
NUM | number | aṣṭan, tri, śata |
PPP | past participle | gam/gata (‘to go/gone’), smṛ/smṛta (‘to remember/remembered’) |
PPR | personal pronoun | mad (‘my, mine’), tvad (‘yours’) |
PPX | other words inflected like pronouns | para (‘other’), itara (‘another’), eka (‘one’), ubh (‘both’) |
PRC | reciprocal pronoun | paraspara (‘mutually’), anyonya (‘each other’), ekaika (‘each one’) |
PRD | demonstrative pronoun | tad (‘that’), idam (‘this’), adas (‘that, yonder’) |
PRI | indefinite pronoun | kaścit (‘someone’), kaścana (‘anyone’) |
PRL | relative pronoun | yad (‘what/which’), yad ca (‘and what, what’s more’) |
PRQ | interrogative pronoun | ka (‘who’), katama (‘which/what’), katara (‘which’) |
V | finite verbal form | bhū, vac, as, kṛ, gam |
Source: https://github.com/OliverHellwig/sanskrit
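The pipe-delimited rows of the table lend themselves to a simple lookup from tag to description. The sketch below assumes rows formatted as `TAG | description | examples |`, as in the table above; the names `ROWS` and `parse_tagset` are illustrative, and only three rows are included to keep the example short.

```python
# A few rows copied from the tagset table; extend with the rest as needed.
ROWS = """\
JJ | adjective | mahat, mahātman, priya |
NC | common noun | deva, loka, agni |
CNG | negation | na (‘not’), mā (‘no’) |
"""


def parse_tagset(text: str) -> dict[str, str]:
    """Map each POS tag to its description, ignoring the example column."""
    tagset = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # A valid row has at least a non-empty tag and a description.
        if len(cells) >= 2 and cells[0]:
            tagset[cells[0]] = cells[1]
    return tagset


tags = parse_tagset(ROWS)
print(tags["JJ"])  # adjective
```

A dictionary like this can be used to turn the tags found in concordance output back into readable category names.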