A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The Tibetan part-of-speech tagset is used in Tibetan corpora annotated with the Rule-based Part-of-speech Tagger for Classical Tibetan, developed by the research project ‘Tibetan in Digital Communication’ hosted at SOAS, University of London.

An example of a tag in the CQL concordance search box: [tag="n.prop"] finds all proper nouns, e.g. བོད་, རབ་འབྱོར་ (note: please make sure that you use straight double quotation marks).
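The query above can also be assembled programmatically. The following is a minimal Python sketch, not part of the corpus tooling: the helper names tag_query and straighten_quotes are hypothetical. It builds the query string for a given tag and replaces typographic quotation marks, which are easily introduced by copy and paste, with the straight ones the note asks for.

```python
# Map typographic double quotes to the straight quote used in the CQL example.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})


def straighten_quotes(query: str) -> str:
    """Return the query with curly double quotes replaced by straight ones."""
    return query.translate(CURLY_TO_STRAIGHT)


def tag_query(tag_pattern: str) -> str:
    """Build a one-token CQL query on the tag attribute (hypothetical helper)."""
    return f'[tag="{tag_pattern}"]'


print(tag_query("n.prop"))                            # [tag="n.prop"]
print(straighten_quotes("[tag=\u201cn.prop\u201d]"))  # [tag="n.prop"]
```

The resulting string is what would be pasted into the CQL search box.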

Basic part-of-speech tagset

POS categories POS tag
Adjectives adj
Adverbs adv..*
Case markers case..*
Clitics cl..*
Converbs cv..*
Demonstratives, determiners, etc. d..*
Nouns n..*
Negation neg
Numbers num..*
Pronouns p..*
Verbs (and verbal nouns) v..*  (n.v..*)
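
The patterns in the table above, such as v..*, read as regular expressions over the detailed tags. The sketch below uses Python's re module as a stand-in to show which tags a pattern would select. It assumes that the pattern must match the whole tag and that the corpus regular-expression engine treats these patterns the way Python does; neither is stated in this document. The tag list is a small sample of the detailed tagset and matching_tags is a hypothetical helper.

```python
import re

# A small sample of tags from the detailed tagset.
SAMPLE_TAGS = ["adj", "adv.dir", "n.count", "n.prop", "n.v.past",
               "neg", "num.card", "v.aux", "v.past"]


def matching_tags(pattern, tags=SAMPLE_TAGS):
    """Return the tags that the pattern matches in full."""
    return [tag for tag in tags if re.fullmatch(pattern, tag)]


print(matching_tags("v..*"))    # ['v.aux', 'v.past']
print(matching_tags("adv..*"))  # ['adv.dir']
# An unescaped dot matches any character, so this also selects neg and num.card.
print(matching_tags("n..*"))    # ['n.count', 'n.prop', 'n.v.past', 'neg', 'num.card']
# Escaping the first dot restricts the match to tags that begin with "n.".
print(matching_tags(r"n\..*"))  # ['n.count', 'n.prop', 'n.v.past']
```

Under these assumptions, a pattern with an escaped dot is the safer choice when only one category is wanted.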

Detailed POS tagset

POS tag Description
adj adjective
adv.dir directional adverb
adv.intense intensive adverb
adv.mim mimetic adverb
adv.proclausal proclausal adverb
adv.temp temporal adverb
case.abl ablative (affix -las after a noun phrase)
case.agn agentive (affixes -kyis, -gyis, -gis, -yis, -s)
case.all allative (affix -la after a noun phrase)
case.ass associative (affix -daṅ after a noun phrase)
case.comp comparative (affixes -bas and -pas after a noun phrase)
case.ela elative (affix -las after a noun phrase)
case.gen genitive (affixes -kyi, -gyi, -gi, -yi, -ḥi)
case.loc locative (affix -na after a noun phrase)
case.nare quotative (affixes -na, -re)
case.term terminative (affixes -du, -tu, -su, -ru, -r)
cl.lta the clitic lta in the combinations lta ste and na lta
cl.tsam the clitic -tsam
cl.focus the focus clitic ni
cl.quot the quotative clitic ces
cv.abl affix -las after a verb stem
cv.agn affixes -gis
cv.all affix -la after a verb stem
cv.are affix -ta-re and its allomorphs after a verb stem
cv.ass affix -daṅ after a verb stem
cv.ela affix -las after a verb stem
cv.fin affixes -to
cv.gen affixes -gi
cv.imp affixes -cig
cv.impf affixes -ciṅ
cv.loc affix -na after a verb stem
cv.ques affix -tam and its allomorphs
cv.sem affixes -te
cv.term affixes -tu
d.dem demonstratives
d.det determiners
d.emph emphatics
d.indef indefinites
d.plural plurals
d.tsam tsam
dunno a word that we have not been able to analyze
interj interjection
n..* noun
n.count lexical nouns
n.mass mass nouns
n.prop proper nouns
n.rel relator nouns
n.v.aux auxiliary verbal noun
n.v.cop copula verbal noun
n.v.fut future verbal noun
n.v.fut.n.v.past future/past verbal noun
n.v.fut.n.v.pres future/present verbal noun
n.v.imp imperative verbal noun
n.v.invar invariable verbal noun
n.v.neg negative verbal noun
n.v.past past verbal noun
n.v.past.n.v.pres past/present verbal noun
n.v.pres present verbal noun
neg two negation prefixes ma and mi
num..* numeral
num.card cardinal number
num.ord ordinal number
numeral numeral
p.indef indefinite pronouns
p.interrog interrogative pronouns
p.pers personal pronouns
p.refl personal reflexive
punc punctuation mark
sent end of sentence punctuation
v.aux auxiliary verbs
v.cop copula verbs
v.cop.neg negative copula verb
v.fut future verb stem
v.fut.v.past future/past verb stem
v.fut.v.pres future/present verb stem
v.imp imperative verb stem
v.invar invariable verb stem
v.neg the inherently negative verb med
v.past past verb stem
v.past.v.pres past/present verb stem
v.pres present verb stem
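
Several tags in the table above join two stem readings, e.g. v.fut.v.pres for a stem that may be future or present. When such tags are processed outside the corpus interface, it can help to split them into their individual readings. The following Python sketch shows one possible way to do that; split_readings is a hypothetical helper, and the pattern only assumes that a reading has the shape v.LABEL or n.v.LABEL as in the table.

```python
import re

# One stem reading: optional "n." (verbal noun), then "v.", then a label such as "fut".
READING = re.compile(r"(?:n\.)?v\.[a-z]+")


def split_readings(tag: str) -> list[str]:
    """Split a combined stem tag into its readings; leave other tags unchanged."""
    parts = READING.findall(tag)
    # Split only when the tag is exactly two or more readings joined by dots.
    if len(parts) > 1 and ".".join(parts) == tag:
        return parts
    return [tag]


print(split_readings("v.fut.v.pres"))      # ['v.fut', 'v.pres']
print(split_readings("n.v.fut.n.v.past"))  # ['n.v.fut', 'n.v.past']
print(split_readings("v.cop.neg"))         # ['v.cop.neg']
```

The check against the rejoined parts keeps tags such as v.cop.neg, which is a single category, from being split.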

Note: word forms with and without tsheg (e.g. ཐོག་ and ཐོག) are separate lexical entries, but both are normalized to the same form in the attribute “notsheg”.
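
The effect of this normalization can be illustrated with a short Python sketch. It assumes that the normalized form is obtained by dropping a final tsheg (U+0F0B); how the corpus actually computes the attribute is not described here, and strip_final_tsheg is a hypothetical helper.

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def strip_final_tsheg(word_form: str) -> str:
    """Return the word form without any trailing tsheg."""
    return word_form.rstrip(TSHEG)


with_tsheg = "\u0f50\u0f7c\u0f42\u0f0b"  # the example word form ending in a tsheg
without_tsheg = "\u0f50\u0f7c\u0f42"     # the same word form without it

# Both spellings map to the same normalized form.
print(strip_final_tsheg(with_tsheg) == strip_final_tsheg(without_tsheg))  # True
```

A tsheg between syllables is left in place, since only trailing marks are removed.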


http://larkpie.net/tibetancorpus/
http://eprints.soas.ac.uk/18282/2/1%20POS%20categories.pdf


Garrett, Edward and Hill, Nathan W. and Zadoks, Abel (2014) ‘A Rule-based Part-of-speech Tagger for Classical Tibetan.’ Himalayan Linguistics, 13 (1). pp. 9-57. (CC BY-NC-ND 4.0)