A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
Indonesian TreeTagger PoS Tagset
The Indonesian tagset is available in Indonesian corpora annotated by the tool TreeTagger (with the Indonesian parameter file), which was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart.
The following table shows the Indonesian Penn TreeBank part-of-speech tagset.
An example of a tag in the CQL concordance search box:
[tag="PR"] finds all demonstrative pronouns, e.g. ini, itu, sini (note: please make sure that you use straight double quotation marks)
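The note about straight quotation marks matters when a query is pasted from a word processor, which may silently substitute typographic quotes. The following is a minimal Python sketch, not part of any corpus tool: the helper names `tag_query` and `straighten_quotes` are hypothetical and only illustrate how a query string of the form shown above could be built and cleaned before pasting it into the search box.

```python
def straighten_quotes(query: str) -> str:
    """Replace typographic double quotes with straight ones."""
    return query.replace("“", '"').replace("”", '"')


def tag_query(tag: str) -> str:
    """Build a CQL token query for a single POS tag, e.g. [tag="PR"]."""
    # Remove surrounding whitespace and any quotes the caller included.
    cleaned = tag.strip().strip('“”"')
    return f'[tag="{cleaned}"]'


# Build a query for demonstrative pronouns (tag PR).
print(tag_query("PR"))

# Repair a query that was pasted with typographic quotes.
print(straighten_quotes("[tag=“PR”]"))
```

Both calls print `[tag="PR"]`, the form the search box expects.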
|CC||Coordinating conjunction, also called coordinator.
A coordinating conjunction links two or more syntactically equivalent parts of a sentence. It can link independent clauses, phrases, or words.
|dan, tetapi, atau|
Cardinal numbers, i.e. numerals which are the answers to the question “How much?” or “How many?”, include:
a. cardinal units, e.g. dua ‘two’,
b. group numbers, e.g. juta ‘million’,
c. full numbers, e.g. enam ‘six’ and 7916,
d. fractions, e.g. sepertiga ‘one-third’,
e. decimal numbers, e.g. 0,025 and 0,525,
f. indefinite numbers, e.g. banyak ‘many’,
g. collective numbers, e.g. kedua ‘both’, berpuluh-puluh ‘tens’, and ribuan ‘thousands’,
h. dates, and
|dua, juta, enam, 7916, sepertiga, 0,025, 0,525, banyak, kedua, ribuan, 2007, 25|
An ordinal number indicates an ordered position in a series, e.g. ketiga ‘third’.
|ketiga, ke-4, pertama|
|DT||Determiner / article.
An article is a determiner, i.e. a grammatical unit which limits the potential referent of a noun phrase, whose basic role is to mark noun phrases as either definite or indefinite.
|para, sang, si|
A foreign word is a word which comes from a foreign language and is not yet included in the Indonesian dictionary.
If a foreign word is part of a proper noun or name, that word will be labeled NNP.
|climate change, terms and conditions|
A preposition links a word or phrase with the constituent in front of the preposition, resulting in a prepositional phrase.
|dalam, dengan, di, ke, oleh, pada, untuk|
Adjectives, i.e. words which describe, modify, or specify some properties of the head noun of the phrase, include:
a. conditions, e.g. bersih ‘clean’,
b. sizes, e.g. panjang ‘long’ and kecil ‘small’,
c. colors, e.g. hitam ‘black’,
d. durations, e.g. lama ‘long (duration)’,
e. distances, e.g. jauh ‘far’,
f. emotions or feelings, e.g. marah ‘angry’,
g. senses, e.g. manis ‘sweet’,
h. membership of a group, e.g. nasional ‘national’, and
i. shapes, e.g. bulat ‘round’.
|bersih, panjang, hitam, lama, jauh, marah, suram, nasional, bulat|
|MD||Modal and auxiliary verb.||boleh, harus, sudah, mesti, perlu|
|NEG||Negation.||tidak, belum, jangan|
Nouns, i.e. words which refer to a human, animal, thing, concept, or understanding, include:
a. flora and fauna, e.g. monyet ‘monkey’,
b. locative nouns and nouns which indicate a place or direction, e.g. bawah ‘beneath’,
c. nouns which indicate time, e.g. sekarang ‘now’, and
d. currencies which are not written in the form of symbols, e.g. rupiah.
A proper noun is the specific name of a person, thing, or place. Proper nouns include:
a. personal name, e.g. Boediono,
b. the name of a geographical place, e.g. Laut Jawa,
c. the name of a country, state, or region, e.g. Indonesia,
d. the name of an organization, institution, or company, e.g. Bank Mandiri,
e. stock symbols, e.g. BBKP,
f. the names of months, e.g. Januari,
g. the days of the week, e.g. Senin,
h. the name of a feast, e.g. Idul Fitri,
i. the name of a competition, championship, award, or historical event, e.g. Piala Dunia, and
j. the title of a work, television show, or movie, e.g. Lord of the Rings: The Return of the King.
A proper noun which is written in a foreign language is labeled NNP.
An abbreviated proper noun is labeled NNP.
If a proper noun consists of more than one word or part, each word or part of that proper noun is labeled NNP.
|Boediono, Laut Jawa, Indonesia, India, Januari, Senin, Idul Fitri, Piala Dunia, Liga Primer, Lord of the Rings: The Return of the King|
|NND||Classifier, partitive, and measurement noun.
Classifiers classify nouns into a particular noun class, e.g. orang ‘man’.
Partitives indicate a particular amount of something based on the way it is measured, assembled, or processed, e.g. tetes ‘drop’.
Measurement nouns refer to size, distance, volume, speed, weight, or temperature, e.g. ton ‘ton’.
|orang, ton, helai, lembar|
|PR||Demonstrative pronoun.
Demonstrative pronouns imply “pointing to” or “demonstrating” the object they refer to, e.g. ini ‘this’.
|ini, itu, sini, situ|
Personal pronouns, i.e. pronouns which refer to people, include:
a. the first person singular pronoun, e.g. saya ‘I’,
b. the first person exclusive plural pronoun, e.g. kami ‘we (exclusive)’,
c. the first person inclusive plural pronoun, e.g. kita ‘we (inclusive)’,
d. the second person singular pronoun, e.g. kamu ‘you’,
e. the second person plural pronoun, e.g. kalian ‘you plural’,
f. the third person singular pronoun, e.g. dia ‘he, she’, and
g. the third person plural pronoun, e.g. mereka ‘they’.
|saya, kami, kita, kamu, kalian, dia, mereka|
|RB||Adverb.||sangat, hanya, justru, niscaya, segera|
In this research, the POS tag RP marks an emphatic particle, i.e. a particle which confirms interrogative, imperative, or declarative sentences.
|pun, -lah, -kah|
|SC||Subordinating conjunction, also called subordinator.
A subordinating conjunction links two or more clauses, one of which is a subordinate clause.
|sejak, jika, seandainya, supaya, meski, seolah olah, sebab, maka, tanpa, dengan, bahwa, yang, lebih … daripada …, semoga|
Symbols, which are labeled SYM, include mathematical symbols, e.g. +, and currency symbols, e.g. IDR.
|IDR, +, %, @|
An interjection expresses a feeling or state of mind and has no syntactic relation to other words.
|brengsek, oh, ooh, aduh, ayo, mari, hai|
Verbs, which are labeled VB, include transitive verbs, intransitive verbs, active verbs, passive verbs, and copulas.
If a verb consists of a foreign verb and Indonesian affixes, the resulting verb is labeled VB, e.g. di-arrange ‘arranged’.
|merancang, mengatur, pergi, bekerja, tertidur|
A question word marks a sentence as interrogative.
A question can also be placed within a declarative sentence as a subordinate clause; this is called an indirect question. In that case the question word, which links the indirect question to the main clause of the declarative sentence, acts as a subordinating conjunction and is labeled SC.
|siapa, apa, mana, kenapa, kapan, di mana, bagaimana, berapa|
A word or part of a sentence whose category is unknown or uncertain is labeled X.
A typo is also labeled X.
|Z||Punctuation.||“…”, ?, .|
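The table states that every word or part of a multi-word proper noun receives the tag NNP. A minimal Python sketch of that rule follows. The function name `label_proper_noun` is hypothetical, and splitting on whitespace is a simplifying assumption: a real tokenizer would also separate punctuation, which is tagged Z.

```python
def label_proper_noun(name: str) -> list[tuple[str, str]]:
    """Pair each part of a proper noun with the tag NNP."""
    # Assumption: parts are separated by whitespace only.
    return [(part, "NNP") for part in name.split()]


# Each part of a two-word place name gets its own NNP label.
for token, tag in label_proper_noun("Laut Jawa"):
    print(token, tag)
```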