A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
Irish Universal Dependencies tagset
It is a list of part-of-speech tags for Irish.
This is the Irish RFTagger part-of-speech tagset used in Irish corpora annotated by RFTagger, a tool trained on the Irish Universal Dependencies Treebank.
An example of a tag in the CQL concordance search box: [tag="(NOUN|PROPN).*"]
finds all nouns and proper names, e.g. cás, scoileanna (note: please make sure that you use straight double quotation marks)
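To check which tags from the tag table a pattern such as `(NOUN|PROPN).*` selects, the regular expression can be tried outside the corpus tool. The Python sketch below assumes that the pattern has to match the whole attribute value (hence `re.fullmatch`); the helper name `matching_tags` is illustrative only and not part of any tool mentioned here.

```python
import re

# UD part-of-speech tags listed for Irish (copied from the tag table).
UD_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]

def matching_tags(pattern, tags=UD_TAGS):
    """Return the tags that the pattern matches in full (assumed CQL behaviour)."""
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]

print(matching_tags("(NOUN|PROPN).*"))  # ['NOUN', 'PROPN']
```

Trying a pattern this way before running it on a corpus can help catch an alternation that is broader or narrower than intended.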
Universal Dependencies tags for Irish
POS tag | Description |
ADJ | adjective |
ADP | adposition |
ADV | adverb |
AUX | auxiliary |
CCONJ | coordinating conjunction |
DET | determiner |
INTJ | interjection |
NOUN | noun |
NUM | numeral |
PART | particle |
PRON | pronoun |
PROPN | proper noun |
PUNCT | punctuation |
SCONJ | subordinating conjunction |
SYM | symbol |
VERB | verb |
X | other |
UD Morphological features for Irish
For example, to find all feminine nouns, including proper names, in the genitive form, use the following CQL query: [tag="(NOUN|PROPN).*" & gender="Fem" & case="Gen"] or [morpho="Noun.*Case=Gen.*Gender=Fem.*"]
- not applicable: the feature does not apply to the token, so its value is empty; in such cases the attribute morpho uses the value “.-”
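Queries of the kind shown above (a tag pattern combined with feature constraints) can also be assembled programmatically. The following Python sketch is one possible way to do it; `cql_token` is a hypothetical helper, and the attribute names `tag`, `gender`, and `case` are simply those used in the example query.

```python
def cql_token(tags, **features):
    """Build a one-token CQL expression from POS tags and feature values."""
    # Tag alternatives are grouped and followed by .* as in the example query.
    tag_part = 'tag="({}).*"'.format("|".join(tags))
    # Each keyword argument becomes an attribute="value" constraint.
    feature_parts = ['{}="{}"'.format(name, value) for name, value in features.items()]
    return "[" + " & ".join([tag_part] + feature_parts) + "]"

query = cql_token(["NOUN", "PROPN"], gender="Fem", case="Gen")
print(query)  # [tag="(NOUN|PROPN).*" & gender="Fem" & case="Gen"]
```

Because the string is built in code, it always uses straight double quotation marks, which avoids the quoting problem mentioned earlier.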
Morphological feature | Description | Part of speech | Value | Example |
Abbr | Abbreviation refers to shortened forms of words or phrases. It also includes acronyms. | adjective, adverb, noun, numeral, other, proper noun, symbol | not applicable; Yes | Dr; EUR |
Aspect | Aspect is a verbal property indicating how the action takes place over time (i.e., completed, repeated, habitual, ongoing). Irish displays two grammatical aspects: imperfect aspect (Imp) and habitual aspect (Hab). | verb | not applicable; Hab; Imp | bhionn (Hab); ritheadh (Imp) |
Case | Case is a grammatical category referring to the syntactic or semantic function that the specific part of speech carries out within the sentence. Irish has 4 different cases: nominative, dative, genitive, and vocative. | adjective, determiner, noun, proper noun | not applicable; Nom; Gen; Dat; Voc | d’fhear (Nom-NOUN); na (Gen-DET); Éirinn (Dat-PROPN); dhil (Voc-ADJ) |
Definite | Definiteness is a semantic property referring to whether the referent is identifiable or not in the context. The feature applies to 3 parts of speech and takes 2 values (only 1 in the case of DET and PROPN). | determiner, noun, proper noun | not applicable; Def; Ind | thaca (Def-NOUN); Gaeltachta (Def-PROPN); dlithe (Ind-NOUN) |
Degree | Adjectives describe entities and allow those entities to be compared by means of the so-called degrees of comparison. The feature, which in Irish applies only to adjectives, can take 3 different values: comparative (Cmp), superlative (Sup), and positive (Pos). Some words have combined values of the feature. | adjective | not applicable; Cmp; Cmp, Sup; Sup; Pos | mó (Cmp, Sup); mór (Pos) |
Dialect | Irish has 3 main dialects, i.e., 3 different varieties of the same language. The feature occurs with 9 part-of-speech tags and takes 3 values: Munster, Ulster, and Connaught. | adverb, adposition, auxiliary verb, determiner, noun, particle, pronoun, proper noun, verb | not applicable; Munster; Ulster; Connaught | deineadh (Munster-VERB); fá (Ulster-ADP); caidé (Connaught-PRON) |
Foreign | The feature refers to foreign words (i.e., foreignisms). It takes the boolean value “Yes” when a foreign word is present. The feature is used with 8 part-of-speech tags. | adjective, adposition, determiner, other, noun, pronoun, proper noun, symbol | not applicable; Yes | Education (NOUN); all (DET) |
Form | Form is a language-specific feature covering the morphology of direct/indirect relative markers and initial mutation. The feature occurs with 13 part-of-speech tags. UD defines 7 values: direct (Direct), indirect (Indirect), eclipsis (Ecl), emphatic (Emp), lenition (Len), h-prefix (HPref), vowel form (VF). Sketch Engine uses 12 values, adding Cmpd, Cop, Part, Vnoun, and Inf. Some words have combined values of the feature. | adjective, adposition, adverb, auxiliary verb, determiner, noun, numeral, other, particle, pronoun, proper noun, subordinating conjunction, verb | not applicable; Cmpd; Cop; Direct; Direct, Emp; Ecl; Ecl, Emp; Ecl, Indirect; Emp; Emp, Len; HPref; Indirect; Inf; Len; Part; Vnoun; VF | ba (Cop-PART); mBaile (Ecl-PROPN); thri (Len-NUM) |
Gender | Gender is typically a lexical element of nouns and an inflectional feature of other parts of speech (e.g., adjectives) that mark agreement with nouns. The feature applies to 7 parts of speech and has two different values: masculine (Masc) and feminine (Fem). Some words have combined values of the feature. | adjective, adposition, auxiliary verb, determiner, noun, pronoun, proper noun | not applicable; Fem; Masc; Fem, Masc | hArdeaglaise (Fem-NOUN); daonna (Masc-ADJ) |
Mood | Mood is the verbal feature expressing the attitude of speakers towards what is being conveyed (e.g., assessment, desire, command). The feature is universal except for the Int value, which is language-specific. The feature has 5 different values, and 1 combination (Cnd, Int) is attested for some words. | auxiliary verb, particle, verb | not applicable; Cnd; Cnd, Int; Imp; Ind; Int; Sub | seol (Imp-VERB); lean (Ind-VERB); go (Sub-PART) |
NounType | Plurals in Irish are formed in a variety of ways depending on gender, number, and case, as well as noun type. NounType is a language-specific feature which affects both nouns and adjectives. It applies to 3 parts of speech and can take 4 different values: strong plurals (Strong), weak plurals (Weak), broad consonants (NotSlender), slender consonants (Slender). | adjective, noun, proper noun | not applicable; NotSlender; Slender; Strong; Weak | mblianta (Strong-NOUN); cearta (NotSlender-ADJ); Cliath (Weak-PROPN) |
Number | Number is a grammatical category indicating a quantity. The feature occurs with 9 part-of-speech tags and can take 2 values: plural (Plur) and singular (Sing). | adjective, adposition, auxiliary verb, determiner, noun, particle, pronoun, proper noun, verb | not applicable; Plur; Sing | thaca (Sing-NOUN); cearta (Plur-ADJ); na (Plur-DET) |
NumType | Numerals can take different forms according to the language system involved. In Irish, the feature occurs with 1 part of speech (i.e., numerals) and can take two different values: cardinal (Card) and ordinal (Ord). | numeral | not applicable; Card; Ord | trí (Card); dtríú (Ord) |
PartType | Irish makes use of a wide range of different particles performing various functions, which are also reflected in their forms. The feature is language-specific. It occurs with only 1 part of speech (particles), and 11 different values are attested: adverbial (Ad), comparative (Comp), complementizer (Cmpl), copular (Cop), degree (Deg), infinitive (Inf), numeral (Num), patronym (Pat), superlative (Sup), verbal (Vb), vocative (Voc). | particle | not applicable; Ad; Cmpl; Comp; Cop; Deg; Inf; Num; Pat; Sup; Vb; Voc | gur (Vb); go (Cmpl); a (Inf) |
Person | Person is the way of referring to someone taking part in an event. The feature is universal except for the “0” value, which is language-specific. The feature occurs with 5 parts of speech and takes 4 different values: zero person (0) for impersonal statements, first person (1) regarding the addresser, second person (2) regarding the addressee, third person (3) regarding neither the addresser nor the addressee. | adposition, auxiliary verb, determiner, pronoun, verb | not applicable; 0; 1; 2; 3 | not applicable |
Polarity | Polarity refers to words occurring in either positive or negative utterances. The feature applies to 3 parts of speech and takes 1 value: negative (Neg). | auxiliary verb, particle, verb | not applicable; Neg | nach ndearna |
Poss | The feature tells us whether an item is possessive or not. The feature is used with 2 parts of speech and takes 1 (boolean) value: Yes. | adposition, determiner | not applicable; Yes | mo (DET); ina (ADP) |
PrepForm | The feature is language-specific and refers to cases where a preposition combines with a noun to give a compound preposition. It occurs with 2 parts of speech and takes only 1 value: compound preposition (Cmpd). | adposition, noun | not applicable; Cmpd | aice; measc; feadh |
PronType | Pronominal type applies to pronouns and other pronominal forms (e.g., determiners). The feature is universal except for the value “Emp”, which is language-specific. It occurs with 7 different parts of speech and takes 7 different values: article (Art), relative pronoun, determiner, numeral or adverb (Rel), demonstrative pronoun, determiner, numeral or adverb (Dem), indefinite pronoun, determiner, numeral or adverb (Ind), personal or possessive personal pronoun or determiner (Prs), interrogative pronoun, determiner, numeral or adverb (Int), emphatic determiner (Emp). | adposition, adverb, auxiliary verb, determiner, particle, pronoun, verb | not applicable; Art; Dem; Emp; Ind; Int; Prs; Rel | an (Art-DET); atá (Rel-VERB); sin (Dem-AUX) |
Reflex | The category tells us whether the item is reflexive or not. It applies to 2 parts of speech and takes 1 boolean value: Yes. | pronoun, proper noun | not applicable; Yes | féin |
Tense | Tense is a grammatical category typical of verbs. It tells us whether the action occurs in the past, present or future. In Irish, it occurs with 5 different parts of speech and takes 3 values: present tense (Pres), past tense (Past), future tense (Fut). | adverb, auxiliary verb, particle, subordinating conjunction, verb | not applicable; Pres; Past; Fut | bhfuil (Pres-VERB); sular (Past-SCONJ); beidh (Fut-VERB) |
Typo | The feature is language-specific and refers to misspellings leading to unexpected word forms. It applies to 11 parts of speech and takes 1 boolean value: Yes. | adjective, adposition, adverb, coordinating conjunction, determiner, noun, numeral, particle, pronoun, proper noun, verb | not applicable; Yes | aistharraingt (NOUN); amhain (ADJ); said (PRON) |
VerbForm | Form of verb or deverbative is the category indicating those forms having features from both verbs and other parts of speech. The category applies to 6 parts of speech and takes 4 different values: infinitive (Inf), copula (Cop), participle (Part), verbal noun (Vnoun). | adjective, auxiliary verb, noun, particle, pronoun, subordinating conjunction | not applicable; Inf; Cop; Part; Vnoun | beartaithe (Part-ADJ); arbh (Cop-AUX); tabhairt (Inf-NOUN) |
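Several features in the table have combined values such as “Cmp, Sup”. In the CoNLL-U format used by Universal Dependencies treebanks, features are written as `Name=Value` pairs joined by `|`, and multiple values of one feature are separated by commas. The Python sketch below parses such a string; it assumes treebank-style input rather than the corpus tool's internal representation, and `parse_feats` is a hypothetical helper.

```python
def parse_feats(feats):
    """Parse a CoNLL-U style FEATS string into {feature: set of values}."""
    if feats in ("", "_"):  # "_" marks an empty feature set in CoNLL-U
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        # A combined value such as "Cmp,Sup" becomes a set with two members.
        parsed[name] = set(values.split(","))
    return parsed

feats = parse_feats("Degree=Cmp,Sup")
print(sorted(feats["Degree"]))  # ['Cmp', 'Sup']
```

Storing values as sets makes it straightforward to test whether a token carries a given value regardless of whether it appears alone or in combination.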
Source: https://universaldependencies.org/ga/index.html