A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense, etc.) of each token in a text corpus.

CAMeL Arabic part-of-speech tagset

The CAMeL Arabic part-of-speech tagset is used in Arabic corpora annotated with CAMeL Tools, a set of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

The table below lists the tags in the CAMeL Arabic part-of-speech tagset.

An example of a tag in the CQL concordance search box: [tag="adj"] finds all adjectives, e.g. متحد, كامل. (Note: make sure that you use straight double quotation marks.)
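Queries pasted from a word processor often contain typographic quotation marks, which the note above warns against. The following is a minimal Python sketch of one way to clean a query before pasting it into the search box; the helper name `straighten_quotes` is hypothetical and is not part of Sketch Engine or CAMeL Tools.

```python
# Hypothetical helper (not a Sketch Engine function): replace typographic
# double quotation marks with straight ones, so that a pasted query such
# as [tag=“adj”] becomes [tag="adj"].
CURLY_DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f"

def straighten_quotes(query: str) -> str:
    """Return the query with curly double quotes replaced by '"'."""
    table = str.maketrans({ch: '"' for ch in CURLY_DOUBLE_QUOTES})
    return query.translate(table)

print(straighten_quotes("[tag=\u201cadj\u201d]"))  # prints [tag="adj"]
```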

POS tag         Description
abbrev          abbreviation
adj             adjective
adv             adverb
adv_interrog    interrogative adverb
adv_rel         relative adverb
conj            conjunction
conj_sub        subordinating conjunction
digit           number written in digits
foreign         foreign word
interj          interjection
noun            noun
noun_prop       proper noun
noun_quant      quantity noun
part            particle
part_det        determiner particle
part_focus      focus particle
part_fut        future marker particle
part_interrog   interrogative particle
part_neg        negative particle
part_verb       verbal particle
part_voc        vocative particle
prep            preposition
pron            pronoun
pron_dem        demonstrative pronoun
pron_interrog   interrogative pronoun
pron_rel        relative pronoun
punc            punctuation
verb            verb
verb_pseudo     pseudo verb
xxx             other
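The tags in the table above can also be handled programmatically, for example to check a tag before building a query. The following Python sketch is illustrative only: the names `CAMEL_TAGS`, `cql_for_tag` and `tags_matching` are hypothetical. It assumes that a CQL attribute value is treated as a regular expression that must match the whole tag, and it uses Python's `re` module as a stand-in for the CQL regular expression dialect, which may differ in details.

```python
import re

# Tag names copied from the table above.
CAMEL_TAGS = frozenset(
    """abbrev adj adv adv_interrog adv_rel conj conj_sub digit foreign
    interj noun noun_prop noun_quant part part_det part_focus part_fut
    part_interrog part_neg part_verb part_voc prep pron pron_dem
    pron_interrog pron_rel punc verb verb_pseudo xxx""".split()
)

def cql_for_tag(tag: str) -> str:
    """Build a one-token CQL expression for a tag from the table."""
    if tag not in CAMEL_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    # Straight double quotation marks, as the CQL search box requires.
    return f'[tag="{tag}"]'

def tags_matching(pattern: str) -> list:
    """List the tags that a pattern would cover, assuming the pattern
    has to match the entire tag value."""
    regex = re.compile(pattern)
    return sorted(t for t in CAMEL_TAGS if regex.fullmatch(t))

print(cql_for_tag("noun_prop"))  # prints [tag="noun_prop"]
print(tags_matching("pron.*"))   # prints ['pron', 'pron_dem', 'pron_interrog', 'pron_rel']
```

Under the same assumption, a query such as [tag="pron.*"] would retrieve pronouns of every subtype in one search, whereas [tag="pron"] would retrieve only tokens tagged with the plain pron tag.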

Source: https://camel-tools.readthedocs.io/en/stable/reference/camel_morphology_features.html

Arabic corpora

Sketch Engine provides access to more than 10 Arabic corpora.