A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The French TreeTagger part-of-speech tagset is used in French corpora annotated by TreeTagger, a tool developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart.

An example of a tag query in the CQL concordance search box: [tag="VER:cond"] finds all verbs in the conditional, e.g. serait, pourrait (note: make sure that you use straight double quotation marks).
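As a minimal sketch of how such a query string could be assembled and checked before pasting it into the search box, the Python snippet below builds a `[tag="…"]` query and replaces curly quotation marks with straight ones. The helper names `straighten_quotes` and `cql_for_tag` are hypothetical and are not part of Sketch Engine or TreeTagger.

```python
# Curly and low quotation marks that word processors often substitute
# for the straight double quote.
CURLY_QUOTES = "“”„"


def straighten_quotes(query: str) -> str:
    """Replace curly double quotes with straight ones (hypothetical helper)."""
    for ch in CURLY_QUOTES:
        query = query.replace(ch, '"')
    return query


def cql_for_tag(tag: str) -> str:
    """Build a single-token CQL query for one tag (hypothetical helper)."""
    if '"' in tag or any(ch in tag for ch in CURLY_QUOTES):
        raise ValueError("the tag itself must not contain quotation marks")
    return f'[tag="{tag}"]'


print(cql_for_tag("VER:cond"))                # [tag="VER:cond"]
print(straighten_quotes("[tag=“VER:cond”]"))  # [tag="VER:cond"]
```

The second call shows the typical failure case: a query copied from a word processor with curly quotes is normalised back to the form shown above.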


Tag Description
ABR abbreviation
ADJ adjective
ADV adverb
DET:ART article
DET:POS possessive pronoun (ma, ta, …)
INT interjection
KON conjunction
NAM proper name
NOM noun
NUM numeral
PRO pronoun
PRO:DEM demonstrative pronoun
PRO:IND indefinite pronoun
PRO:PER personal pronoun
PRO:POS possessive pronoun (mien, tien, …)
PRO:REL relative pronoun
PRP preposition
PRP:det preposition plus article (au, du, aux, des)
PUN punctuation
PUN:cit punctuation citation
SENT sentence tag
SYM symbol
VER:cond verb conditional
VER:futu verb future
VER:impe verb imperative
VER:impf verb imperfect
VER:infi verb infinitive
VER:pper verb past participle
VER:ppre verb present participle
VER:pres verb present
VER:simp verb simple past
VER:subi verb subjunctive imperfect
VER:subp verb subjunctive present

Source: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html
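The tags in the table above can be looked up and grouped programmatically. The following Python sketch assumes tagged text in a one-token-per-line layout of token, tag and lemma separated by tabs. The sample lines are hand-written for illustration and are not actual tagger output, and the dictionary covers only a few rows of the table.

```python
from collections import Counter

# A few rows copied from the tagset table (tag -> description).
TAG_DESCRIPTIONS = {
    "ADJ": "adjective",
    "PRO:PER": "personal pronoun",
    "SENT": "sentence tag",
    "VER:cond": "verb conditional",
}

# Hand-written sample, assumed layout: token<TAB>tag<TAB>lemma.
SAMPLE = (
    "Il\tPRO:PER\til\n"
    "serait\tVER:cond\têtre\n"
    "content\tADJ\tcontent\n"
    ".\tSENT\t.\n"
)


def parse_tagged(text):
    """Split tab-separated lines into (token, tag, lemma) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        token, tag, lemma = line.split("\t")
        rows.append((token, tag, lemma))
    return rows


def main_category(tag):
    """Return the part before the colon, e.g. 'VER' for 'VER:cond'."""
    return tag.split(":", 1)[0]


rows = parse_tagged(SAMPLE)
counts = Counter(main_category(tag) for _, tag, _ in rows)

for token, tag, lemma in rows:
    # Tags missing from the partial dictionary are reported as 'unknown'.
    print(token, tag, TAG_DESCRIPTIONS.get(tag, "unknown"))
```

Splitting on the colon reflects how the table is organised: the part before the colon is the main part of speech and the part after it is the subcategory (for example the verb form).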

French text corpora in Sketch Engine

Sketch Engine offers dozens of French-language corpora.