MATAS: the Morphologically Annotated Lithuanian Corpus
The Morphologically Annotated Lithuanian Corpus (MATAS) is a language corpus made up of texts from different genres. It was compiled and prepared by the Center of Computational Linguistics (CCL) at Vytautas Magnus University. The corpus consists of 739,176 words, manually annotated with detailed grammatical categories. The texts are extracted from the Corpus of the Contemporary Lithuanian Language at CCL (a 100-million-word corpus).
For more information, see https://clarin.vdu.lt/xmlui/handle/20.500.11821/9?show=full
The MATAS corpus is manually annotated at the morphological level with the following POS tagset.
Access to the corpus is limited to academic use. To gain access, send an email to firstname.lastname@example.org with proof of your academic affiliation.