FidaPLUS: the Slovenian Reference Corpus
The Slovenian Reference Corpus (FidaPLUS) is a language corpus of texts collected from a wide range of domains, from literary novels and daily newspapers to the language of blogs and social media. The corpus contains around 600 million words.
The corpus is provided by the Centre for Language Resources and Technologies, University of Ljubljana.
The FidaPLUS corpus was annotated with software developed by the Amebis software company. This tool uses the part-of-speech (POS) tagset of the MULTEXT-East Morphosyntactic Slovenian Specification version 3.
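MULTEXT-East tags are positional: the first character of a tag encodes the word category and the remaining characters encode attributes such as gender, number and case. The following minimal sketch in Python shows how the category could be read from such a tag. It assumes the tags use the English category codes of the MULTEXT-East specifications; the codes actually exposed in FidaPLUS may differ, and the function name `msd_category` and the sample tag are illustrative only.

```python
# Category codes as used in the MULTEXT-East specifications.
# Assumption: the corpus exposes tags with these English codes.
MSD_CATEGORIES = {
    "N": "Noun",
    "V": "Verb",
    "A": "Adjective",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Adposition",
    "C": "Conjunction",
    "M": "Numeral",
    "I": "Interjection",
    "Y": "Abbreviation",
    "Q": "Particle",
    "X": "Residual",
}


def msd_category(msd: str) -> str:
    """Return the word category encoded in the first character of a tag."""
    if not msd:
        raise ValueError("empty tag")
    # Unrecognised codes are reported as "Unknown" rather than raising.
    return MSD_CATEGORIES.get(msd[0], "Unknown")


# Illustrative tag: the leading "N" marks a noun.
print(msd_category("Ncmsn"))
```

The attribute positions after the first character depend on the category, so decoding them fully would need the attribute tables from the specification itself.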