BulgarianNC: Bulgarian National Corpus
The Bulgarian National Corpus (BulgarianNC) is a corpus of Bulgarian texts collected from various sources, including scanned books, transcribed data, and internet texts. The texts are classified by genre, domain, and source type. The corpus contains 419 million words in total (web and non-web parts combined).
In Sketch Engine, BulgarianNC is organised hierarchically as follows:
- BulgarianNC_web: the web part of the Bulgarian National Corpus
- BulgarianNC_nonweb: all texts except the web part
- BulgarianNC_all: BulgarianNC_web + BulgarianNC_nonweb. This is a test case of our new feature, the Virtual Corpus (or Super Corpus).
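
The hierarchy above can be pictured as a small tree in which a virtual corpus holds no texts of its own and simply refers to its parts. The following Python sketch is purely conceptual: the `Corpus` class and its `leaves` method are hypothetical names introduced for illustration and are not part of Sketch Engine or its API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Corpus:
    """Hypothetical model of a corpus; a corpus with parts acts as a virtual corpus."""

    name: str
    parts: list[Corpus] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Return the names of the underlying (non-virtual) corpora, in order."""
        if not self.parts:
            return [self.name]
        names: list[str] = []
        for part in self.parts:
            names.extend(part.leaves())
        return names


# The three corpora named in the list above.
web = Corpus("BulgarianNC_web")
nonweb = Corpus("BulgarianNC_nonweb")
combined = Corpus("BulgarianNC_all", parts=[web, nonweb])

print(combined.leaves())
```

In this model, a query against `BulgarianNC_all` would be resolved against the two component corpora rather than against a third copy of the data; whether Sketch Engine implements it this way is an assumption.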
The Bulgarian National Corpus is PoS-tagged using the following Bulgarian tagset.