bsWaC: Bosnian corpus from the web
The Bosnian web corpus (bsWaC) is a corpus of Bosnian texts collected from the Internet. It was prepared according to the standards described in "A Corpus Factory for Many Languages" (Kilgarriff et al., LREC 2010). The corpus was created in January 2014 and contains 248 million words.