hrWaC: Croatian corpus from the web
The Croatian web corpus (hrWaC) is a corpus of Croatian texts collected from the Internet. It was built following the procedure described in "A Corpus Factory for Many Languages" (Kilgarriff et al., LREC 2010). The corpus was created in January 2014 and contains over 1.2 billion words in total.