ttWaC: Tatar web corpus
The Tatar web corpus (ttWaC) is a corpus of Tatar texts collected from the Internet. It was prepared following the method described in "A Corpus Factory for Many Languages" (Kilgarriff et al., LREC 2010).
The data was crawled with the SpiderLing web spider in 2015. The corpus consists of 200,000 words.