ThaiWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

UKWaCsst corpus

UKWaC tagged with SuperSenseTagger (sst-light) described in…

Gujarati web corpus (guWaC)

guWaC is a web corpus of the Gujarati language (Indo-Aryan…

Patakis corpus

Patakis is a 100 million word collection of POS-tagged texts…

FinnishWaC corpus

A web corpus of Finnish.

Domain Specific Corpora

These corpora are prepared from specific domains, e.g. science,…

e-flux corpus

The e-flux corpus is a web corpus of English art news digests.…