The archive of news about Sketch Engine.

ThaiWaC corpus

The corpus is prepared by the Corpus Factory method. Full details…

TurkishWaC corpus

The TurkishWaC corpus is a 32 million word collection of samples…

UKWaCsst corpus

UKWaC tagged with SuperSenseTagger (sst-light) described in…

DANTE: A Detailed, Accurate, Extensive, Available English Lexical Database

Here we present some sample queries on the database and corresponding…

GujarathiWaC corpus

GujarathiWaC is a web corpus of the Gujarati language (Indo-Aryan…

Patakis corpus

Patakis is a 100 million word collection of POS-tagged texts…

GeorgianWaC corpus

Original file owner: bharat.

FinnishWaC corpus

Finnish web as corpus.

DanishWaC corpus

The corpus is prepared by the Corpus Factory method. It has 288 million…

Domain Specific Corpora

These corpora are prepared from specific domains, e.g. science,…

ScienceBlogs corpus

The ScienceBlogs corpus is a selection of posts and comments…

e-flux corpus

The e-flux corpus is a web corpus of English art news digests.…