TenTen Corpus Family
The TenTen Corpus Family (TenTen corpora) is a collection of text corpora compiled from texts on the Web. All TenTen corpora are prepared according to the same criteria. These criteria are intended to ensure good-quality corpus texts and to make the corpora comparable with one another.
The name TenTen refers to the target corpus size of 10^10 (10+ billion) words per language. TenTen corpora are currently available in more than 30 languages, including English, Spanish, Japanese, Chinese, Greek, Estonian and Ukrainian.