A page relevant to corpora.

Pages

TED_en corpus

A corpus of transcripts of TED talks. Prepared by Akshay Min…

Scottish Gaelic Wiki corpus

Scottish Gaelic Wikipedia corpus. Downloaded in February 2015.…

Polish Web Corpus (PolishWaC)

The Polish web corpus (PolishWaC) has 103 million words and the encoding is…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…