Baroni, Marco, et al. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), pp. 209–226.
Corpus factory method
Kilgarriff, Adam, Siva Reddy, Jan Pomikálek, and Avinesh PVS (2010). A corpus factory for many languages. In LREC Workshop on Web Services and Processing Pipelines, Malta, May 2010.
Turkish word sketches
Ambati, Bharat Ram, Siva Reddy, and Adam Kilgarriff (2012). Word Sketches for Turkish. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pp. 2945–2950.