sDeWaC is a subset of DeWaC. Its creation is described in detail here, and the corpus release announcement can be viewed here.

We thank Janina Kopp and Niels Ott for parsing sDeWaC (to be expanded: details about the POS tagger, morphological analyser, dependency parser and computing resources used).

Word Sketches are extracted from the dependency-parsed sDeWaC using the method described in the article below.


Bharat Ram Ambati, Siva Reddy and Adam Kilgarriff (2012). Word Sketches for Turkish. In LREC (pp. 2945–2950).
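
As a rough illustration of the general idea behind building word sketches from dependency-parsed text (not a reproduction of the pipeline in the article above), the sketch below groups a headword's collocates by dependency relation and ranks them by logDice, the association score commonly used for word sketches. The input triples, the relation names (subj_of, modifier) and the function name word_sketch are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy input: (headword, relation, collocate) triples, as they
# might be read off a dependency-parsed corpus. Relation names are assumptions.
triples = [
    ("Hund", "subj_of", "bellen"),
    ("Hund", "subj_of", "bellen"),
    ("Hund", "subj_of", "laufen"),
    ("Kind", "subj_of", "laufen"),
    ("Hund", "modifier", "groß"),
    ("Kind", "modifier", "groß"),
    ("Kind", "modifier", "klein"),
]


def word_sketch(triples, headword, top_n=5):
    """Group collocates of `headword` by relation, ranked by logDice."""
    triple_freq = Counter(triples)                            # f(w1, R, w2)
    head_rel_freq = Counter((h, r) for h, r, _ in triples)    # f(w1, R, *)
    colloc_freq = Counter(c for _, _, c in triples)           # f(*, *, w2)

    sketch = {}
    for (head, rel, colloc), freq in triple_freq.items():
        if head != headword:
            continue
        # logDice = 14 + log2(2 * f(w1,R,w2) / (f(w1,R,*) + f(*,*,w2)))
        dice = 2 * freq / (head_rel_freq[(head, rel)] + colloc_freq[colloc])
        score = 14 + math.log2(dice)
        sketch.setdefault(rel, []).append((colloc, round(score, 2)))

    # Highest score first; ties broken alphabetically for stable output.
    for rel in sketch:
        sketch[rel].sort(key=lambda item: (-item[1], item[0]))
        sketch[rel] = sketch[rel][:top_n]
    return sketch


if __name__ == "__main__":
    for rel, collocates in word_sketch(triples, "Hund").items():
        print(rel, collocates)
```

On a real corpus the triples would come from the parser output, and a minimum frequency threshold would normally be applied before scoring so that rare co-occurrences do not dominate the ranking.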