Yiddish corpus from Wikipedia

The Yiddish Wikipedia Corpus (yiwiki) is a Yiddish corpus made up of texts collected from the Yiddish-language edition of the internet encyclopedia Wikipedia in December 2018. The corpus consists of 2 million words.

Tools to work with the Yiddish corpus

A complete set of tools is available for working with this Yiddish Wikipedia corpus. The tools can generate:

  • word lists – lists of Yiddish words organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples in context
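
To make the three outputs above concrete, here is a minimal Python sketch of how a word list, an n-gram list and a concordance can be computed from plain text. It is not Sketch Engine's implementation and does not use the yiwiki data: the sample sentence, the function names (tokenize, word_list, ngram_list, concordance) and the whitespace-based tokenization are all assumptions made for illustration.

```python
from collections import Counter
import string

# Made-up sample sentence (an assumption, not taken from the yiwiki corpus).
SAMPLE = "דער מאן זעט דעם הונט און דער הונט זעט דעם מאן"

# ASCII punctuation plus a few Hebrew-script marks (gershayim, geresh, maqaf).
PUNCT = string.punctuation + "״׳־"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (a rough approximation)."""
    stripped = (tok.strip(PUNCT) for tok in text.split())
    return [tok for tok in stripped if tok]


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngram_list(tokens, n=2):
    """Return (n-gram tuple, frequency) pairs, most frequent first."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common()


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    print(word_list(tokens))
    print(ngram_list(tokens, n=2))
    for left, hit, right in concordance(tokens, "הונט"):
        print(left, "[", hit, "]", right)
```

A real corpus tool also handles lemmatization, spelling variants and indexing for fast search, which this sketch leaves out.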

Search the Yiddish corpus

Sketch Engine offers a range of tools to work with this Yiddish corpus from Wikipedia.


Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.