The new Latvian Corpus 2021 is now available in Sketch Engine. The corpus is enriched with part-of-speech tagging and lemmatization.
Perfect for #corpuslinguistics, #digitalhumanities, #linguistics, #lexicography, and #nlp.
We’ve published the Urdu corpus 2021 with 328 million words and topic and genre classification. Urdu is the 11th most spoken language worldwide (Ethnologue, 2025).
🔗 https://t.co/Ovhv6wJnGt #corpuslinguistics #TextAnalysis #اردو pic.twitter.com/NlHswfgr3L
Sketch Engine (@SketchEngine), January 29, 2026






