Languages in Sketch Engine

This page lists all supported languages for which publicly available corpora exist. Languages for which only user corpora exist are not included.

The available features differ by language, and sometimes even by corpus, because corpora in Sketch Engine come from different sources.

Preloaded corpora features

Click a language below to see the features available for it and a list of its preloaded corpora. Refer to the corpus details page for the features available in each preloaded corpus.

User corpora features

Refer to the table below to check whether the following features will be available:

POS – Yes – user corpora will be tagged for parts of speech

WS – Yes – a Word Sketch Grammar is available for the language

POS – No, WS – Yes – user corpora cannot be tagged for parts of speech, but a Word Sketch Grammar is available for the language because some corpora were tagged outside Sketch Engine. A user corpus can use Word Sketches only if it is tagged externally with the same tagset as the one used in the Word Sketch Grammar.

Language PoS tagging Lemmatization Word sketches Terms
Afrikaans none
Albanian none
Amazigh none
Amharic none
Ancient Greek none
Arabic basic
Armenian none
Azerbaijani none
Basque none
Belarusian none
Bengali none
Bosnian none
Breton none
Bulgarian full
Burmese none
Cantonese none
Catalan full
Cebuano none
Chinese Simplified full
Chinese Traditional full
Croatian full
Cundeelee Wangka none
Czech full
Danish full
Dutch full
English full
Esperanto none
Estonian full
Filipino none
Finnish full
French full
Frisian none
Galician none
Georgian none
German full
Greek full
Gujarati none
Hausa (Boko) none
Hebrew none
Hindi none
Hungarian full
Icelandic none
Igbo none
Indonesian none
Irish none
Italian full
Japanese full
Kalaamaya none
Kannada none
Kazakh none
Khmer none
Korean basic
Kurdish (Kurmanji) none
Kurdish (Sorani) none
Kuwarra none
Kyrgyz none
Lao none
Latin none
Latvian none
Limburgish none
Lithuanian none
Macedonian none
Maduwongga none
Malay none
Malayalam none
Maldivian none
Maltese none
Mankulatjarra none
Manyjiljar none
Maori none
Marathi none
Marlpa none
Mirning none
Mongolian none
Montenegrin none
N'Ko none
Ndebele none
Nepali none
Newspeak none
Ngaanyatjarra none
Ngaju none
Ngalia none
Nganta none
Northern Sotho none
Norwegian (Mixed) full
Norwegian Bokmål full
Norwegian Nynorsk full
Nyakinyaki none
Oromo none
Pashto none
Persian none
Pintupi none
Pitjantjatjara none
Polish full
Portuguese full
Punjabi (Shahmukhi) none
Quechua none
Romanian full
Russian full
Samoan none
Sanskrit (romanised) none
Scottish Gaelic none
Serbian full
Serbian (Latin) full
Sesotho none
Setswana none
Sinhalese none
Slovak full
Slovenian full
Somali none
Spanish full
Swahili basic
Swazi none
Swedish full
Syriac none
Tagalog none
Tajik none
Talysh none
Tamil none
Tatar none
Telugu none
Thai none
Tibetan full
Tigrinya none
Tjalkatjarra none
Tjupan none
Tsonga none
Turkish none
Turkmen none
Ukrainian none
Urdu none
Uzbek none
Venda none
Vietnamese none
Wangkatja none
Warlpiri none
Welsh none
Wudjaarri none
Xhosa none
Yankunytjatjara none
Yiddish none
Yoruba none
Zulu none
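
The POS/WS legend for user corpora amounts to a simple rule for deciding whether Word Sketches will work with a user corpus. The Python sketch below encodes that rule as we read it from the legend; the names `LanguageSupport` and `word_sketches_available` are hypothetical illustrations, not part of any Sketch Engine API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSupport:
    """Hypothetical record of the legend flags for one language."""
    pos_tagging: bool          # "POS – Yes": Sketch Engine tags user corpora
    word_sketch_grammar: bool  # "WS – Yes": a Word Sketch Grammar exists


def word_sketches_available(
    support: LanguageSupport,
    externally_tagged: bool = False,
    tagset_matches_grammar: bool = False,
) -> bool:
    """Return True if a user corpus in this language can use Word Sketches."""
    # Without a Word Sketch Grammar there are no Word Sketches at all.
    if not support.word_sketch_grammar:
        return False
    # POS Yes + WS Yes: the corpus is tagged automatically, so sketches work.
    if support.pos_tagging:
        return True
    # POS No + WS Yes: only works if the user tagged the corpus externally
    # with the same tagset the Word Sketch Grammar expects.
    return externally_tagged and tagset_matches_grammar
```

For example, `word_sketches_available(LanguageSupport(pos_tagging=False, word_sketch_grammar=True), externally_tagged=True, tagset_matches_grammar=True)` returns `True`, while the same call with `tagset_matches_grammar=False` returns `False`.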