Languages in Sketch Engine

This page lists all supported languages for which there are publicly available corpora. Languages with user corpora only are not included.

The available features differ by language, and sometimes even by corpus, because the corpora in Sketch Engine come from different sources.

Preloaded corpora features

Click a language below to see the features available for that language and a list of its preloaded corpora. Refer to the corpus details page for the features available in each individual preloaded corpus.

User corpora features

Refer to the table below to check whether the following features will be available for a user corpus:

PoS – Yes: user corpora will be tagged for parts of speech.

WS – Yes: a Word Sketch Grammar is available for the language.

PoS – No, WS – Yes: user corpora cannot be tagged for parts of speech, but a Word Sketch Grammar is available for the language because some corpora were tagged outside Sketch Engine. A user corpus can use Word Sketches if it is tagged externally with the same tagset as the one used in the Word Sketch Grammar.

Language | PoS tagging | Lemmatization | Word sketches | Terms
-- other -- none
Afrikaans full
Albanian none
Amharic none
Arabic full
Armenian none
Assamese none
Azerbaijani none
Bashkir none
Basque none
Belarusian none
Bengali none
Bosnian none
Breton none
Bulgarian full
Burmese none
Cantonese none
Catalan full
Cebuano none
Chinese Simplified full
Chinese Traditional full
Crimean Tatar none
Croatian full
Cundeelee Wangka none
Czech full
Danish full
Dutch full
English full
Esperanto none
Estonian full
Filipino full
Finnish full
French full
Frisian none
Galician none
Georgian none
German full
Greek full
Gujarati none
Hausa (Boko) none
Hebrew none
Hindi full
Hungarian full
Icelandic basic
Igbo none
Indonesian full
Irish full
Italian full
Japanese full
Kannada none
Kazakh none
Khmer none
Korean basic
Kurdish (Kurmanji) none
Kurdish (Sorani) none
Kyrgyz none
Lao full
Latin none
Latvian full
Limburgish none
Lithuanian none
Macedonian none
Maduwongga none
Malay none
Malayalam none
Maldivian none
Maltese none
Maori none
Marathi none
Marlpa none
Mongolian none
Montenegrin none
N'Ko none
Ndebele none
Nepali none
Newspeak none
Ngaanyatjarra none
Northern Iroquoian none
Norwegian full
Norwegian Bokmål full
Norwegian Nynorsk full
Oromo none
Pashto none
Persian none
Pitjantjatjara none
Polish full
Portuguese full
Punjabi (Gurmukhi) none
Punjabi (Shahmukhi) none
Romanian full
Russian full
Samoan none
Sanskrit (romanised) none
Scottish Gaelic none
Serbian full
Serbian (Latin) full
Sesotho none
Setswana none
Sinhalese none
Slovak full
Slovenian full
Somali none
Spanish full
Swahili basic
Swazi none
Swedish full
Syriac none
Tagalog full
Tajik none
Tamil none
Tatar none
Telugu none
Thai none
Tibetan full
Tigrinya none
Turkish none
Turkmen none
Ukrainian full
Urdu none
Uzbek none
Venda none
Vietnamese none
Welsh none
Xhosa none
Yiddish none
Yoruba none
Zulu none