Languages in Sketch Engine
At least one preloaded corpus is available for each of the languages listed below.
Preloaded corpora features
Click a language to see the available features. Refer to the corpus details page for information about a specific corpus.
User corpora features
The following features are supported when you build your own corpus. If your language is not listed, Sketch Engine can still accept your data, but some advanced functions may not be available.
POS – Yes – user corpora will be tagged for parts of speech
WS – Yes – a Word Sketch Grammar is available for the language
POS – No, WS – Yes – user corpora cannot be tagged for parts of speech in Sketch Engine, but a Word Sketch Grammar is available for the language because some corpora were tagged outside of Sketch Engine. A user corpus can make use of Word Sketches if it is tagged externally with the same tagset as the one used in the Word Sketch Grammar.
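The legend above boils down to a small decision rule. The sketch below is only an illustration of that rule: the function name `word_sketch_support` and its return strings are hypothetical and are not part of Sketch Engine or its API.

```python
def word_sketch_support(pos_tagging: bool, ws_grammar: bool) -> str:
    """Hypothetical helper: map the POS/WS legend to what a user corpus can expect."""
    if not ws_grammar:
        # No Word Sketch Grammar exists for the language.
        return "word sketches not available"
    if pos_tagging:
        # Sketch Engine tags the corpus itself, so the grammar can be applied.
        return "word sketches available"
    # A grammar exists, but Sketch Engine cannot tag the corpus itself.
    return ("word sketches available only if the corpus is tagged externally "
            "with the same tagset as the Word Sketch Grammar")


# Print the outcome for the combinations described in the legend.
for pos, ws in [(True, True), (False, True), (False, False)]:
    print(f"POS={pos}, WS={ws}: {word_sketch_support(pos, ws)}")
```

The third branch corresponds to the "POS – No, WS – Yes" case, where the tagging has to happen outside of Sketch Engine.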
| Language | PoS tagging | Lemmatization | Word sketches | Terms |
|---|---|---|---|---|
| Afrikaans | ✓ | ✓ | full | ✓ |
| Albanian | ⓧ | ⓧ | none | ⓧ |
| Amharic | ⓧ | ⓧ | none | ⓧ |
| Arabic | ✓ | ✓ | full | ✓ |
| Armenian | ⓧ | ⓧ | none | ⓧ |
| Assamese | ⓧ | ⓧ | none | ⓧ |
| Azerbaijani | ⓧ | ⓧ | none | ⓧ |
| Bashkir | ⓧ | ⓧ | none | ⓧ |
| Basque | ⓧ | ⓧ | none | ⓧ |
| Belarusian | ⓧ | ⓧ | none | ⓧ |
| Bengali | ⓧ | ⓧ | none | ⓧ |
| Bosnian | ✓ | ✓ | full | ✓ |
| Breton | ⓧ | ⓧ | none | ⓧ |
| Bulgarian | ✓ | ✓ | full | ✓ |
| Burmese | ⓧ | ⓧ | none | ⓧ |
| Cantonese | ⓧ | ⓧ | none | ⓧ |
| Catalan | ✓ | ✓ | full | ⓧ |
| Cebuano | ⓧ | ⓧ | none | ⓧ |
| Chinese Simplified | ✓ | ⓧ | full | ✓ |
| Chinese Traditional | ✓ | ⓧ | full | ✓ |
| Classical Syriac | ⓧ | ⓧ | none | ⓧ |
| Crimean Tatar | ✓ | ✓ | none | ⓧ |
| Croatian | ✓ | ✓ | full | ✓ |
| Cundeelee Wangka | ⓧ | ⓧ | none | ⓧ |
| Czech | ✓ | ✓ | full | ✓ |
| Danish | ✓ | ✓ | full | ✓ |
| Dutch | ✓ | ✓ | full | ✓ |
| English | ✓ | ✓ | full | ✓ |
| Esperanto | ⓧ | ⓧ | none | ⓧ |
| Estonian | ✓ | ✓ | full | ✓ |
| Filipino | ✓ | ✓ | full | ⓧ |
| Finnish | ✓ | ✓ | full | ✓ |
| French | ✓ | ✓ | full | ✓ |
| Frisian | ⓧ | ⓧ | none | ⓧ |
| Galician | ⓧ | ⓧ | none | ⓧ |
| Georgian | ⓧ | ⓧ | none | ⓧ |
| German | ✓ | ✓ | full | ✓ |
| Greek | ✓ | ✓ | full | ✓ |
| Gujarati | ⓧ | ⓧ | none | ⓧ |
| Hausa (Boko) | ⓧ | ⓧ | none | ⓧ |
| Hebrew | ✓ | ✓ | none | ⓧ |
| Hindi | ✓ | ✓ | full | ⓧ |
| Hungarian | ✓ | ✓ | full | ✓ |
| Icelandic | ✓ | ✓ | basic | ⓧ |
| Igbo | ⓧ | ⓧ | none | ⓧ |
| Indonesian | ✓ | ✓ | full | ⓧ |
| Irish | ✓ | ✓ | full | ⓧ |
| Italian | ✓ | ✓ | full | ✓ |
| Japanese | ✓ | ✓ | full | ✓ |
| Kannada | ⓧ | ⓧ | none | ⓧ |
| Kazakh | ⓧ | ⓧ | none | ⓧ |
| Khmer | ⓧ | ⓧ | none | ⓧ |
| Korean | ✓ | ✓ | basic | ✓ |
| Kurdish (Kurmanji) | ⓧ | ⓧ | none | ⓧ |
| Kurdish (Sorani) | ⓧ | ⓧ | none | ⓧ |
| Kyrgyz | ⓧ | ⓧ | none | ⓧ |
| Lao | ✓ | ⓧ | full | ⓧ |
| Latin | ⓧ | ⓧ | none | ⓧ |
| Latvian | ✓ | ✓ | full | ✓ |
| Limburgish | ⓧ | ⓧ | none | ⓧ |
| Lithuanian | ✓ | ✓ | full | ✓ |
| Macedonian | ⓧ | ⓧ | none | ⓧ |
| Maduwongga | ⓧ | ⓧ | none | ⓧ |
| Malay | ✓ | ✓ | full | ⓧ |
| Malayalam | ⓧ | ⓧ | none | ⓧ |
| Maldivian | ⓧ | ⓧ | none | ⓧ |
| Maltese | ⓧ | ⓧ | none | ⓧ |
| Maori | ⓧ | ⓧ | none | ✓ |
| Marathi | ⓧ | ⓧ | none | ⓧ |
| Marlpa | ⓧ | ⓧ | none | ⓧ |
| Mongolian | ⓧ | ⓧ | none | ⓧ |
| Montenegrin | ⓧ | ⓧ | none | ⓧ |
| N'Ko | ⓧ | ⓧ | none | ⓧ |
| Ndebele | ⓧ | ⓧ | none | ⓧ |
| Nepali | ⓧ | ⓧ | none | ⓧ |
| Newari | ⓧ | ⓧ | none | ⓧ |
| Newspeak | ⓧ | ⓧ | none | ⓧ |
| Ngaanyatjarra | ⓧ | ⓧ | none | ⓧ |
| Northern Iroquoian | ⓧ | ⓧ | none | ⓧ |
| Norwegian | ✓ | ✓ | basic | ✓ |
| Norwegian Bokmål | ✓ | ✓ | basic | ✓ |
| Norwegian Nynorsk | ✓ | ✓ | full | ✓ |
| Oromo | ⓧ | ⓧ | none | ⓧ |
| Pashto | ⓧ | ⓧ | none | ⓧ |
| Persian | ⓧ | ⓧ | none | ⓧ |
| Pitjantjatjara | ⓧ | ⓧ | none | ⓧ |
| Polish | ✓ | ✓ | full | ✓ |
| Portuguese | ✓ | ✓ | full | ✓ |
| Punjabi (Gurmukhi) | ⓧ | ⓧ | none | ⓧ |
| Punjabi (Shahmukhi) | ⓧ | ⓧ | none | ⓧ |
| Romanian | ✓ | ✓ | full | ✓ |
| Russian | ✓ | ✓ | full | ✓ |
| Samoan | ⓧ | ⓧ | none | ⓧ |
| Sanskrit (romanised) | ⓧ | ⓧ | none | ⓧ |
| Scottish Gaelic | ⓧ | ⓧ | none | ⓧ |
| Serbian | ✓ | ✓ | full | ✓ |
| Serbian (Latin) | ✓ | ✓ | full | ✓ |
| Sesotho | ⓧ | ⓧ | none | ⓧ |
| Setswana | ⓧ | ⓧ | none | ⓧ |
| Sinhalese | ⓧ | ⓧ | none | ⓧ |
| Slovak | ✓ | ✓ | full | ✓ |
| Slovenian | ✓ | ✓ | full | ✓ |
| Somali | ⓧ | ⓧ | none | ⓧ |
| Spanish | ✓ | ✓ | full | ✓ |
| Swahili | ✓ | ✓ | basic | ⓧ |
| Swazi | ⓧ | ⓧ | none | ⓧ |
| Swedish | ✓ | ✓ | full | ✓ |
| Tagalog | ✓ | ✓ | full | ⓧ |
| Tajik | ⓧ | ⓧ | none | ⓧ |
| Tamil | ⓧ | ⓧ | none | ⓧ |
| Tatar | ⓧ | ⓧ | none | ⓧ |
| Telugu | ⓧ | ⓧ | none | ⓧ |
| Thai | ⓧ | ⓧ | none | ⓧ |
| Tibetan | ✓ | ✓ | full | ⓧ |
| Tigrinya | ⓧ | ⓧ | none | ⓧ |
| Turkish | ✓ | ⓧ | basic | ⓧ |
| Turkmen | ⓧ | ⓧ | none | ⓧ |
| Ukrainian | ✓ | ✓ | full | ✓ |
| Urdu | ⓧ | ⓧ | none | ⓧ |
| Uzbek | ⓧ | ⓧ | none | ⓧ |
| Venda | ⓧ | ⓧ | none | ⓧ |
| Vietnamese | ✓ | ⓧ | full | ✓ |
| Welsh | ⓧ | ⓧ | none | ⓧ |
| Xhosa | ⓧ | ⓧ | none | ⓧ |
| Yiddish | ⓧ | ⓧ | none | ⓧ |
| Yoruba | ⓧ | ⓧ | none | ⓧ |
| Zulu | ⓧ | ⓧ | none | ⓧ |




