A parallel corpus in 40 languages
The OPUS2 parallel corpus is a set of sentence-aligned text corpora that allows users to search and analyse translations between all of its languages.
The parallel corpora were collected, prepared and aligned by Joerg Tiedemann in the OPUS project (see http://opus.lingfil.uu.se/). We are most grateful to him for his great work and co-operation. The data were processed by Sketch Engine using a range of lemmatisers, part-of-speech taggers and sketch grammars.
OPUS2 is the second version of these corpora. It uses m:n alignment, which makes it possible to have just one corpus per language.
This parallel corpus can be searched and analysed monolingually or multilingually using all available tools in Sketch Engine: concordances, collocations, word lists and more.
The OPUS project in Sketch Engine contains 40 languages: Afrikaans, Albanian, Arabic, Bosnian, Bulgarian, Chinese Simplified, Chinese Traditional, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Norwegian, Persian, Polish, Portuguese, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian.