The corpus collection of 40 languages
The OPUS2 parallel corpus is a set of text corpora whose sentences are aligned, so that each sentence is linked to the corresponding sentence in the other languages. The OPUS project in Sketch Engine covers 40 languages, so users can look up translated sentence pairs for many language combinations.
The parallel corpora available here have been collected, prepared and aligned by Joerg Tiedemann in the OPUS project (see http://opus.lingfil.uu.se/). We are most grateful to him for his great work and co-operation. The data was prepared for Sketch Engine using a range of lemmatisers, part-of-speech taggers and Sketch Grammars.
The OPUS2 corpora are the second version of these corpora. They use m:n alignment, in which m sentences in one language can be linked to n sentences in another, which allows for just one corpus per language.
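To make the idea of m:n alignment concrete, here is a minimal sketch in Python. It is not the OPUS or Sketch Engine data format: the sentences, the sentence IDs, and the names `links`, `aligned_targets` and `translation_of` are all hypothetical and chosen only for illustration. It assumes each alignment link simply pairs a group of sentence IDs in one language with a group of sentence IDs in another.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: the IDs and sentences are invented for illustration.
english: Dict[int, str] = {
    1: "He came in.",
    2: "He sat down.",
    3: "Nobody spoke.",
}
german: Dict[int, str] = {
    1: "Er kam herein und setzte sich.",
    2: "Niemand sprach.",
}

# Each link pairs m source sentence IDs with n target sentence IDs.
Link = Tuple[List[int], List[int]]
links: List[Link] = [
    ([1, 2], [1]),  # 2:1 link: two English sentences, one German sentence
    ([3], [2]),     # 1:1 link
]


def aligned_targets(source_id: int, links: List[Link]) -> List[int]:
    """Return the target sentence IDs linked to the given source sentence."""
    for src_ids, tgt_ids in links:
        if source_id in src_ids:
            return tgt_ids
    return []  # the sentence has no aligned counterpart


def translation_of(source_id: int) -> str:
    """Join the German sentences aligned with one English sentence."""
    return " ".join(german[i] for i in aligned_targets(source_id, links))


if __name__ == "__main__":
    for sid, text in english.items():
        print(f"{text!r} -> {translation_of(sid)!r}")
```

In this sketch each language keeps its own sentence segmentation and only the links differ per language pair, which is one way to picture why a single corpus per language can be enough.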
OPUS, an open-source parallel corpus, allows users to search bilingual and multilingual data in many languages and to find concordances, collocations, word lists and more.
The OPUS project in Sketch Engine contains 40 languages: Afrikaans, Albanian, Arabic, Bosnian, Bulgarian, Chinese Simplified, Chinese Traditional, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Norwegian, Persian, Polish, Portuguese, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian.