A general-purpose multilingual corpus available in Sketch Engine
The EUR-Lex Corpus is a multilingual corpus in all the official languages of the European Union. It was built from HTML files available in the EUR-Lex database. Because it covers a wide range of subjects, the corpus is an excellent general-purpose resource for anyone looking for translation examples in many languages.
A substantial part of the documents is translated into all official languages of the European Union (currently 24). Languages of countries that joined the EU later are represented by smaller corpora, proportional to the length of their membership.
Technically speaking, the documents are segmented and aligned at the paragraph level, so the user can search for the matching paragraph that contains the translation. The paragraphs are, however, fine-grained and usually correspond to single sentences, which means that in practice the user can search for matching sentences or very short paragraphs.
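Paragraph-level alignment means that each unit in one language is paired with its counterpart in another, so finding a translation reduces to searching one side and reading the other. The sketch below illustrates the idea with a hypothetical in-memory structure (`AlignedParagraph`, `find_translations`); it is not the corpus's actual data format or the Sketch Engine search interface, and the sample sentences are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedParagraph:
    """Hypothetical aligned unit: the same paragraph in two languages."""
    source: str
    target: str


def find_translations(pairs: List[AlignedParagraph], term: str) -> List[AlignedParagraph]:
    """Return the aligned units whose source side contains `term` (case-insensitive)."""
    needle = term.lower()
    return [p for p in pairs if needle in p.source.lower()]


# Invented toy data (English -> German); not taken from the corpus.
pairs = [
    AlignedParagraph("This Regulation shall enter into force on the day following its publication.",
                     "Diese Verordnung tritt am Tag nach ihrer Veröffentlichung in Kraft."),
    AlignedParagraph("The Member States shall inform the Commission.",
                     "Die Mitgliedstaaten unterrichten die Kommission."),
]

for hit in find_translations(pairs, "member states"):
    print(hit.source, "->", hit.target)
```

Because the aligned units are usually single sentences, each hit is effectively a sentence-level translation example.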
Sketch Engine also offers a smaller corpus of judgments of the European Parliament.
How to get the data
The EUR-Lex corpus is released under the CC BY-NC-SA licence. Because of the file size, please email us at firstname.lastname@example.org first and we will set up a temporary download link for you. The data are supplied as vertical text (one token per line) with an alignment file. The total size is 220 GB. For the original documents, see the official EUR-Lex website.
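As a starting point for processing the downloaded data, the sketch below reads a vertical file paragraph by paragraph. It assumes the usual Sketch Engine vertical conventions: structures such as `<doc>` and `<p>` appear as XML-like tags on their own lines, and token lines carry tab-separated attributes with the word form first. The paragraph tag name `p`, the attribute layout and the function name `read_paragraphs` are assumptions for illustration, and the alignment file is not handled because its format is not described here.

```python
import io
from typing import Iterator, List, TextIO


def read_paragraphs(lines: TextIO, para_tag: str = "p") -> Iterator[List[str]]:
    """Yield each paragraph as a list of word forms (assumed vertical layout)."""
    tokens: List[str] = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<"):
            # Structure tag such as <doc id="...">, <p>, </p> or <g/>.
            name = line.strip("</>").split()[0]
            if name == para_tag:
                if line.startswith("</"):
                    # Closing paragraph tag: emit the collected tokens.
                    if inside:
                        yield tokens
                    tokens, inside = [], False
                else:
                    # Opening paragraph tag: start a new paragraph.
                    tokens, inside = [], True
            # All other tags (documents, glue markers, ...) are ignored.
            continue
        if inside:
            # Token line: keep only the first column, the word form.
            tokens.append(line.split("\t", 1)[0])


sample = io.StringIO(
    '<doc id="example">\n'
    "<p>\n"
    "The\tDT\tthe\n"
    "Council\tNN\tcouncil\n"
    "<g/>\n"
    ".\tSENT\t.\n"
    "</p>\n"
    "</doc>\n"
)
for paragraph in read_paragraphs(sample):
    print(" ".join(paragraph))
```

Reading the file lazily as a generator keeps memory use flat, which matters for a 220 GB dataset. Note that this sketch ignores glue (`<g/>`) markers, so punctuation is printed with a preceding space.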
For commercial use
Please contact us for a quote.
How to cite
If you use the EUR-Lex corpus, please consider mentioning Lexical Computing Ltd in your acknowledgements and citing the original paper below.
Vít Baisa, Jan Michelfeit, Marek Medveď, Miloš Jakubíček: European Union Language Resources in Sketch Engine. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), Portorož, Slovenia, 2016.
Important copyright notice
© European Union, 1998-2016
Except where otherwise stated, reuse of the EUR-Lex data for commercial or non-commercial purposes is authorised provided the source is acknowledged (see above). The reuse policy of the European Commission is implemented by the Commission Decision of 12 December 2011. Some documents, like the International Accounting Standards, may be subject to special conditions of use, which are mentioned in the respective Official Journal. For all other copyright issues regarding EUR-Lex, please contact email@example.com.