DGT Translation Memory parallel corpus

DGT-Translation Memory is a database of aligned sentences from the European Union’s legislative documents (Acquis Communautaire) in 24 EU languages. Sketch Engine offers this database as searchable parallel corpora. Detailed information on the corpora and how to cite them can be found in the bibliography.

The DGT-Translation Memory covers 24 EU languages:

Bulgarian German Polish
Czech Greek Portuguese
Danish Hungarian Romanian
Dutch Irish Croatian
English Italian Slovak
Estonian Latvian Slovenian
Finnish Lithuanian Spanish
French Maltese Swedish

The aligned texts come from DGT-TM, a large translation memory published by the European Commission.
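To illustrate what aligned texts make possible, here is a minimal sketch of a search over sentence pairs. The `AlignedPair` type, the `parallel_search` function and the sample sentences are hypothetical and for illustration only; they are not taken from DGT-TM and do not reflect how Sketch Engine stores or queries its corpora.

```python
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """One aligned segment: the same sentence in two languages."""
    source: str
    target: str


# Hypothetical toy data, not taken from DGT-TM.
PAIRS = [
    AlignedPair("This Regulation shall enter into force.",
                "Diese Verordnung tritt in Kraft."),
    AlignedPair("Member States shall inform the Commission.",
                "Die Mitgliedstaaten unterrichten die Kommission."),
    AlignedPair("The Commission shall adopt the measures.",
                "Die Kommission erlässt die Maßnahmen."),
]


def parallel_search(pairs, query):
    """Return every pair whose source side contains the query (case-insensitive)."""
    needle = query.lower()
    return [pair for pair in pairs if needle in pair.source.lower()]


hits = parallel_search(PAIRS, "commission")
for hit in hits:
    # Each hit shows the matching sentence next to its aligned translation.
    print(hit.source, "|", hit.target)
```

Because every sentence is stored together with its translation, a match in one language immediately yields the corresponding sentence in the other.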

The individual corpora have been processed with the latest tools available in Sketch Engine.

Tools to work with the DGT Translation Memory parallel corpus

A complete set of Sketch Engine tools is available to work with this set of parallel corpora to generate:

  • word sketch – collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
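As a rough illustration of what the word list and n-gram tools compute, the sketch below counts single words and two-word units in a short sample. The `ngram_frequencies` function and the sample sentence are hypothetical, and whitespace splitting stands in for real tokenization; Sketch Engine's own processing is more sophisticated.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens."""
    # zip over n shifted copies of the token list yields the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical sample text; naive whitespace tokenization is an assumption.
text = ("the member states shall inform the commission "
        "and the member states shall cooperate")
tokens = text.split()

word_list = Counter(tokens)             # frequency list of single words
bigrams = ngram_frequencies(tokens, 2)  # frequency list of two-word units

print(word_list.most_common(3))
print(bigrams.most_common(3))
```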

For a more detailed description of the DGT-TM, including more statistics on the resource, see the following publication. When making reference to DGT-TM in scientific publications, please refer to:

Steinberger, R., Eisele, A., Klocek, S., Pilos, S., & Schlüter, P. (2013). DGT-TM: A freely available translation memory in 22 languages. arXiv preprint arXiv:1309.5226.

For a contrastive overview of DGT-TM and the other multilingual text resources offered for download on this site, you can read the following journal article:

Steinberger, R., Ebrahim, M., Poulis, A., Carrasco-Benitez, M., Schlüter, P., Przybyszewski, M., & Gilbro, S. (2014). An overview of the European Union’s highly multilingual parallel corpora. Language Resources and Evaluation, 48(4), 679-707.

Search the DGT Translation Memory

Sketch Engine offers a range of tools to work with the DGT Translation Memory parallel corpus.

Tip

Learn to work with multilingual and parallel corpora in Sketch Engine. Refer to the user guide.

More parallel corpora

EUR-Lex 2/2016 parallel corpora – texts from the EUR-Lex database containing public EU documents

Eur-Lex judgments 12/2016 parallel corpora – judgments of the Court of Justice of the European Union

Europarl spoken parallel corpora – transcriptions of the European Parliament Proceedings

Open Parallel Corpus (OPUS) – translated texts from various sources, e.g. medical documents, subtitles, technical documentation, etc.

OpenSubtitles 2018 parallel corpora – movie subtitles from the OpenSubtitles database

United Nations Parallel Corpus (UNPC) – official records and other parliamentary documents of the United Nations

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.