Parallel corpus from tabular data

(basic user)

The simplest way to create a parallel corpus is to upload aligned data in a tabular format, such as a spreadsheet (Excel), or in a similar format such as TMX, XML or XLIFF.

Spreadsheet format requirements

Spreadsheets must contain the language names in the first row, followed by the aligned segments (e.g. sentences) side by side. Each column that contains data is treated as a separate language, so a spreadsheet for 2 languages must contain exactly 2 columns of data. All other columns must be empty.

1. languages = columns

  • use 2 columns for 2 languages, 3 columns for 3 languages etc. All other columns must stay empty.

2. line 1 must contain the names of the languages

3. from line 2 onward, the cells must contain the aligned segments

  • if a long English sentence was translated as two short Spanish sentences, cell A should contain one sentence and the corresponding cell B two sentences
  • if two English sentences were translated as one Spanish sentence, cell A will contain two sentences and cell B one sentence
  • The data can also be aligned at the paragraph level, i.e. cell A will contain the whole paragraph and cell B will contain the whole aligned paragraph.

For more complex alignment options, use M:N alignment.
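
The layout rules above can be sketched in a short Python script that writes a two-language CSV file and checks it before upload. This is only an illustration, not part of Sketch Engine: the function names check_aligned_rows and write_aligned_csv, the sample sentences and the language names "English" and "Spanish" are assumptions made for the example, and the check covers only the layout described above (language names in row 1, the same number of cells in every row).

```python
import csv
import io


def check_aligned_rows(rows):
    """Return a list of layout problems; an empty list means none were found."""
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    width = len(header)
    # Row 1 must name the language of every data column.
    if any(not name.strip() for name in header):
        problems.append("row 1: every data column needs a language name")
    # From row 2 onward, each row holds one aligned segment per language.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {number}: expected {width} cells, found {len(row)}")
    return problems


def write_aligned_csv(stream, languages, segments):
    """Write the language names, then one row per group of aligned segments."""
    rows = [list(languages)] + [list(group) for group in segments]
    problems = check_aligned_rows(rows)
    if problems:
        raise ValueError("; ".join(problems))
    csv.writer(stream).writerows(rows)


# Hypothetical sample data: a cell may hold more than one sentence
# when the translation split or merged sentences.
segments = [
    ("This long sentence was split in translation.",
     "Esta frase era larga. Se dividió en dos."),
    ("Two sentences. Merged into one.",
     "Dos frases unidas en una."),
]

buffer = io.StringIO()
write_aligned_csv(buffer, ["English", "Spanish"], segments)
csv_text = buffer.getvalue()
```

To produce a file for upload, the same function can be called with a file opened as open("corpus.csv", "w", newline="", encoding="utf-8"), where corpus.csv is a placeholder name.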

Follow these steps

  • on the corpus dashboard, click NEW CORPUS
  • type the corpus name and choose the file
  • Upload TMX or XLS
    • other supported formats: XLIFF (v. 2.0 and higher), CSV, TSV, XLSX
      (if an XLSX file does not upload correctly, try opening it in Excel and saving it as an Excel 97-2003 Workbook)
  • on the following screen, check that the languages were identified correctly
  • click Next
  • wait for the corpus to be processed; you can leave the screen and find the corpus later in My corpora

Each language in the source file will be processed into a separate monolingual corpus and aligned with the corresponding corpus in the other language(s).


To search the corpus as a parallel corpus, first select the corpus in the language that should appear on the left and then, when setting the search criteria, select the other language(s). Multiple languages can be selected to display a multilingual concordance.

Add more texts - make the corpus bigger

When adding texts to an existing corpus, they have to be added to each language separately. The same file containing the aligned data has to be uploaded as many times as there are languages in the corpus. It cannot be done in one step. Each time, Sketch Engine will extract only the language that corresponds to the language of the selected corpus.

  • select the first language of the multilingual corpus
  • go to the dashboard
  • click I have my own texts
  • upload the file and see if the processing finished successfully
  • if not, Sketch Engine probably could not guess the correct language code. In that case, click the file name, then the gear icon, select the language code manually and submit the form. The file should then be processed correctly.
  • proceed to the next screens to compile the corpus
  • when the corpus is compiled, select the second language of the multilingual corpus and follow the same procedure to upload the data in the second language

This procedure has to be repeated as many times as there are languages in the corpus.