Parallel corpus from tabular data
The simplest way to create a parallel corpus is to upload data in a tabular or structured format such as a spreadsheet (Excel), TMX, XML, XLIFF or a similar format.
Spreadsheet format requirements
Spreadsheets must contain the language names in the first row, followed by the aligned segments (e.g. sentences) side by side, one set of aligned segments per row. Each column containing data is treated as a separate language, i.e. a spreadsheet for 2 languages must contain exactly 2 columns of data; all other columns must be empty.
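If the aligned segments are generated by a script rather than typed into a spreadsheet, a small helper can write them in the layout described above and check the column count before upload. This is a minimal sketch using only Python's standard `csv` module. It assumes that a TSV upload follows the same convention as a spreadsheet (language names in the first row, one column per language); the names `build_tsv` and `check_tsv`, the example sentences and the language names are hypothetical, so check the language-identification screen after uploading.

```python
import csv
import io

# Hypothetical example data: one tuple per row of aligned segments.
LANGUAGES = ["English", "German"]
SEGMENTS = [
    ("Good morning.", "Guten Morgen."),
    ("How are you?", "Wie geht es dir?"),
]


def build_tsv(languages, segments):
    """Return TSV text: language names in row 1, one aligned row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(languages)
    for row in segments:
        # Exactly one cell per language; no extra columns are allowed.
        if len(row) != len(languages):
            raise ValueError(
                f"expected {len(languages)} columns, got {len(row)}: {row!r}"
            )
        writer.writerow(row)
    return buf.getvalue()


def check_tsv(text):
    """Return the number of data columns, raising if any row deviates."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        raise ValueError("file is empty")
    width = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} columns, expected {width}")
    return width


if __name__ == "__main__":
    tsv = build_tsv(LANGUAGES, SEGMENTS)
    print(check_tsv(tsv))  # prints 2
    print(tsv, end="")
```

To produce the upload file, write the returned text to a `.tsv` file with UTF-8 encoding.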
Follow these steps:
- on the corpus dashboard, click NEW CORPUS
- click MULTILINGUAL
- type the corpus name and choose the file to upload
  - supported formats: TMX, XLS, XLSX, XLIFF (v. 2.0 and higher), TSV, TAB
  - if an XLSX file does not upload correctly, open it in Excel and save it as Excel 97-2003 Workbook (.xls)
- on the following screen, check that the languages were identified correctly
- click Next
- wait for the corpus to be processed; you can leave the screen and find the corpus later in My corpora
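TMX is one of the upload formats listed in the steps above. As an illustration only, the sketch below builds a minimal TMX 1.4 document with Python's standard `xml.etree.ElementTree`: one `<tu>` (translation unit) per aligned row and one `<tuv xml:lang="...">` per language. The header values (`creationtool`, `o-tmf`, etc.), the function name `build_tmx` and the language codes are hypothetical placeholders; whether the upload screen identifies languages from these codes is an assumption to verify on the language-identification screen.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(langs, segments, srclang=None):
    """Build a minimal TMX 1.4 document; one <tu> per aligned row."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(
        root,
        "header",
        {
            "creationtool": "example-script",  # hypothetical placeholder
            "creationtoolversion": "0.1",
            "segtype": "sentence",
            "o-tmf": "none",  # placeholder: original memory format
            "adminlang": "en",
            "srclang": srclang or langs[0],
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(root, "body")
    for row in segments:
        if len(row) != len(langs):
            raise ValueError(f"expected {len(langs)} segments, got {len(row)}")
        tu = ET.SubElement(body, "tu")
        for lang, text in zip(langs, row):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx(["en", "de"], [
        ("Good morning.", "Guten Morgen."),
        ("How are you?", "Wie geht es dir?"),
    ]))
```

For a file on disk, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` adds the XML declaration.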
Each language in the source file will be processed into a separate monolingual corpus and aligned with the corresponding corpus in the other language(s).
To search the corpus as a parallel corpus, first select the corpus in the language that should appear on the left, then select the other language(s) when setting the search criteria. Selecting multiple languages displays a multilingual concordance.