Learn to use the parallel concordance tool in 4 minutes. Watch this video.
Parallel concordance — searching translations
The parallel concordance only works with parallel corpora which are aligned. The parallel concordance searches for words, phrases, tags, documents, text types or corpus structures in one language and displays the results together with aligned translated segments in another language. The translated segments usually contain the translation of the search word or phrase but the translation may not be included if the translator decided to use a different way of expressing the idea. The concordance can be sorted, filtered, counted and processed further to obtain the desired result.
The view options allow displaying additional information such as lemmas, tags and other attributes, text types (metadata) or corpus structures. Sketch Engine will also try to locate the translation and highlight it.
The CQL search on the Advanced tab is used for complex searches with unspecific or optional criteria.
Users can use their own data to build parallel corpora.
change search criteria
view options – display tags, lemmas, structures
change the order of lines randomly
favourites – bookmark this concordance for easy access from the dashboard
switch between KWIC view, alignment view and sentence view
query details and frequency count
sort the lines alphabetically
filter – only display lines containing or not containing something
frequency – count words, tokens, lemmas, text types…
line details – display information about each line, select the metadata that should stay displayed permanently
How to use the parallel concordance
Select a parallel corpus in one of the languages and then select Parallel concordance. The second language should be selected on the input form.
Visit the related Quick start guide or hover the mouse over icons, controls and other elements to display the tooltips. Click the highlighted words to learn about the functions and settings.
Watch the video from our YouTube channel.
Working with the columns (languages)
The number of languages (columns) to search simultaneously is only limited by the number of languages the corpus contains.
All tools are available for the first language. Frequency distribution, filtering and sorting in the other language(s) are only available if a search word is used in that language.
All remaining tools are applied on all the languages displayed on the result screen.
How does the concordance work?
The search query must be defined for the first language; it is optional for the remaining language(s). The search always starts from the beginning of the corpus and the concordance lines are displayed in the order in which they are found in the corpus. Use the Shuffle or Random sample icons to change this.
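The behaviour described above can be illustrated with a small Python sketch. It is a toy model only, not how Sketch Engine is implemented: the names AlignedPair, contains_word and parallel_search and the four sample segments are made up for this illustration. The query for the first language is required, the query for the second language is optional, hits come back in corpus order, and shuffling is a separate step.

```python
import random
import re
from typing import List, NamedTuple, Optional


class AlignedPair(NamedTuple):
    """One aligned segment: a source sentence and its translation (hypothetical type)."""
    source: str
    target: str


def contains_word(text: str, word: str) -> bool:
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None


def parallel_search(
    corpus: List[AlignedPair], query: str, target_query: Optional[str] = None
) -> List[AlignedPair]:
    """Scan the corpus from the beginning and keep matching aligned pairs.

    `query` (first language) is mandatory; `target_query` (second language)
    is optional and only narrows the result when it is given.
    """
    hits = []
    for pair in corpus:  # iterating in order preserves corpus order
        if not contains_word(pair.source, query):
            continue
        if target_query is not None and not contains_word(pair.target, target_query):
            continue
        hits.append(pair)
    return hits


# Invented sample data, English aligned with Spanish.
corpus = [
    AlignedPair("It is easy to use.", "Es fácil de usar."),
    AlignedPair("The weather is nice.", "Hace buen tiempo."),
    AlignedPair("Take it easy.", "Tómatelo con calma."),
    AlignedPair("An easy task.", "Una tarea fácil."),
]

hits = parallel_search(corpus, "easy")                        # first language only
both = parallel_search(corpus, "easy", target_query="fácil")  # both languages

# Rough equivalent of the Shuffle icon: reorder a copy of the result.
shuffled = hits[:]
random.Random(0).shuffle(shuffled)
```

In this sketch the third pair is found by the first search but not by the second, because the translator expressed easy without using fácil.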
The display of lemmas, tags and other attributes can be set in view options.
The icon to the left of each concordance line leads to text types (metadata) about the concordance lines. The user can set one or more of them to stay displayed permanently.
After the result is displayed, Sketch Engine will try to highlight the most probable word that is the translation of the search word or phrase in the first language. This may not always be accurate or may not be highlighted at all. The process is indicated with a notification at the top.
How to use the concordance
Hover the mouse over the different elements of the input form to learn about the different search options.
Working with the results
To learn to filter, sort, count, reorganize or process the results, hover the mouse over an icon or the question mark symbol (?) to display a tooltip which explains its function.
Speed and corpus size
Sketch Engine is specifically designed to handle large corpora with speed. Any search will take only a few seconds to complete if the corpus size is under a billion words. It might take a bit of extra time for corpora over 1 billion words. Complex CQL searches containing regular expressions or frequency calculations on large corpora can take several minutes to complete.
Requirements for the concordance to work well
The concordance will work with any corpus, even one which is not tokenized, lemmatized and tagged. However, adding these three features increases the usefulness immensely. Tokenization, lemmatization and tagging are carried out automatically upon uploading files to Sketch Engine provided the language is supported.
Parallel concordance will work even if the languages in the corpus are not annotated to the same level. For example, one language can be tagged and lemmatized while the other is not.
The concordance of easy, shuffled randomly so that sentences from different parts of the corpus are on one screen. Translations are highlighted.
The above concordance but without Spanish segments where the word easy is translated by fácil. A filter excluding Spanish segments containing words starting with fácil- was used. Not all translations are highlighted.
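The effect of such a negative filter can be approximated outside Sketch Engine with a few lines of Python. This is a minimal sketch under stated assumptions: the concordance lines are invented sample pairs, and the helper filter_lines is hypothetical, not part of any Sketch Engine API. It removes every line whose Spanish segment contains a word starting with fácil.

```python
import re
from typing import List, Pattern, Tuple

# Invented concordance lines for "easy": (English segment, Spanish segment).
lines: List[Tuple[str, str]] = [
    ("This is easy.", "Esto es fácil."),
    ("It was an easy win.", "Ganaron fácilmente."),
    ("Take it easy.", "Tómatelo con calma."),
    ("An easy choice.", "Una elección sencilla."),
]

# Any word that starts with "fácil": fácil, fáciles, fácilmente, ...
starts_with_facil = re.compile(r"\bfácil\w*", flags=re.IGNORECASE)


def filter_lines(
    lines: List[Tuple[str, str]], pattern: Pattern[str], keep_matching: bool
) -> List[Tuple[str, str]]:
    """Keep lines whose second-language segment matches (or does not match) `pattern`."""
    return [line for line in lines if bool(pattern.search(line[1])) == keep_matching]


# Negative filter: drop lines where the Spanish segment contains fácil-.
without_facil = filter_lines(lines, starts_with_facil, keep_matching=False)

for english, spanish in without_facil:
    print(f"{english}  |  {spanish}")
```

The remaining lines are the ones where easy was translated in some other way, which is typically what such a filter is used to find.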
Build a parallel corpus
To learn to build a parallel multilingual corpus, see Setting up parallel and multilingual corpora.