Working with concordance results

Please familiarise yourself with the concordance result screen first.

Concordance result screen – left menu

Download concordance

The concordance lines generated from a corpus can be saved into a TXT (plain text), CSV, TSV or XML file. The data will be saved as they are displayed on the screen. Use View options to change this before saving. Saving large concordances can take some time, so it is recommended to save only the first page first and check the output format before saving the whole concordance.

Exporting concordances is unlimited for user corpora and limited to 10,000 lines for preloaded corpora. If you need to download more lines from preloaded corpora, please contact us and describe the purpose of your work.

Creates a subcorpus from the structures (e.g. documents, paragraphs, sentences) in which the concordance hits occur.

direct options in the left menu:

View options – access to the detailed settings, see below.

KWIC – (default setting) displays a KWIC concordance

Sentence – displays a concordance of complete sentences containing the search word.

Sketch Engine remembers the last used option and will use it for the next concordance search with this corpus.

Detailed settings

This dialogue opens after clicking View options in the left menu. Sketch Engine remembers the settings and will use them for the next search with any corpus.

concordance view options

Example from the British National Corpus. Different corpora use different Structures and References.

(1) attributes are additional pieces of information related to each token in a corpus. Normally, they are hidden. Tick the ones you want to display in the concordance. Refer to the glossary for details. At least one attribute has to be selected.

(2) select the structures you want to see in the concordance; hold down Ctrl to select multiple structures. Structures are the segments into which a corpus can be divided, for example s for sentence or p for paragraph. The list of structures can be different for each corpus and can be found on the corpus info page.

(3) References – line source information displayed to the left of each concordance line; hold down Ctrl to select multiple references. The list of available references can be different for each corpus and can be found on the corpus info page.
References up – when ticked, the references appear above each line rather than to the left, which is useful when displaying many or long references.

(4) Attributes (1) can be displayed:
for the search word or phrase only
for all words in each concordance line

In addition, they can be displayed permanently or as tooltips (=when the user hovers the mouse over the word).

(5) number of concordance lines displayed on one page

(6) number of characters displayed to the right and left of the search word; when attributes (1) are selected, they count towards this number, so selecting many attributes can mean that very few words fit in the line because the attributes use up most of the characters

(7) will sort the concordance lines so that the lines with the best GDEX scores are shown first

(8) for development purposes only

(9) indicates how many random sentences from the concordance will be sorted according to GDEX; high values cause long processing times, so values up to 300 are recommended

(10) when ticked, each concordance line can be selected by clicking the icon; the sentence will be copied to the clipboard ready for pasting

(11) when ticked, the clipboard will hold multiple sentences, not only one

(12) when the copy icon does not work, try turning Flash off by placing a tick here; copied sentences will then appear in a pop-up window ready to be pasted into other software

(13) when ticked, lines will have a check box for copying rather than an icon

(14) when ticked, lines will be numbered

(15) when ticked, the references (3) displayed to the left of each line will be shortened; hover the mouse over them to display the complete text

(16) sets the format for the copied sentences if the corpus supports this feature

View options – one-click copying

Sketch Engine offers a simple way of copying concordance lines to be inserted into a different application.

  • Click on View Options in the left-hand side panel.
  • Enable these settings:


  • Click on the icons to the right of concordance lines to copy them to your clipboard.
  • Finally, simply use Ctrl+C/Ctrl+V or Ctrl+Insert/Shift+Insert (on Windows) or Cmd+C/Cmd+V (on Mac).


If one-click copying does not work, the problem may come from the Adobe Flash player, which one-click copying uses to access your computer’s clipboard. Some versions of the player restrict access to the clipboard.

The functionality was tested with Flash player 9.

For IE and Firefox, there is another option to access the clipboard without Flash (although it is less user-friendly). Sketch Engine uses jQuery zclip for one-click copying.

To enable non-Flash copying in Firefox:

  1. write “about:config” in the address bar
  2. click “I will be careful, I promise”
  3. put “signed.applets.codebase_principal_support” into the ‘filter’ box (we want to change the browser property with this name)
  4. double-click the value “false” to change it to “true”
  5. when you click a copy icon in an SkE concordance, a dialog window appears saying that a script is requesting additional privileges; tick “Remember this decision” and click “Allow”
  6. restart Firefox

One-click copying should work now in Firefox in the same way as with the Flash player 9.

The other solutions are:

  1. downgrade/upgrade the Flash player to a version which is not restricted
  2. use IE or Firefox with the setting described above
  3. use one-click copying without Flash option in View options


Concordance lines can be sorted in a simple or complex way and also shuffled. When the concordance is sorted, a new control will appear at the bottom to quickly jump to a certain value.

sorted concordance jump

Click Sort in the left menu to access detailed sort settings.

Simple sort

simple sort

select whether you want to sort by word form, lemma, tag etc.

Sort key
select whether you want to sort by the tokens to the left of the search word, by the tokens to the right, or by the search word (node) itself.

Number of tokens to sort
will sort by the given number of tokens. The setting in the screenshot will first sort the lines by the first token after the node. The lines with the same first token will then be sorted by the second token and then by the third token. To sort by the third token only, use Multilevel sort.
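
The effect of Number of tokens to sort can be illustrated with a minimal Python sketch. It does not show how Sketch Engine sorts internally; the `lines` data and the `simple_sort_key` helper are hypothetical, and each line is assumed to be already split into left context, node and right context.

```python
# Hypothetical concordance lines: (left context, node, right context).
lines = [
    (["went", "to"], "work", ["in", "the", "city"]),
    (["hard"], "work", ["in", "a", "mine"]),
    (["at"], "work", ["all", "day", "long"]),
]

def simple_sort_key(line, n_tokens=3, ignore_case=True):
    """Build a sort key from the first n_tokens tokens to the right of the node."""
    _left, _node, right = line
    key = right[:n_tokens]
    return [token.lower() for token in key] if ignore_case else key

# Lines are ordered by the first token; ties are broken by the second, then the third.
sorted_lines = sorted(lines, key=simple_sort_key)
for left, node, right in sorted_lines:
    print(" ".join(left), f"<{node}>", " ".join(right))
```

The two lines starting with "in" end up next to each other and are ordered by their second token.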

Ignore case
the sort will not be case sensitive

the sort will be in the reverse order

Multilevel sort

multilevel concordance sorting

Here you can specify exactly which tokens should be used for sorting and in which order.

2L = second token to the left, 1R = first token to the right etc.

Lines will first be sorted by the first criterion. The lines with the same attribute in that position will then be sorted further by the second criterion etc.
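
The idea of sorting by several criteria in turn can be sketched as follows. This is only an illustration under stated assumptions: the position labels follow the 2L / 1R notation above, while `token_at`, `multilevel_sort` and the sample lines are hypothetical names and data, not part of Sketch Engine.

```python
def token_at(line, position):
    """Return the token at a position such as '2L' or '1R' ('' if out of range)."""
    left, _node, right = line
    distance, side = int(position[:-1]), position[-1]
    if side == "L":
        return left[-distance] if 1 <= distance <= len(left) else ""
    return right[distance - 1] if 1 <= distance <= len(right) else ""

def multilevel_sort(lines, criteria):
    """Sort by the first criterion, break ties with the second, and so on."""
    return sorted(
        lines,
        key=lambda line: [token_at(line, p).lower() for p in criteria],
    )

lines = [
    (["she", "is", "at"], "work", ["today"]),
    (["he", "was", "at"], "work", ["today"]),
    (["go", "back", "to"], "work", ["now"]),
]

# First by the first token to the right, then by the second token to the left.
result = multilevel_sort(lines, ["1R", "2L"])
```

Here the two lines ending in "today" are tied on the first criterion and are ordered by the second one ("is" before "was").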

Left / Right – click to sort the concordance lines by the first word to the left / right of the search word or phrase. The lines will be grouped, making it easier to observe lexical or grammatical patterns.

Click Node to sort the lines by the search word or search phrase. If the node in all lines is identical, the sort has no effect.

Sort the concordance lines by the References displayed to the left of each concordance line. The sort depends on which References are selected in View options. See also Concordance result screen.

The concordance lines will be randomly reshuffled. Useful if the user wants to see different concordance lines without going to the next screen.


Click Sample to have Sketch Engine randomly select a certain number of lines. Useful when the concordance produces too many results. Random selection guarantees a representative sample of the whole concordance. Sketch Engine remembers the last sample settings and provides a shortcut to them. (Look for Last in the left menu.)

The random sample is generated using a random number generator which always starts from the same point. This means the random sample remains the same every time it is created for the same number of lines. This can be useful when several users want to work on the same sample or a user needs to recreate the same sample as used previously.
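
The principle of a reproducible sample can be sketched in Python. The generator and starting point used by Sketch Engine are not described here, so the seed value and the `reproducible_sample` helper below are assumptions made purely for illustration.

```python
import random

def reproducible_sample(lines, size, seed=0):
    """Pick `size` lines at random, always starting the generator from the same seed."""
    rng = random.Random(seed)  # a fresh generator with a fixed starting point
    return rng.sample(lines, min(size, len(lines)))

concordance = [f"line {i}" for i in range(1000)]

first = reproducible_sample(concordance, 250)
second = reproducible_sample(concordance, 250)
print(first == second)  # the same lines are selected both times
```

Because the generator is re-created with the same seed on every call, two users asking for the same number of lines from the same concordance get identical samples.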

Watch this short YouTube video to learn how to create a random sample of concordance lines from your results.


Concordance lines can be filtered. The filter will either exclude the lines which match the filter criteria or keep only those lines.

predefined options (left menu)

If a match contains another match inside it, only the outer match will be displayed as a result. For example, the CQL query "J.*"{1,3}"N.*" (default attribute: tag), i.e. a noun preceded by 1 to 3 adjectives, will find big black dog. This phrase contains another phrase which matches the search criteria: black dog. Without the filter, two concordance lines will be displayed. When the filter is applied, only the former phrase will be displayed.
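
A minimal sketch of this kind of filtering, assuming each hit is represented by its first and last token position in the corpus. The `drop_nested` helper and the positions are hypothetical and only illustrate the idea.

```python
def drop_nested(hits):
    """Keep only hits that are not contained in another, different hit."""
    return [
        (start, end)
        for (start, end) in hits
        if not any(
            a <= start and end <= b and (a, b) != (start, end)
            for (a, b) in hits
        )
    ]

# "big black dog" covers tokens 4-6, "black dog" covers tokens 5-6,
# and an unrelated hit covers tokens 10-11.
hits = [(4, 6), (5, 6), (10, 11)]
print(drop_nested(hits))  # [(4, 6), (10, 11)]
```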

1st hit in doc
only the first occurrence in each document will be listed even if there may be more of them in each document
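
As an illustration only, keeping the first hit per document could look like this, assuming the hits are listed in corpus order and each one carries a document identifier. `first_hit_per_doc` and the sample data are hypothetical.

```python
def first_hit_per_doc(hits):
    """Keep the first hit from each document; hits are (doc_id, position) pairs."""
    seen = set()
    kept = []
    for doc_id, position in hits:
        if doc_id not in seen:
            seen.add(doc_id)
            kept.append((doc_id, position))
    return kept

hits = [("doc1", 12), ("doc1", 57), ("doc2", 140), ("doc1", 88), ("doc3", 301)]
print(first_hit_per_doc(hits))  # [('doc1', 12), ('doc2', 140), ('doc3', 301)]
```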

Click Filter to access these settings:

filtering the concordance

positive – only matching lines will be kept in the concordance
negative – matching lines will be excluded

selected token
select which token should be highlighted in a different colour if the token is found more than once within the search span

search span
determines how far from the node the filter should look for matching tokens

include KWIC
when ticked, the node (search word or phrase) itself will also be included in the filter
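
The interplay of positive/negative, search span and include KWIC can be sketched as below. This is a simplified model, not the Sketch Engine implementation: it filters by a single word form, treats the node as one token, and the names `matches` and `apply_filter` are hypothetical.

```python
def matches(line, word, span=5, include_kwic=False):
    """True if `word` occurs within `span` tokens to the left or right of the node."""
    left, node, right = line
    window = left[max(0, len(left) - span):] + right[:span]
    if include_kwic:
        window = window + [node]
    return word.lower() in (token.lower() for token in window)

def apply_filter(lines, word, positive=True, span=5, include_kwic=False):
    """Positive filter keeps matching lines; negative filter removes them."""
    return [
        line for line in lines
        if matches(line, word, span, include_kwic) == positive
    ]

lines = [
    (["a", "hard", "day", "at"], "work", ["in", "the", "office"]),
    (["back", "to"], "work", ["after", "lunch"]),
]

kept = apply_filter(lines, "office", positive=True, span=3)
removed = apply_filter(lines, "office", positive=False, span=3)
```

With a span of 3, "office" is found in the first line only, so the positive filter keeps that line and the negative filter keeps the other one. Reducing the span to 2 puts "office" out of reach and the positive filter returns nothing.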

The rest of the settings are identical to those in the concordance query form.

Different filters can be used on top of each other until the desired outcome is achieved.


Frequencies of word forms, lemmas, tags and other attributes can be calculated from the concordance lines. You can combine up to 4 attributes for which frequencies should be calculated.

Click Frequency in the left menu to access these settings:


Multilevel Frequency Distribution

Use frequency limit to exclude low frequency items from the list.

Frequency can be calculated for the node but also for any other token up to 6 positions to the right or left of the node.


Frequencies can also be calculated for phrases or groups of words, tokens, tags or a mixture of attributes. For example, these settings will tell us the most frequent combinations of a word followed by work, such as at work, to work, the work etc.:

frequency example
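
What such a two-level frequency list computes can be sketched in Python. The sketch counts the word form one position to the left of the node together with the node itself; `combination_frequency`, the `min_freq` parameter (standing in for the frequency limit) and the sample lines are hypothetical.

```python
from collections import Counter

def combination_frequency(lines, min_freq=1):
    """Count (token at 1L, node) pairs and drop pairs below the frequency limit."""
    counts = Counter(
        (left[-1].lower(), node.lower())
        for left, node, _right in lines
        if left  # skip lines with no left context
    )
    return [(pair, n) for pair, n in counts.most_common() if n >= min_freq]

lines = [
    (["she", "is", "at"], "work", ["today"]),
    (["he", "was", "at"], "work", ["yesterday"]),
    (["go", "back", "to"], "work", ["now"]),
]

print(combination_frequency(lines))
print(combination_frequency(lines, min_freq=2))
```

The first call lists both "at work" (2 hits) and "to work" (1 hit); with a limit of 2, only "at work" remains.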

Text Type Frequency Distribution

The text type frequency distribution will calculate the frequency of the node in different texts. You can select text type, author, publication date and other attributes. Holding down Ctrl and selecting more than one attribute will produce a statistic for each attribute, all on one page.

Specify a frequency limit to exclude low frequency items.

What does Relative Text Type frequency on the text type frequency page mean?

The number is the relative frequency of the query result divided by the relative size of the particular text type. The number grows as the frequency increases and falls as the text type gets bigger. It can be interpreted as “how much more/less often the result of the query appears in this text type in comparison to the whole corpus”.

E.g. “test” has 2000 hits in the corpus, 400 of which are in the text type “Spoken”. The text type “Spoken” represents 10 % of the corpus. The Relative Text Type frequency will then be (400 / 2000) / 0.1 = 200 %, which means that “test” is twice as common in “Spoken” as in the whole corpus.
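
The same arithmetic can be written as a small Python function. The function name and parameter names are hypothetical; the formula is the one given above.

```python
def relative_text_type_frequency(hits_in_type, hits_total, type_share):
    """Share of hits falling into the text type, divided by the share of the
    corpus that the text type represents, expressed as a percentage."""
    return (hits_in_type / hits_total) / type_share * 100

# 400 of 2000 hits are in "Spoken", which makes up 10 % of the corpus.
value = relative_text_type_frequency(400, 2000, 0.1)
print(round(value, 1))  # 200.0
```

A value of 100 % would mean the query result is exactly as frequent in the text type as in the whole corpus; values below 100 % mean it is less frequent there.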

These frequency presets are directly available from the left menu:

Click to calculate the frequency of the node tags, i.e. tags related to the highlighted word(s) in the concordance. It is only useful if the concordance includes node words with various different tags, such as the word work sometimes used as a noun and sometimes as a verb. Then this option will show how often it is used as each part of speech.

Click to calculate the frequency of the node forms, i.e. the different word forms of the highlighted text in the concordance. This is only useful if various forms of the highlighted word are found. If the node is a word with no inflections, such as because, using this option will not produce any sensible result.

Click to calculate the frequency of Doc IDs. (Your corpus must have the SHORTREF attribute defined in the corpus configuration file.)

Click to calculate how the node is distributed between the different text types.


The collocation tool will search the context around the node and will display the most frequent words, which can be regarded as collocation candidates. This process can be slow for large concordances, so it is recommended to use the Word Sketch instead where possible.

collocations from concordance

attribute
choose word form, tag, lempos, lemma, lc or lemma_lc


defines how many tokens to the right and to the left of the node will be included

minimum frequency in corpus
excludes words which appear in the corpus less frequently than the given value

minimum frequency in given range

excludes words which appear in the above defined range less frequently than the given value

show functions
defines which values should be shown on the result screen

Sort by
defines which value will be used for sorting

See Statistics used in Sketch Engine for an explanation of the values.
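
How the range and the minimum frequency in the given range narrow down the candidates can be sketched as follows. The sketch only counts raw co-occurrences; it does not compute any of the statistics shown on the result screen and does not model the minimum frequency in corpus. `collocation_candidates` and its parameters are hypothetical.

```python
from collections import Counter

def collocation_candidates(lines, left_range=3, right_range=3, min_freq_in_range=2):
    """Count tokens within the given range around the node and keep frequent ones."""
    counts = Counter()
    for left, _node, right in lines:
        window = left[max(0, len(left) - left_range):] + right[:right_range]
        counts.update(token.lower() for token in window)
    return [(word, n) for word, n in counts.most_common() if n >= min_freq_in_range]

lines = [
    (["very", "hard"], "work", ["indeed"]),
    (["such", "hard"], "work", ["pays"]),
    (["go", "to"], "work", ["early"]),
]

print(collocation_candidates(lines))  # [('hard', 2)]
```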


This tool will display a graph showing how the concordance lines are distributed across the corpus. This is useful for checking whether the search word is distributed evenly across the whole corpus or whether it is concentrated in certain places (documents), which would suggest that the word is subject-specific or that the topics in the corpus are not balanced.

By default, the corpus is divided into 100 equal parts (slices). Use the slider to achieve a finer division.

The column height represents the relative frequency of the search word within a concordance part (= column).

The columns are clickable and will display a concordance from the slice.
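
The slicing behind such a graph can be sketched in Python, assuming every hit is known by its zero-based token position in the corpus. The `distribution` helper is hypothetical. Because all slices have the same size, the hit count per slice is proportional to the relative frequency within that slice.

```python
def distribution(hit_positions, corpus_size, slices=100):
    """Count hits per slice after dividing the corpus into equal parts."""
    counts = [0] * slices
    for position in hit_positions:
        index = min(position * slices // corpus_size, slices - 1)
        counts[index] += 1
    return counts

# A corpus of 100 tokens divided into 10 slices of 10 tokens each.
columns = distribution([0, 5, 99], corpus_size=100, slices=10)
print(columns)  # [2, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

A word that is spread evenly produces columns of similar height, while a word concentrated in a few documents produces a few tall columns.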

concordance distribution graph