How to generate a concordance?

Log in to Sketch Engine (or click Home) and select a corpus. Then, in the left menu, click Search to display the following screen:

concordance - basic corpus search

if you see more options, close them by clicking on the underlined links or proceed to advanced options further below

(2) type your keyword or a phrase and click (3) make concordance

The results can be sorted and filtered, frequencies can be calculated. See Working with concordance results More precise search criteria can used by following the information below.

Watch this short You Tube video to see how you can generate a concordance in Sketch Engine.

Concordance Search – advanced options

Click (2) Query types and/or Context and/or Text types links to unfold the related options.

(2) Query types

concordance query types, context and text types

Click Query types to unfold a list of available queries. Start typing into the required box, the type will be selected automatically and will be highlighted, the others will appear in grey. In the screenshot above the user decided to use the lemma search.

The simple search tries to guess what you wanted to do:

  • if you type a lemma, it behaves as a lemma search with PoS:unspecified
    e.g. typing bear will find all forms, i.e. bear (animal) but also bear (verb), bore, born, bearing
  • if you type a word form which is not a lemma, it behaves as word form search
    e.g. typing goes will only find examples of goes
  • if you type two or more words which are lemmas, it will try to find all word forms
    e.g. typing drive them will find driving them, drove them, drives them, drive them
  • if you type a word which contains a word form which is not a lemma, it behaves as the phrase search.
    e.g. typing goes home will only find examples of the phrase but not examples of went home
  • if you search a word or phrase containing an apostrophe, you have to type the part with the apostrophe separately.
    e.g. typing Jane ‘s will find Jane’s or he ‘s will find he’s
  • if you search contracted negative forms, the apostrophe is not separately.
    e.g. typing must n’t will find mustn’t or must not will find must

Wild cards are allowed. The simple search is always case insensitive. (There might be rare cases when the corpus author configured the corpus differently.)

  • will find all word forms of the word, i.e. searching for go will find go, goes, going, went, gone
  • produces no result if the search word is not a lemma, e.g. typing goes produces no results
  • only one lemma is allowed, to search for a sequence of lemmas, use CQL
  • when PoS is set to unspecified, all parts of speech will be included if the lemma corresponds to more than one PoS, e.g. searching for bear with PoS:unspecified will find both examples of the animal as well as the verb to bear
  • will find examples of a sequence of tokens exactly as it is typed, differences in spaces are ignored, i.e. in bed will produce the same results as in        bed
  • is case sensitive, i.e. buy an apple will produce different results from by an Apple
  • regular expressions are allowed, e.g. constr.* will find all words starting with constr
  • if you need to search for punctuation, e.g. to find Hallo?, use CQL because punctuation constitute a separate token and some punctuation marks are operators in regular expressions, thus searching for hallo? would find examples of the word hall  and hallo (the use of ? makes the letter o optional)

will find the exact word form as it is typed, i.e. goes will only find examples of goes but not examples of gowentgoing, etc.

regular expressions are allowed, i.e. un.*ous will find examples of all words starting with un and ending with ous with any number of characters in between

will find a sequence of characters inside a token

used for complex searches including searching for structures without specifying concrete words

the advanced concordance above was generated from the British National Corpus with this CQL query:

[lemma=”grow”] []{0,3} ~8″tomato-n”

which tells Sketch Engine to find this:
[lemma=”grow”] finds all occurences of lemma grow
[]{0,3} followed by between zero and three random tokens (words)
~8″tomato-n” finds 8 words most similar to tomato  (uses the thesaurus)

A guide to CQL searching >

(2) Context

Activate the context options by clicking the Context link. The context options can be used together with Query types and Text types options.

setting context for concordance search

in the screenshot, the user is looking for the lemma rather but wants to avoid examples of would rather or should rather so the context is specified as:

  • 1 token to the left of the search word
  • neither would nor  should should appear (lemmas must be separated by the vertical bar “|”, e.g. would|should)

window can be set to left, right or both

tokens specify a window, not an exact position, when set to 4, it means between 1 and 4 tokens from the search word

none of these items – no lemma should appear
any of these items – at least one of the lemmas must appear in the window
all of these itmes – all of these lemmas must appear in the window (not necessarily in the same order)

The part of speech filter can be used to include or exclude certain parts of speech from the context of the word.

In the screenshot, the user excluded adverbs and adjectives one token to the right of rather.

window can be set to left, right or both

tokens specify a window, not an exact position, when set to 4, it means between 1 and 4 tokens from the search word

none of these items – none of the ticked PoS should appear
any of these items – at least one of the ticked PoS must appear in the window
all of these itmes – all of the ticked PoS must appear in the window

The query found the less frequent uses of the word rather, excluding examples of

would rather

should rather

rather + adjective, e.g. rather long

rather + adverb, e.g. rather quickly

concordance: less frequent structures of rather

To exclude the rather numerous examples of rather than, it would be necessary to use CQL or to use additional filters on this result screen.

Watch this You Tube video to learn to use context with concordances:

(2) Text types

Documents in a corpus can (but do not have to) be annotated (labelled) with information indicating the type of text, time of publication, purpose etc. Different corpora contain different type and different quantity of annotation. Corpora without this type annotation will not display any text type options to select.

concordance query text type settings

an example of the Text type settings available for the British National Corpus, other corpora can have more, fewer, different or no such settings at all

OR will be used between options from the same group

AND will be used between different groups

The example will be interpreted as:

include documents which are

spoken demographic OR written to be spoken

AND at the same time are

book OR periodical

Constraint on a subcorpus has an effect on the resulting size per million, see what causes the difference in size per million when using Text Type vs. a subcorpus.