The Find X function is an experimental function currently supported in only a few corpora, e.g. the British National Corpus (word sketch: people).

Find X (formerly called histograms) is a feature that shows additional information in word sketch results. This information gives more detail about how the word is used, e.g. the noun people is usually used in the plural.

Find X (word sketch highlights) can show differences in word usage such as grammatical number (singular vs plural), text type (written vs spoken), verb form (simple vs continuous vs passive), etc.


3 types of definitions

Find X can be defined in three ways.

  • (Q1) In this scenario, the frequency of the pattern specified by the CQL, with the word substituted at %s in Q1, is divided by the frequency of the word.

    freq(Q1[word]) / freq(word)

  • (Q1 and Q2) In this scenario, the frequency of the Q1 query (with the word instantiated at %s) is divided by the sum of that same frequency and the frequency of Q2 (with the word instantiated at %s).

    freq(Q1[word]) / (freq(Q1[word]) + freq(Q2[word]))

  • (WS) Here the frequency of the word in the word sketch grammatical relation is divided by the frequency of the word in the entire corpus.

    freq(WS[word]) / freq(word)
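
The three ratios above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Sketch Engine code: the function name find_x_ratio and its parameters are hypothetical, and the frequencies are assumed to have been obtained elsewhere.

```python
from typing import Optional


def find_x_ratio(freq_q1: int, freq_word: int, freq_q2: Optional[int] = None) -> float:
    """Return the Find X score for one word (hypothetical helper).

    Q1 only, or WS:  freq(Q1[word]) / freq(word)
    Q1 and Q2:       freq(Q1[word]) / (freq(Q1[word]) + freq(Q2[word]))
    """
    if freq_q2 is None:
        denominator = freq_word
    else:
        denominator = freq_q1 + freq_q2
    # Assumption: a word with a zero denominator simply scores 0.
    if denominator == 0:
        return 0.0
    return freq_q1 / denominator
```

For example, with invented counts, a word that matches Q1 30 times and Q2 10 times scores 30 / (30 + 10) = 0.75 under the Q1 and Q2 definition.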

Brief documentation

Q1 – CQL query (mandatory unless S1 is used instead)

Q2 – CQL query (optional)

S1 – part of corpus or subcorpus (mandatory unless Q1 is used instead)

S2 – part of corpus or subcorpus (optional)

HR – histogram human-readable name (optional)

RE – regular expression, e.g. n$ when the lempos attribute is used in Q1 (optional)

TH – threshold (depending on the type of definition)

CL – colour used to display the information, e.g. red or blue (optional)

WS – word sketch definition name, e.g. usage patterns (mandatory if used)

How to use the Find X function?

Find X is available from the left submenu of the word list feature and is related to word sketch highlights in Sketch Engine.

Additionally, a regular expression (RE) can be specified to restrict which words are considered: only words matching the RE are taken into account. This is mainly for efficiency reasons.
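
The effect of the RE setting can be illustrated with a short Python sketch. It assumes that words are lempos strings such as people-n and that the expression is matched anywhere in the string (search rather than full match), which is what patterns like -n$ suggest; the function name filter_candidates is hypothetical.

```python
import re
from typing import Iterable, List


def filter_candidates(words: Iterable[str], pattern: str) -> List[str]:
    """Keep only the words that match the RE (hypothetical helper)."""
    regex = re.compile(pattern)
    # Assumption: unanchored matching, so "-n$" keeps every lempos ending in "-n".
    return [word for word in words if regex.search(word)]
```

With the pattern -n$, a candidate list of people-n, go-v and red-j would be reduced to people-n, so the queries only need to be evaluated for nouns.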

Examples are attached. Note that you may need to alter the minimum ratio and minimum frequency to see any results.
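
To show why the minimum ratio and minimum frequency affect what is displayed, here is a minimal Python sketch. It assumes that the minimum frequency applies to the overall frequency of the word and that the ratio is the Q1-only score; the source does not state this, and the function name select_words and the counts in the note below are invented for illustration.

```python
from typing import Dict, List, Tuple


def select_words(
    counts: Dict[str, Tuple[int, int]], min_ratio: float, min_freq: int
) -> List[Tuple[str, float]]:
    """Return (word, ratio) pairs that pass both limits, highest ratio first.

    counts maps a word to (pattern frequency, word frequency).
    """
    selected = []
    for word, (pattern_freq, word_freq) in counts.items():
        # Assumption: words rarer than min_freq are dropped before scoring.
        if word_freq == 0 or word_freq < min_freq:
            continue
        ratio = pattern_freq / word_freq
        if ratio >= min_ratio:
            selected.append((word, ratio))
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

If the limits are set too high, the returned list is empty, which corresponds to seeing no results until the settings are lowered.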


Definition file format

HR human readable name
Q1 query_1
Q2 query_2                        # optional
RE regular_expression             # optional


HR human readable name
WS wsdef_relation_name
RE regular_expression             # optional

# All strings in the definition files starting with # are comments and are ignored to the end of the line.
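
As an illustration of the format, the following Python sketch reads such a file into a list of dictionaries. It is not the parser used by Sketch Engine. It assumes that each line is a two-letter key followed by whitespace and a value, and that blank lines separate definitions; the function name parse_definitions is hypothetical.

```python
from typing import Dict, List


def parse_definitions(text: str) -> List[Dict[str, str]]:
    """Parse a Find X definition file into one dict per definition (sketch)."""
    definitions: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        if not raw_line.strip():
            # Assumption: a blank line ends the current definition.
            if current:
                definitions.append(current)
                current = {}
            continue
        # Everything from "#" to the end of the line is a comment.
        # Note that this also truncates a value that itself contains "#".
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        definitions.append(current)
    return definitions
```

Each resulting dictionary holds the keys of one definition, for example HR, Q1 and RE for a query-based definition or HR and WS for a word sketch based one.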


Searching for passive forms using the lempos attribute

HR verbs that are most often passive
Q1 [lempos=="%s" & tag="VBB_T"]
RE -v$

Searching for plural forms using the lempos attribute

HR nouns that are most often plural
Q1 [lempos=="%s" & tag="NNS_."]
RE -n$
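
The way a word is substituted at %s in such a query can be sketched in Python as follows. Plain string replacement is used here as an assumption about the behaviour; the function name instantiate_query is hypothetical.

```python
def instantiate_query(template: str, lempos: str) -> str:
    """Substitute the word for every %s placeholder in a CQL template (sketch)."""
    # str.replace is used instead of the % operator so that any other
    # percent sign in the query is left untouched.
    return template.replace("%s", lempos)
```

Applied to the Q1 line above with the lempos people-n, this yields [lempos=="people-n" & tag="NNS_."], whose frequency is then compared with the frequency of people-n itself.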

Searching with a threshold and colours

S1 spoken
TH 50
RE -[nvj]$
CL rgb(50, 50, 100)

“spoken” should be replaced with the name of the subcorpus (spaces are replaced with underscores).
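
A small Python sketch of how such a definition could be generated for an arbitrary subcorpus name, including the replacement of spaces with underscores. The function name subcorpus_definition is hypothetical, and the subcorpus name in the note below is invented.

```python
def subcorpus_definition(subcorpus_name: str, threshold: int, regex: str, colour: str) -> str:
    """Build the lines of an S1-based definition (hypothetical helper)."""
    # Spaces in the subcorpus name are replaced with underscores.
    s1 = subcorpus_name.strip().replace(" ", "_")
    lines = [f"S1 {s1}", f"TH {threshold}", f"RE {regex}", f"CL {colour}"]
    return "\n".join(lines)
```

For a subcorpus called spoken demographic, the first line of the generated definition would read S1 spoken_demographic.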

Bibliographical reference

Adam Kilgarriff and Pavel Rychlý (2008). Finding the words which are most X. In Proceedings of the 13th EURALEX International Congress. Spain, July 2008, pp. 433–436