Find X is a feature that produces a ranked list of words according to a specified definition of the behaviour you wish to examine. The definition is used to calculate statistics and rank the words in the corpus that you have selected. You can choose a definition from the existing lists or upload an input file (see the format and examples below).

There are 3 scenarios for the definitions:

i) a specified CQL query:

  • (Q1) In this scenario, the frequency of the pattern specified by the CQL, with the word substituted at %s in Q1, is divided by the frequency of the word.

    freq(Q1[word]) / freq(word)

ii) a comparison of two such CQL queries:

  • (Q1 and Q2) In this scenario, the frequency of the Q1 query (with the word instantiated at %s) is divided by the sum of that same frequency and the frequency of Q2 (with the word instantiated at %s).

    freq(Q1[word]) / (freq(Q1[word]) + freq(Q2[word]))

iii) a word sketch definition:

  • (WS) Here the frequency of the word in the word sketch grammatical relation is divided by the frequency of the word in the entire corpus.

    freq(WS[word]) / freq(word)
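
The three ratios above can be sketched in Python as follows. This is only an illustration: the function names and the frequency counts are hypothetical, and returning 0.0 for a zero denominator is an assumption, not documented Find X behaviour.

```python
def ratio_single(pattern_freq: int, word_freq: int) -> float:
    """Scenarios i and iii: freq(Q1[word]) / freq(word) or freq(WS[word]) / freq(word)."""
    if word_freq == 0:
        return 0.0  # assumption: treat an unseen word as scoring zero
    return pattern_freq / word_freq


def ratio_comparison(q1_freq: int, q2_freq: int) -> float:
    """Scenario ii: freq(Q1[word]) / (freq(Q1[word]) + freq(Q2[word]))."""
    total = q1_freq + q2_freq
    if total == 0:
        return 0.0  # assumption: neither query matched
    return q1_freq / total


# Made-up counts for one word: 30 hits for Q1, 10 for Q2, 120 occurrences overall.
single = ratio_single(30, 120)
comparison = ratio_comparison(30, 10)
```

With these made-up counts, the single-query score is 30/120 and the comparison score is 30/(30+10).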

How to use the Find X function?

Menu navigation

Find X is available from the left submenu of the word list feature. It is related to the use of word sketch highlights in Sketch Engine.


Additionally, a regular expression (RE) can be specified to restrict which words are considered: only the words matching the RE are taken into account, and all other words are skipped. This is mainly for efficiency reasons.
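
As a rough sketch of this filtering step, the snippet below keeps only the candidate words that match an RE such as -v$. The helper name and the word list are hypothetical, and the use of unanchored matching (Python's re.search) is an assumption about how the RE is applied.

```python
import re


def filter_candidates(words, pattern):
    """Keep only the words that match the regular expression."""
    compiled = re.compile(pattern)
    # Assumption: the RE may match anywhere in the word (search, not full match).
    return [word for word in words if compiled.search(word)]


# Hypothetical lempos values: lemma plus a part-of-speech suffix.
lempos_items = ["take-v", "house-n", "run-v", "quick-j"]
verbs = filter_candidates(lempos_items, "-v$")
```

Here only the items ending in -v survive, so the ratio has to be computed for far fewer words.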

Examples are attached. Note that you may need to alter the minimum ratio and minimum frequency to see any results.
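
To show why the two thresholds affect what you see, here is a minimal sketch of ranking with a minimum ratio and a minimum frequency. The function name, parameter names and counts are hypothetical, and applying the minimum frequency to the overall word frequency is an assumption.

```python
def rank_words(counts, min_ratio=0.0, min_freq=1):
    """Rank words by pattern_freq / word_freq.

    counts maps each word to a (pattern_freq, word_freq) pair.
    """
    ranked = []
    for word, (pattern_freq, word_freq) in counts.items():
        # Assumption: the minimum frequency is checked against the word frequency.
        if word_freq == 0 or word_freq < min_freq:
            continue
        ratio = pattern_freq / word_freq
        if ratio >= min_ratio:
            ranked.append((word, ratio))
    # Highest ratio first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up counts for three verbs.
counts = {"bear-v": (80, 100), "take-v": (50, 1000), "zap-v": (3, 3)}
strict = rank_words(counts, min_ratio=0.5, min_freq=50)
loose = rank_words(counts, min_ratio=0.0, min_freq=1)
```

With the strict settings only one word is returned, while the loose settings return all three, which is why lowering the thresholds can make results appear.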


Definition file format

Find X (WS highlights) definition file format

=highlight_id
HR human readable name
Q1 query_1
Q2 query_2                        # optional
RE regular_expression             # optional

or

=highlight_id
HR human readable name
WS wsdef_relation_name
RE regular_expression             # optional

# Any text in a definition file starting with # is a comment and is ignored up to the end of the line.
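
The format described above can be read with a few lines of Python. This is a minimal sketch, not the parser Sketch Engine uses: the function name is hypothetical, and it assumes that a # never occurs inside a query or regular expression.

```python
def parse_definitions(text):
    """Parse definition text into a dict: highlight_id -> {key: value}."""
    definitions = {}
    current = None
    for raw_line in text.splitlines():
        # Drop comments (assumption: '#' never appears inside a value).
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("="):
            # A line starting with '=' opens a new definition.
            current = definitions.setdefault(line[1:].strip(), {})
        elif current is not None:
            # Remaining lines are "KEY value" pairs such as HR, Q1, Q2, WS, RE.
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return definitions


sample = """\
=passive
HR verbs that are most often passive
Q1 [lempos=="%s" & tag="VBB_T"]
RE -v$   # optional
"""
parsed = parse_definitions(sample)
```

Each definition ends up as a small dictionary keyed by the two-letter codes, so a missing optional key such as Q2 is simply absent.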

Examples

Searching for passive forms using the lempos attribute

=passive
HR verbs that are most often passive
Q1 [lempos=="%s" & tag="VBB_T"]
RE -v$

Searching for plural forms using the lempos attribute

=plural
HR nouns that are most often plural
Q1 [lempos=="%s" & tag="NNS_."]
RE -n$
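
To make the role of %s concrete, the sketch below substitutes a word into the Q1 template from the plural example. The helper name is hypothetical, and plain string replacement with no escaping of special characters is an assumption.

```python
def instantiate(query_template: str, word: str) -> str:
    """Put the word in place of every %s placeholder in the query template."""
    # Assumption: simple textual substitution, no escaping of the word.
    return query_template.replace("%s", word)


template = '[lempos=="%s" & tag="NNS_."]'
query = instantiate(template, "house-n")
```

The resulting query is the one whose frequency is then divided by the frequency of the word itself.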

Searching using a threshold and colours

=spoken
S1 spoken
TH 50
RE -[nvj]$
CL rgb(50, 50, 100)

“spoken” should be replaced with the name of the subcorpus (spaces in the name are replaced with underscores).
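
As a small illustration of the naming rule, the snippet below turns a subcorpus name into the form expected on the S1 line. The helper name and the subcorpus name are hypothetical.

```python
def subcorpus_key(name: str) -> str:
    """Replace spaces in a subcorpus name with underscores."""
    return name.replace(" ", "_")


# Hypothetical subcorpus called "spoken academic".
s1_line = "S1 " + subcorpus_key("spoken academic")
```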

Bibliographical reference

Adam Kilgarriff and Pavel Rychlý (2008). Finding the words which are most X. In Proceedings of the 13th EURALEX International Congress. Spain, July 2008, pp. 433–436