Simple maths is a method for identifying the keywords of one corpus compared with another. It includes a variable that lets the user shift the focus towards either higher- or lower-frequency words.

With this method, users can find keywords in the texts they have uploaded to Sketch Engine, a tool for exploring how language works. Its algorithms analyze authentic texts of billions of words (text corpora) to identify what is typical in language and what is rare, unusual or emerging usage.

Generally, higher values of the Simple maths variable (100, 1000, …) focus on higher-frequency (more common) words, whereas lower values (1, 0.1, …) favor lower-frequency (rarer) words.
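To see how the variable shifts the focus, the sketch below scores three hypothetical words with the keyness formula defined below, once with N = 1 and once with N = 1000. The words and their per-million frequencies are invented for illustration and do not come from any real corpus; the helper names `keyness` and `rank_keywords` are likewise hypothetical.

```python
# Hypothetical per-million frequencies: (fpm in focus corpus, fpm in reference corpus).
# These numbers are invented for illustration only.
WORDS = {
    "rare_word": (2.0, 0.1),
    "mid_word": (50.0, 10.0),
    "common_word": (5000.0, 2000.0),
}


def keyness(fpm_focus: float, fpm_ref: float, n: float) -> float:
    """Simple maths keyness score: (fpm_focus + N) / (fpm_ref + N)."""
    return (fpm_focus + n) / (fpm_ref + n)


def rank_keywords(words: dict[str, tuple[float, float]], n: float) -> list[str]:
    """Return the words sorted by keyness score, highest first."""
    return sorted(words, key=lambda w: keyness(*words[w], n), reverse=True)


for n in (1, 1000):
    print(n, rank_keywords(WORDS, n))
```

With N = 1 the ranking is `mid_word`, `rare_word`, `common_word`; with N = 1000 it becomes `common_word`, `mid_word`, `rare_word`. A large N swamps small frequencies and pulls their ratios towards 1, so only words that are frequent in absolute terms can still stand out.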

The statistic we use for keywords is a variant of “word W is so-and-so times more frequent in corpus X than in corpus Y”. The keyness score of a word is calculated according to the following formula:

$\frac{fpm_{\mathrm{focus}} + N}{fpm_{\mathrm{ref}} + N}$

where

- $fpm_{\mathrm{focus}}$ is the normalized (per million) frequency of the word in the focus corpus,
- $fpm_{\mathrm{ref}}$ is the normalized (per million) frequency of the word in the reference corpus,
- $N$ is the so-called smoothing parameter ($N = 1$ is the default value).
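A direct transcription of the formula into Python might look like the following sketch. The function name `keyness_score` is hypothetical, and rejecting a non-positive N is a choice made in this sketch rather than documented Sketch Engine behavior. The example calls also show one reason for the smoothing parameter: without it, a word that never occurs in the reference corpus would cause a division by zero.

```python
def keyness_score(fpm_focus: float, fpm_ref: float, n: float = 1.0) -> float:
    """Keyness score of a word: (fpm_focus + N) / (fpm_ref + N).

    fpm_focus: normalized (per million) frequency in the focus corpus
    fpm_ref:   normalized (per million) frequency in the reference corpus
    n:         smoothing parameter N (1 by default, as in the text)
    """
    if n <= 0:
        # Assumption of this sketch: require N > 0 so the denominator is never zero.
        raise ValueError("smoothing parameter N must be positive")
    return (fpm_focus + n) / (fpm_ref + n)


# A word missing from the reference corpus (fpm_ref = 0) still gets a finite score.
print(keyness_score(4.0, 0.0))    # (4 + 1) / (0 + 1) = 5.0
# A word equally frequent in both corpora scores exactly 1.
print(keyness_score(12.5, 12.5))  # 1.0
```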

### Example

- Focus corpus (BNC): 112,289,776 tokens
- Frequency of the lemma *shard* in the corpus: 35

Relative frequency

$fpm_{\mathrm{focus}} = \frac{\text{number of hits} \cdot 1{,}000{,}000}{\text{corpus size}} = \frac{35 \cdot 1{,}000{,}000}{112{,}289{,}776} \approx 0.3117$

- Reference corpus (ukWaC): 1,559,716,979 tokens
- Frequency of the lemma *shard* in the corpus: 263

Relative frequency

$fpm_{\mathrm{ref}} = \frac{\text{number of hits} \cdot 1{,}000{,}000}{\text{corpus size}} = \frac{263 \cdot 1{,}000{,}000}{1{,}559{,}716{,}979} \approx 0.1686$

Keyness score

$Score = \frac{fpm_{\mathrm{focus}} + N}{fpm_{\mathrm{ref}} + N} = \frac{0.3117 + 1}{0.1686 + 1} \approx 1.1224$
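The worked example can be reproduced from the raw counts with a few lines of Python. This is a sketch for checking the arithmetic, not Sketch Engine's own code; the helper names `per_million` and `keyness_score` are hypothetical, and the counts are the ones given in the example for the lemma *shard*.

```python
def per_million(hits: int, corpus_size: int) -> float:
    """Normalized frequency: hits per million tokens."""
    return hits * 1_000_000 / corpus_size


def keyness_score(fpm_focus: float, fpm_ref: float, n: float = 1.0) -> float:
    """Simple maths keyness score: (fpm_focus + N) / (fpm_ref + N)."""
    return (fpm_focus + n) / (fpm_ref + n)


# Counts for the lemma "shard" taken from the example above.
fpm_focus = per_million(35, 112_289_776)     # BNC
fpm_ref = per_million(263, 1_559_716_979)    # ukWaC
score = keyness_score(fpm_focus, fpm_ref)   # default N = 1

print(f"{fpm_focus:.4f} {fpm_ref:.4f} {score:.4f}")  # 0.3117 0.1686 1.1224
```

Keeping full precision until the last step gives 1.1224; dividing the rounded values 1.3117 / 1.1686 instead gives about 1.1225, so small differences in the last digit come from rounding.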

#### For more details see:

Kilgarriff, A. (2009). Simple maths for keywords. In M. Mahlberg, V. González-Díaz & C. Smith (eds.), *Proceedings of the Corpus Linguistics Conference CL2009*. University of Liverpool, UK, July 2009.

Lexical Computing Ltd. (2015). *Statistics used in Sketch Engine* (Chapter 5). 8 July 2015.