
ALDF – Average Logarithmic Distance Frequency _{[ statistics ]}
a modified frequency that prevents the result from being excessively influenced by one part of the corpus (e.g. one or more documents) that contains a high concentration of the token. If the token is evenly distributed across the corpus, ALDF and absolute frequency will be similar or identical. Unlike ARF (Average Reduced Frequency), ALDF is calculated from the average distances between the tokens, not the frequency of the token. See also: ALDF definition.
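To make the effect of uneven distribution concrete, here is a minimal sketch of ALDF. It assumes the commonly cited formulation ALDF = exp(−Σ (d_i / N) · ln(d_i / N)), where N is the corpus size in tokens and d_i are the distances between consecutive occurrences, with the corpus treated as a circle. The function name `aldf` and the position lists are hypothetical; Sketch Engine's own implementation may differ in detail.

```python
import math


def aldf(positions, corpus_size):
    """Sketch of ALDF from distinct 0-based token positions (assumed formulation)."""
    f = len(positions)
    if f == 0:
        return 0.0
    positions = sorted(positions)
    # Distances between consecutive occurrences; the last gap wraps around
    # from the final occurrence back to the first one.
    distances = [positions[i] - positions[i - 1] for i in range(1, f)]
    distances.append(positions[0] + corpus_size - positions[-1])
    # Exponential of the entropy of the normalised distances.
    entropy = -sum((d / corpus_size) * math.log(d / corpus_size) for d in distances)
    return math.exp(entropy)


even = aldf([0, 25, 50, 75], 100)      # evenly spread: close to the raw frequency 4
clustered = aldf([0, 1, 2, 3], 100)    # one dense cluster: much lower than 4
```

With four evenly spaced occurrences the result matches the absolute frequency, while four adjacent occurrences yield a value only slightly above 1.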
ARF – Average Reduced Frequency _{[ statistics ]}
a modified frequency that prevents the result from being excessively influenced by one part of the corpus (e.g. one or more documents) that contains a high concentration of the token. If the token is evenly distributed across the corpus, ARF and absolute frequency will be similar or identical. Unlike ALDF (Average Logarithmic Distance Frequency), ARF is calculated from the frequency of the token, not distances between the tokens. See also: ARF definition.
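A minimal sketch of ARF follows, assuming the commonly cited formulation ARF = (1 / v) · Σ min(d_i, v), where v = N / f is the average gap derived from the frequency f and the corpus size N. The gaps d_i between consecutive occurrences enter only after being capped at v, so a dense cluster contributes little. The function name `arf` and the sample positions are hypothetical.

```python
def arf(positions, corpus_size):
    """Sketch of ARF from distinct 0-based token positions (assumed formulation)."""
    f = len(positions)
    if f == 0:
        return 0.0
    positions = sorted(positions)
    v = corpus_size / f  # average gap implied by the raw frequency
    # Gaps between consecutive occurrences; the last gap wraps around
    # from the final occurrence back to the first one.
    distances = [positions[i] - positions[i - 1] for i in range(1, f)]
    distances.append(positions[0] + corpus_size - positions[-1])
    # Each gap counts at most v, so long empty stretches do not compensate
    # for occurrences packed into one part of the corpus.
    return sum(min(d, v) for d in distances) / v


even = arf([0, 25, 50, 75], 100)     # evenly spread: equals the raw frequency 4
clustered = arf([0, 1, 2, 3], 100)   # one dense cluster: about 1.12
```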
document frequency (docf) _{[ statistics ]}
The document frequency is the number of documents in which the token or phrase appears. For example, suppose the corpus has 100 documents and only 2 of them contain the word city: document number 7 contains 17 instances of city and document number 31 contains 6 instances. The document frequency of city is 2, because 2 documents contain the word, regardless of how many times it occurs in each.
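The example above can be reproduced with a toy corpus. This is only an illustrative sketch: the corpus is represented as a list of token lists, and the names `document_frequency` and `documents` are hypothetical.

```python
def document_frequency(documents, word):
    """Count the documents that contain `word` at least once."""
    return sum(1 for tokens in documents if word in tokens)


# Toy corpus of 100 documents; only two of them contain "city".
documents = [["filler"] for _ in range(100)]
documents[6] = ["city"] * 17    # document number 7
documents[30] = ["city"] * 6    # document number 31

docf = document_frequency(documents, "city")                     # 2
frequency = sum(tokens.count("city") for tokens in documents)    # 23
```

The absolute frequency is 23, but the document frequency stays at 2 because only two documents are involved.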
frequency _{[ statistics ]}
Frequency (also absolute frequency) refers to the number of occurrences or hits. If a word, phrase, tag etc. has a frequency of 10, it was found 10 times, i.e. it occurs 10 times in the corpus. It is an absolute figure: a plain count that is not calculated using a formula. Compare: frequency per million. See also: ARF, document frequency, Statistics used in Sketch Engine.
likelihood _{[ statistics ]}
a function of the parameters of a statistical model. It plays a key role in statistical inference and is the basis for the log-likelihood function. See: Statistics in Sketch Engine.
log-likelihood _{[ statistics ]}
one of the functions used to compute statistics in Sketch Engine. It is an association measure based on the likelihood function and is used in tests for significance (see the log-likelihood calculator and more details).
logDice _{[ statistics ]}
a statistical measure for identifying co-occurrence (= two items appearing together). Sketch Engine uses it to identify collocations. It expresses the typicality (or strength) of the collocation. It is used in the word sketch feature and also when computing collocations from a concordance.
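As a rough illustration, logDice is usually given as 14 + log2(2 · f_xy / (f_x + f_y)), where f_xy is the number of co-occurrences and f_x, f_y are the frequencies of the two items. The sketch below assumes that formulation; the function and parameter names are hypothetical.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Sketch of logDice from co-occurrence and individual frequencies."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


always_together = log_dice(10, 10, 10)    # theoretical maximum of 14
weaker = log_dice(10, 1000, 2000)         # lower score for a less typical pair
```

Because the formula uses only the three frequencies and not the corpus size, scores under this formulation do not depend on how large the corpus is.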
MI Score _{[ statistics ]}
The Mutual Information score expresses the extent to which words co-occur compared to the number of times they appear separately. MI Score is strongly affected by frequency: low-frequency words tend to reach a high MI score, which may be misleading.
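The bias towards low-frequency words can be seen in a small sketch. It assumes the usual formulation MI = log2(f_xy · N / (f_x · f_y)), where N is the corpus size in tokens; the function and parameter names are hypothetical.

```python
import math


def mi_score(f_xy, f_x, f_y, corpus_size):
    """Sketch of the MI score (assumed formulation)."""
    return math.log2(f_xy * corpus_size / (f_x * f_y))


# Two rare words that occur once each, and once together.
rare_pair = mi_score(1, 1, 1, 1_000_000)

# Two common words that co-occur 100 times.
common_pair = mi_score(100, 1000, 1000, 1_000_000)
```

The single chance co-occurrence of two rare words scores far higher than the well-attested common pair, which is why MI values for low-frequency items should be read with caution.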
minimum sensitivity _{[ statistics ]}
a statistical measure similar to logDice, defined as the minimum of the following two numbers:
 the number of co-occurrences divided by the frequency of the collocate
 the number of co-occurrences divided by the frequency of the node word
The minimum sensitivity value rises as the number of co-occurrences increases and falls as the number of occurrences of the individual words (node word or collocate) increases.
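The definition above translates directly into code. This is a minimal sketch with hypothetical function and parameter names.

```python
def minimum_sensitivity(cooccurrences, f_collocate, f_node):
    """Minimum of the two ratios described in the definition."""
    return min(cooccurrences / f_collocate, cooccurrences / f_node)


score = minimum_sensitivity(10, 100, 50)              # min(0.1, 0.2)
more_cooccurrences = minimum_sensitivity(20, 100, 50)
more_frequent_node = minimum_sensitivity(10, 100, 500)
```

Doubling the co-occurrences raises the score, while a more frequent node word lowers it.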

overall score _{[ statistics ]}
score of the relation based on logDice in word sketches. The score is displayed in the header of each column of the relation. 
relative frequency, frequency per million _{[ statistics ]}
(also called freq/mill in the interface) is the number of occurrences of an item per million tokens, also called i.p.m. (instances per million). It is used to compare frequencies between corpora (or datasets) of different sizes.
Formula:
number of hits / corpus size in millions of tokens = frequency per million
An alternative calculation producing the same result:
(raw frequency / corpus size in tokens) × 1,000,000 = frequency per million
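Both calculations can be checked with a short sketch. The function names and the sample figures are hypothetical and serve only to show that the two forms agree.

```python
def frequency_per_million(hits, corpus_size_tokens):
    """Raw frequency divided by corpus size in tokens, scaled to one million."""
    return hits / corpus_size_tokens * 1_000_000


def frequency_per_million_alt(hits, corpus_size_tokens):
    """Number of hits divided by the corpus size expressed in millions of tokens."""
    return hits / (corpus_size_tokens / 1_000_000)


# The same word in two corpora of different sizes.
small = frequency_per_million(500, 20_000_000)       # about 25 per million
large = frequency_per_million(2_000, 400_000_000)    # about 5 per million
```

Although the larger corpus has more hits in absolute terms, the word is relatively more frequent in the smaller one.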
salience _{[ statistics ]}
a statistical measure of the significance of a specific token in the given context. It is measured with logDice; for more information, see section 3 of Statistics used in Sketch Engine.
simple maths _{[ statistics ]}
The simple maths formula is used to calculate the keyness score in Sketch Engine. This score is used to identify terms, keywords and also key n-grams and key collocations. It identifies items which appear more frequently in the focus corpus than in the reference corpus. It uses relative (per million) frequencies and, therefore, makes it possible to contrast corpora of unequal sizes. See: Simple maths.
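A minimal sketch of the keyness calculation follows. It assumes the commonly described form (fpm_focus + n) / (fpm_ref + n), where both frequencies are per million and n is a smoothing parameter, assumed here to default to 1. The function and parameter names are hypothetical.

```python
def keyness(fpm_focus, fpm_ref, n=1.0):
    """Sketch of the simple maths keyness score (assumed formulation)."""
    return (fpm_focus + n) / (fpm_ref + n)


typical_of_focus = keyness(99.0, 9.0)     # (99 + 1) / (9 + 1) = 10.0
absent_from_ref = keyness(4.0, 0.0)       # smoothing avoids division by zero
same_in_both = keyness(50.0, 50.0)        # 1.0: not key in either corpus
```

In this formulation, adding n to both sides keeps the score defined when an item is missing from the reference corpus.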
T-score _{[ statistics ]}
T-score expresses the certainty with which we can claim that there is an association between the words, i.e. that their co-occurrence is not random. The value is affected by the frequency of the whole collocation, which is why very frequent word combinations tend to reach a high T-score despite not being significant collocations.
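The dependence on the frequency of the whole collocation can be illustrated with a short sketch. It assumes the usual formulation T-score = (f_xy − f_x · f_y / N) / √f_xy, where N is the corpus size in tokens; the function and parameter names are hypothetical.

```python
import math


def t_score(f_xy, f_x, f_y, corpus_size):
    """Sketch of the T-score (assumed formulation)."""
    expected = f_x * f_y / corpus_size    # co-occurrences expected by chance
    return (f_xy - expected) / math.sqrt(f_xy)


# A very frequent combination of two very frequent words.
frequent_pair = t_score(10_000, 100_000, 100_000, 10_000_000)

# A rare pair of words that always appear together.
rare_pair = t_score(4, 4, 4, 10_000_000)
```

The frequent combination scores much higher than the rare pair even though the rare pair never occurs apart, which reflects the bias described above.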