
ARF – Average Reduced Frequency _{[ statistics ]}
a modified frequency which prevents the result from being excessively influenced by one part of the corpus (e.g. one or more documents) which contains a high concentration of the token. If the token is evenly distributed across the corpus, ARF and absolute frequency will be identical or comparable. see also ARF definition 
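The glossary does not spell out the formula, so the following is only a minimal sketch under the commonly cited definition: with f occurrences in a corpus of N tokens, v = N / f, the distances d_i between consecutive occurrences are measured with the corpus treated as circular, and ARF = (1 / v) * sum(min(d_i, v)). The function name and the example positions are illustrative assumptions, not part of Sketch Engine.

```python
def average_reduced_frequency(positions, corpus_size):
    """Sketch of ARF from token positions (assumed formula, see lead-in)."""
    f = len(positions)
    if f == 0:
        return 0.0
    pos = sorted(positions)
    v = corpus_size / f  # average distance between occurrences
    # Distances between consecutive occurrences ...
    distances = [pos[i] - pos[i - 1] for i in range(1, f)]
    # ... plus the wrap-around distance from the last occurrence to the first.
    distances.append(corpus_size - pos[-1] + pos[0])
    return sum(min(d, v) for d in distances) / v


# Evenly spread token: ARF equals the absolute frequency (4).
print(average_reduced_frequency([0, 25, 50, 75], 100))  # 4.0
# Same frequency, but clustered in one place: ARF drops close to 1.
print(average_reduced_frequency([0, 1, 2, 3], 100))     # 1.12
```

The two calls show the behaviour described above: an even distribution leaves the frequency unchanged, while a single dense cluster is counted as little more than one occurrence.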
document frequency _{[ statistics ]}
The document frequency is the number of documents in which the word or phrase appears. Example: the corpus has 100 documents and 2 of them contain the word city. Document number 7 contains 17 instances of city and document number 31 contains 6 instances of city. The document frequency of city is 2, because 2 documents contain the word. It is not important how many documents the corpus contains or how many times the word appears in total. The document frequency can be better suited for comparison in situations when the corpus contains a small number of documents with an extremely high frequency of particular words. see also frequency, frequency per million, ARF, Statistics used in Sketch Engine 
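The city example can be sketched in a few lines. The dictionary layout (document number mapped to the number of hits in that document) and the function names are assumptions made for illustration only.

```python
# Hits of "city" per document; the other 98 documents contain no hits
# and are simply absent from the mapping.
city_hits = {7: 17, 31: 6}


def document_frequency(hits_per_document):
    """Number of documents with at least one hit."""
    return sum(1 for hits in hits_per_document.values() if hits > 0)


def absolute_frequency(hits_per_document):
    """Total number of hits across all documents."""
    return sum(hits_per_document.values())


print(document_frequency(city_hits))  # 2
print(absolute_frequency(city_hits))  # 23
```

The document frequency stays at 2 however many hits documents 7 and 31 contain, which is why it is less sensitive to a few documents with very high counts.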
frequency _{[ statistics ]}
Frequency (also absolute frequency) refers to the number of occurrences or hits. If a word, phrase, tag etc. has a frequency of 10, it means it was found 10 times or it exists 10 times. It is an absolute figure. It is not calculated using a specific formula. compare frequency per million. see also ARF, document frequency, Statistics used in Sketch Engine 
likelihood _{[ statistics ]}
a function of the parameters of a statistical model. It plays a key role in statistical inference and is the basis for the log-likelihood function. see Statistics in Sketch Engine 
log-likelihood _{[ statistics ]}
one of the functions used in the statistics computed by Sketch Engine. It is an association measure based on the likelihood function and is used in tests for significance (see the log-likelihood calculator and more details) 
logDice _{[ statistics ]}
a statistical measure for identifying collocations. It expresses the typicality of the collocation. It is used in the word sketch feature and also when computing collocations from a concordance. It is based only on the frequency of the node, the frequency of the collocate and the frequency of the whole collocation (co-occurrence of the node and collocate). logDice is not affected by the size of the corpus and, therefore, can be used to compare scores between different corpora. logDice is the preferred statistical measure for large corpora. The other traditional measures take corpus size into account, and the enormous size of the current multi-billion-word corpora skews their scores so much as to make them impractical. In detail: a detailed explanation for non-statisticians and non-mathematicians is published in this blog post: Most frequent or most typical collocations? see also logDice in Statistics used in Sketch Engine, A Lexicographer-Friendly Association Score (paper), T-score, MI score 
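A minimal sketch of the score, assuming the formula given in the paper referenced above, logDice = 14 + log2(2 * f_xy / (f_x + f_y)), where f_x is the frequency of the node, f_y the frequency of the collocate and f_xy the frequency of the whole collocation. The function and parameter names are illustrative.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice from the three frequencies; corpus size is not an input."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Node and collocate always occur together: the theoretical maximum.
print(log_dice(10, 10, 10))  # 14.0
# The pair co-occurs in only a fraction of the cases: a lower score.
print(log_dice(5, 100, 100) < 14)  # True
```

The corpus size never appears among the arguments, which is the property that makes scores comparable between corpora of different sizes.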
MI Score _{[ statistics ]}
The Mutual Information score expresses the extent to which words co-occur compared to the number of times they appear separately. MI Score is strongly affected by frequency: low-frequency words tend to reach a high MI score, which may be misleading. This is why Sketch Engine allows setting a frequency limit so that low-frequency words can be excluded from the calculation. In most cases T-score is more useful than MI score. see Concordance, Collocations. see Statistics in Sketch Engine. compare T-score 
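The low-frequency effect can be illustrated with a small sketch. It assumes the usual formulation MI = log2(f_xy * N / (f_x * f_y)), with N the corpus size in tokens; the function name and the sample counts are made up for illustration.

```python
import math


def mi_score(f_xy, f_x, f_y, corpus_size):
    """Mutual Information score under the assumed formula (see lead-in)."""
    return math.log2(f_xy * corpus_size / (f_x * f_y))


# Two words that co-occur exactly as often as chance predicts: MI is 0.
print(mi_score(10, 100, 100, 1000))  # 0.0
# Two words that each occur once, together, in a million-token corpus:
# a single hit yields a very high score (roughly 19.9).
print(mi_score(1, 1, 1, 1_000_000))
```

The second call is the misleading case described above, and it is the kind of pair that a frequency limit removes from the calculation.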
minimum sensitivity _{[ statistics ]}
a statistical measure similar to logDice which is the minimum of the two following numbers:
 the number of co-occurrences divided by the frequency of the collocate
 the number of co-occurrences divided by the frequency of the node word
The minimum sensitivity number grows with a high number of co-occurrences and falls with a high number of occurrences of the individual words (node word or collocate).
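The definition above translates directly into code. This is a minimal sketch; the function and parameter names are illustrative.

```python
def minimum_sensitivity(cooccurrences, node_frequency, collocate_frequency):
    """Smaller of the two ratios: co-occurrences per collocate hit and per node hit."""
    return min(
        cooccurrences / collocate_frequency,
        cooccurrences / node_frequency,
    )


# 10 co-occurrences, node occurs 100 times, collocate occurs 50 times:
# the ratios are 0.2 and 0.1, so the score is 0.1.
print(minimum_sensitivity(10, 100, 50))  # 0.1
```

Taking the minimum means that the score is high only when the pair accounts for a large share of the occurrences of both words.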

overall score _{[ statistics ]}
score of the relation based on logDice in word sketches. The score is displayed in the header of each column of the relation. 
relative frequency, frequency per million _{[ statistics ]}
(also called freq/mill in the interface) the number of occurrences (hits) of an item per million tokens, also called i.p.m. (instances per million). It is used to compare frequencies between corpora of different sizes.
number of hits : corpus size in millions of tokens = frequency per million
The frequency per million is always related to the whole corpus or subcorpus, not to a text type. Restricting the query to one or more text types will affect the number of hits, but the frequency per million will still be calculated using the number of tokens in the whole (sub)corpus. To relate the frequency per million to one or more text types, create a subcorpus from the text type(s) and restrict the query to this subcorpus.
Example
Looking up the frequency of the word helps in the British National Corpus (112,181,015 tokens), first in the whole corpus, then in the spoken text type and then in the spoken subcorpus, will produce these results:
 subcorpus selected: none; text type selected: none; hits: 3,116; frequency per million: 27.75, in relation to the number of tokens in the whole corpus; possible interpretation: helps appears 27.75 times per million words in BNC
 subcorpus selected: none; text type selected: spoken; hits: 302; frequency per million: 2.69, in relation to the number of tokens in the whole corpus; possible interpretation: ‘spoken’ helps appears 2.69 times per million in BNC
 subcorpus selected: spoken (11,787,138 tokens); text type selected: none; hits: 302; frequency per million: 25.62, in relation to the subcorpus size; possible interpretation: helps appears 25.62 times per million in the spoken part of BNC
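The difference between the text type and the subcorpus figures comes down to which token count is used as the divisor. A minimal sketch using the numbers of the example above (the function and constant names are illustrative):

```python
def frequency_per_million(hits, size_in_tokens):
    """Hits divided by the (sub)corpus size expressed in millions of tokens."""
    return hits / (size_in_tokens / 1_000_000)


BNC_TOKENS = 112_181_015
SPOKEN_SUBCORPUS_TOKENS = 11_787_138

# Text type "spoken": 302 hits, still divided by the whole corpus size.
print(round(frequency_per_million(302, BNC_TOKENS), 2))               # 2.69
# Subcorpus "spoken": the same 302 hits, divided by the subcorpus size.
print(round(frequency_per_million(302, SPOKEN_SUBCORPUS_TOKENS), 2))  # 25.62
```

The number of hits is identical in both calls; only the denominator changes.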
salience _{[ statistics ]}
a statistical measure of the significance of a specific token in the given context. It is measured with logDice. For more information, see section 3 of Statistics used in Sketch Engine. 
simple math _{[ statistics ]}
the simple formula used for the computation and identification of terms and keywords. see Simple math. 
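The glossary does not give the formula itself. As a hedged sketch, the formulation usually cited for simple math compares the frequency per million in the focus corpus with that in a reference corpus after adding a smoothing constant n to both: score = (fpm_focus + n) / (fpm_reference + n). The function name, the parameter names and the default n = 1 are assumptions for illustration.

```python
def simple_math_score(fpm_focus, fpm_reference, n=1.0):
    """Keyness score under the assumed simple math formula (see lead-in)."""
    return (fpm_focus + n) / (fpm_reference + n)


# Equally frequent in both corpora: the score is 1.
print(simple_math_score(10, 10))  # 1.0
# Much more frequent in the focus corpus: the score rises above 1.
print(simple_math_score(99, 9))   # 10.0
# Absent from the reference corpus: n prevents division by zero.
print(simple_math_score(4, 0))    # 5.0
```

Under this assumption, a larger n shifts the ranking towards more frequent items, while a smaller n favours rare ones.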
T-score _{[ statistics ]}
T-score expresses the certainty with which we can argue that there is an association between the words, i.e. that their co-occurrence is not random. The value is affected by the frequency of the whole collocation, which is why very frequent word combinations tend to reach a high T-score despite not being significant collocations. In most cases, T-score is more reliable or more useful than MI Score. see Concordance, Collocations. see Statistics in Sketch Engine. compare MI Score
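A minimal sketch of the score, assuming the usual formulation T-score = (f_xy - f_x * f_y / N) / sqrt(f_xy), i.e. the observed co-occurrence count minus the count expected by chance, scaled by the square root of the observed count. The function name and the sample counts are illustrative.

```python
import math


def t_score(f_xy, f_x, f_y, corpus_size):
    """T-score under the assumed formula (see lead-in)."""
    expected = f_x * f_y / corpus_size  # co-occurrences expected by chance
    return (f_xy - expected) / math.sqrt(f_xy)


# 100 observed co-occurrences where chance predicts 1: (100 - 1) / 10.
print(t_score(100, 1000, 1000, 1_000_000))  # 9.9
```

Because the result grows roughly with the square root of the co-occurrence count, very frequent combinations score highly, which matches the caveat above.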