a statistical measure for identifying co-occurrence (two items appearing together). Sketch Engine uses it to identify collocations, where it expresses the typicality (or strength) of a collocation. It is used in the word sketch feature and when computing collocations from a concordance.

It is based only on the frequency of the node, the frequency of the collocate, and the frequency of the whole collocation (the co-occurrence of the node and the collocate). Because logDice is not affected by the size of the corpus, scores can be compared between different corpora.
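
As a rough illustration of how the three frequencies combine, the sketch below uses the commonly cited form of the score, logDice = 14 + log2(2 * f_xy / (f_x + f_y)), as given in the paper "A Lexicographer-Friendly Association Score". The function name log_dice and its parameter names are hypothetical, chosen only for this example; this is not Sketch Engine's own code.

```python
import math


def log_dice(f_node: int, f_collocate: int, f_pair: int) -> float:
    """Return logDice for a node/collocate pair (illustrative sketch).

    f_node      -- frequency of the node in the corpus
    f_collocate -- frequency of the collocate in the corpus
    f_pair      -- frequency of the node and collocate occurring together
    """
    if f_pair <= 0:
        raise ValueError("the pair must co-occur at least once")
    if f_pair > min(f_node, f_collocate):
        raise ValueError("the pair cannot be more frequent than either word")
    # Dice coefficient: 2 * f_xy / (f_x + f_y), always in the range (0, 1].
    dice = 2 * f_pair / (f_node + f_collocate)
    # The log of a value in (0, 1] is at most 0, so the score is at most 14.
    return 14 + math.log2(dice)


print(log_dice(1000, 3000, 500))
```

Corpus size does not appear anywhere in the calculation, which is why the same three frequencies give the same score in a corpus of any size. The theoretical maximum of 14 is reached only when the two words never occur without each other.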

logDice is the preferred statistical measure for large corpora. Other traditional measures take corpus size into account, and the enormous size of current multi-billion-word corpora skews their scores so much as to make them impractical.
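
To make the dependence on corpus size concrete, the sketch below computes the MI score in its standard form, MI = log2(f_xy * N / (f_x * f_y)), where N is the corpus size in tokens. The function name mi_score, its parameter names and the frequencies are hypothetical and made up for illustration. The three frequencies are held fixed and only N is varied, which is a simplification, since in real corpora the frequencies change along with the corpus.

```python
import math


def mi_score(f_node: int, f_collocate: int, f_pair: int, corpus_size: int) -> float:
    """Return the MI score for a node/collocate pair (illustrative sketch)."""
    if min(f_node, f_collocate, f_pair, corpus_size) <= 0:
        raise ValueError("all frequencies and the corpus size must be positive")
    # Corpus size N is an explicit factor in the formula.
    return math.log2(f_pair * corpus_size / (f_node * f_collocate))


# Made-up frequencies, identical in both calls; only the corpus size differs.
small = mi_score(1024, 1024, 256, 2**20)
large = mi_score(1024, 1024, 256, 2**30)
print(small, large)
```

With the frequencies unchanged, the MI score rises by log2 of the ratio between the two corpus sizes, so scores taken from corpora of very different sizes are not directly comparable.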

In bilingual terminology extraction
logDice is also used in bilingual term extraction to identify the most probable translation.

In detail

A detailed explanation for non-statisticians and non-mathematicians is published in this blog post: Most frequent or most typical collocations?



See also

logDice in Statistics used in Sketch Engine

A Lexicographer-Friendly Association Score (paper)


MI score