a statistical measure for identifying collocations. It expresses how typical a collocation is. It is used in the word sketch feature and also when computing collocations from a concordance.

It is based only on three values: the frequency of the node, the frequency of the collocate and the frequency of the whole collocation (the co-occurrence of the node and the collocate). logDice is not affected by the size of the corpus and, therefore, can be used to compare scores between different corpora.
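As a rough illustration, the score can be computed from these three frequencies alone. The sketch below assumes the formula given in the paper A Lexicographer-Friendly Association Score, logDice = 14 + log2(2 * f_xy / (f_x + f_y)). The function name and parameter names are chosen here for illustration and are not part of Sketch Engine.

```python
import math


def logdice(f_xy: int, f_x: int, f_y: int) -> float:
    """Return the logDice score of a node/collocate pair.

    f_xy -- frequency of the whole collocation (node and collocate together)
    f_x  -- frequency of the node
    f_y  -- frequency of the collocate

    All names are illustrative. Corpus size does not appear in the formula.
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    # Dice coefficient: 1.0 when node and collocate always occur together.
    dice = 2 * f_xy / (f_x + f_y)
    # Adding 14 shifts the log2 value so that the theoretical maximum is 14.
    return 14 + math.log2(dice)


# A pair that only ever occurs together reaches the theoretical maximum.
print(logdice(10, 10, 10))  # 14.0
```

Under this formula, each drop of one point corresponds to the Dice coefficient being halved, so a pair scoring 13 has half the Dice value of a pair scoring 14.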

logDice is the preferred statistical measure for large corpora. The other traditional measures take corpus size into account, and the enormous size of current multi-billion-word corpora skews their scores so much that they become impractical.
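To make the role of corpus size concrete, the sketch below compares logDice with MI score, assuming the common definition MI = log2(f_xy * N / (f_x * f_y)), where N is the corpus size. The frequencies are made up for illustration and the function names are not Sketch Engine identifiers. It only shows that N enters the MI formula and not the logDice formula; it is not a full account of how each measure behaves on real corpora.

```python
import math


def logdice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice: depends only on the three frequencies (illustrative name)."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def mi_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """MI score under the assumed definition; n is the corpus size in tokens."""
    return math.log2(f_xy * n / (f_x * f_y))


# Made-up frequencies, held fixed while only the corpus size changes.
f_xy, f_x, f_y = 100, 1_000, 1_000

for n in (10**6, 10**9):
    print(
        f"N={n:>13,}  "
        f"MI={mi_score(f_xy, f_x, f_y, n):6.2f}  "
        f"logDice={logdice(f_xy, f_x, f_y):6.2f}"
    )
```

With the frequencies held fixed, the MI value rises by log2(1000), roughly 10 points, between the two corpus sizes, while the logDice value is identical in both rows.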

In detail

A detailed explanation for non-statisticians and non-mathematicians is published in this blog post: Most frequent or most typical collocations?



See also

logDice in Statistics used in Sketch Engine

A Lexicographer-Friendly Association Score (paper)


MI score