How does it work?
Process of comparing corpora
For every pair of selected corpora (corpus1 and corpus2), the compare corpora method carries out the following steps (shown here for the attribute ‘word’):
- Finds the 5000 most frequent words in corpus1 and the 5000 most frequent words in corpus2.
- Merges these words into a single set, removing duplicates.
- Computes the keyness score for every word in the set. For each word, the corpus in which it has the higher relative frequency is the focus corpus, and the corpus in which it has the lower relative frequency is the reference corpus. As a result, the score is always greater than 1, or exactly 1 when the relative frequencies are the same.
- Keeps only the 500 words with the highest keyness scores.
- Computes the arithmetic mean (average) of their scores. This average is the number displayed in the result chart, and it expresses how similar the two corpora are. Because every score is at least 1, an average close to 1 indicates very similar corpora, while higher values indicate greater differences.
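The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's own implementation: each corpus is assumed to be a plain list of tokens, relative frequency is assumed to be a count per million tokens, and the keyness score is assumed to be a smoothed ratio (focus + n) / (reference + n) with n = 1, in the style of Kilgarriff's "simple maths" keyword score. The function and parameter names (`compare_corpora`, `keyness`, `relative_freqs`, `top_freq`, `top_key`, `n`) are hypothetical.

```python
from collections import Counter


def relative_freqs(counts: Counter, total: int) -> dict[str, float]:
    """Relative frequency of each word, expressed per million tokens (assumption)."""
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def keyness(rf_a: float, rf_b: float, n: float = 1.0) -> float:
    """ASSUMED keyness score: smoothed ratio of the two relative frequencies.

    The corpus with the higher relative frequency is the focus corpus
    (numerator), so the result is >= 1, and exactly 1 when rf_a == rf_b.
    """
    focus, reference = max(rf_a, rf_b), min(rf_a, rf_b)
    return (focus + n) / (reference + n)


def compare_corpora(tokens1: list[str], tokens2: list[str],
                    top_freq: int = 5000, top_key: int = 500,
                    n: float = 1.0) -> float:
    counts1, counts2 = Counter(tokens1), Counter(tokens2)
    rf1 = relative_freqs(counts1, len(tokens1))
    rf2 = relative_freqs(counts2, len(tokens2))

    # Step 1: the top_freq most frequent words of each corpus.
    frequent1 = {w for w, _ in counts1.most_common(top_freq)}
    frequent2 = {w for w, _ in counts2.most_common(top_freq)}

    # Step 2: one set of words; the set union drops duplicates.
    words = frequent1 | frequent2

    # Step 3: keyness score per word (a word absent from a corpus has frequency 0).
    scores = [keyness(rf1.get(w, 0.0), rf2.get(w, 0.0), n) for w in words]

    # Step 4: keep only the top_key highest scores.
    top = sorted(scores, reverse=True)[:top_key]

    # Step 5: the arithmetic mean of those scores is the value reported for the pair.
    return sum(top) / len(top)


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the dog sat on the log".split()
    print(compare_corpora(a, a))  # 1.0
    print(compare_corpora(a, b))
```

Comparing a corpus with itself gives exactly 1.0, because every word has the same relative frequency in both corpora. Taking the larger frequency as the numerator (instead of always dividing corpus1 by corpus2) is what makes the score symmetric, so swapping the two corpora does not change the result.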
Arithmetic mean (average): the sum of a collection of numbers divided by the count of numbers in the collection.
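Written as formulas, the final step averages the k = 500 highest keyness scores s_1 to s_k. The per-word score s_w shown second is an assumption (a smoothed "simple maths" ratio with smoothing constant n, for example n = 1, where f_1(w) and f_2(w) are the relative frequencies of word w in corpus1 and corpus2); the text above only states that the corpus with the higher relative frequency is the focus corpus.

```latex
\bar{s} = \frac{1}{k} \sum_{i=1}^{k} s_i, \qquad k = 500

s_w = \frac{\max\bigl(f_1(w), f_2(w)\bigr) + n}{\min\bigl(f_1(w), f_2(w)\bigr) + n} \;\geq\; 1
```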
Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 97-133.