How does it work?
Process of comparing corpora
The compare corpora method carries out the following process for every pair of selected corpora:
- Finds the top 5000 words by frequency (in each corpus separately).
- Computes a keyword score (for each corpus separately) for every word in the unification (union) of the two lists, i.e. every word that appears in either top-5000 list of the pair.
- Keeps only the top 500 words by score (only the highest values are considered, whether the scores are positive or negative).
- Computes the arithmetic mean (average) of their scores. This average is the number that expresses the similarity of the pair of corpora, and it is displayed in the result chart.
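The steps above can be sketched in Python as follows. This is only an illustrative sketch, not the tool's actual implementation. The text does not specify the keyword score formula, so `ratio_score` is an assumed placeholder: a smoothed ratio of per-million frequencies, with the larger value on top. All function names and the `smoothing` parameter are hypothetical.

```python
from collections import Counter


def top_words(freqs, n):
    """Return the set of the n most frequent words in a Counter."""
    return {word for word, _ in freqs.most_common(n)}


def per_million(freqs, word, size):
    """Frequency of a word normalised to occurrences per million tokens."""
    return freqs[word] * 1_000_000 / size


def ratio_score(fpm_a, fpm_b, smoothing=1.0):
    # ASSUMPTION: the source does not define the keyword score.
    # This placeholder is a smoothed frequency ratio, larger over smaller,
    # so identical frequencies give 1.0 and bigger gaps give bigger scores.
    high, low = max(fpm_a, fpm_b), min(fpm_a, fpm_b)
    return (high + smoothing) / (low + smoothing)


def compare_corpora(tokens_a, tokens_b, top_freq=5000, top_score=500,
                    score=ratio_score):
    """Return one number summarising how much two token lists differ."""
    if not tokens_a or not tokens_b:
        raise ValueError("both corpora must contain at least one token")

    freqs_a, freqs_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)

    # Step 1: top words by frequency, from each corpus separately.
    # Step 2: score every word in the union of the two lists.
    candidates = top_words(freqs_a, top_freq) | top_words(freqs_b, top_freq)
    scores = [
        score(per_million(freqs_a, w, size_a), per_million(freqs_b, w, size_b))
        for w in candidates
    ]

    # Step 3: keep only the highest-scoring words.
    best = sorted(scores, reverse=True)[:top_score]

    # Step 4: arithmetic mean of the kept scores.
    return sum(best) / len(best)
```

Under this assumed score, comparing a corpus with itself yields exactly 1.0, and the value grows as the word frequencies of the two corpora diverge. A different scoring function can be passed through the `score` argument.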
unification (union) – the combination of two or more sets, e.g. the union of two sets A and B is the set of elements that are in A, in B, or in both A and B.
arithmetic mean (average) – the sum of a collection of numbers divided by the count of numbers in the collection.
Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1), 97-133.