Keyword and term extraction can be used for:
Terminology extraction identifies words which are typical of the topic of the document or corpus, i.e. words that appear in the corpus more frequently than they would in general language. A large non-specialized corpus in the same language is used to represent general language.
The default settings are normally sufficient for the extraction to provide high-quality results.
The results are displayed together with links to the sentences in both the focus and the reference corpora.
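The idea of "more frequent than in general language" can be illustrated with a small sketch. It assumes a simple scoring method that this text does not specify: the ratio of frequencies per million tokens in the focus and reference corpora, with a smoothing constant added to both so that words absent from the reference do not cause a division by zero. The function names, the `smoothing` parameter and the toy sentences are hypothetical and only serve the illustration.

```python
from collections import Counter


def per_million(counts, total):
    """Convert raw counts to frequencies per million tokens."""
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def keyness(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank focus-corpus words by how over-represented they are.

    Assumed score: (focus_fpm + smoothing) / (reference_fpm + smoothing).
    """
    focus_fpm = per_million(Counter(focus_tokens), len(focus_tokens))
    ref_fpm = per_million(Counter(reference_tokens), len(reference_tokens))
    scores = {
        word: (fpm + smoothing) / (ref_fpm.get(word, 0.0) + smoothing)
        for word, fpm in focus_fpm.items()
    }
    # Highest score first: the most typical words of the focus corpus.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data standing in for a specialized corpus and a general-language corpus.
focus = "the enzyme binds the substrate and the enzyme releases the product".split()
reference = "the cat sat on the mat and the dog sat on the rug".split()

ranking = keyness(focus, reference)
for word, score in ranking[:3]:
    print(word, round(score, 2))
```

In this toy example, words such as "the" and "and" score close to 1 because they are about equally frequent in both texts, while topic words that never occur in the reference rise to the top.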
Extracting unique features of a corpus
This use is almost identical to terminology extraction. The difference is that the user may be interested not only in specialized lexis which is rare in general language, but also in medium-frequency or even high-frequency words which are used more often than in general language. Use the slider on the advanced tab to focus the extraction on different parts of the frequency range.
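A sketch of how such a frequency-range control could behave is given below. It assumes, without confirmation from this text, that the slider changes a smoothing constant in a frequency-ratio score: a small constant favours rare words, a large one favours common words. The word labels and their per-million frequencies are invented for the illustration.

```python
def score(focus_fpm, ref_fpm, smoothing):
    """Assumed score: ratio of smoothed frequencies per million."""
    return (focus_fpm + smoothing) / (ref_fpm + smoothing)


# Hypothetical frequencies per million: (focus corpus, reference corpus).
words = {
    "rare_term": (20.0, 0.1),
    "mid_word": (500.0, 100.0),
    "common_word": (12000.0, 4000.0),
}


def top_word(smoothing):
    """Return the highest-scoring word for a given smoothing constant."""
    return max(words, key=lambda w: score(*words[w], smoothing))


# Moving the constant upwards shifts attention from rare to common words.
for smoothing in (1, 100, 10000):
    print(smoothing, top_word(smoothing))
```

With these invented figures, the rare term ranks first at the lowest setting, the medium-frequency word at the middle setting, and the common word at the highest setting.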
Comparing two texts manually is not realistic if the two texts are very long. Even with short texts, statistical comparison can bring up phenomena which would go unnoticed when comparing manually.
Keywords and terms can be used to compare two corpora or subcorpora. When comparing subcorpora, the two subcorpora can belong to the same corpus or to different corpora.
The result will show what is typical of the focus (sub)corpus in comparison with the reference (sub)corpus.