• reference corpus

    a corpus to which the focus corpus is compared. A reference corpus is used in keyword extraction and term extraction. When using the Keywords & Terms tool, a reference corpus is preselected, but the user can select a different corpus as the reference corpus. The reference corpus can, but does not have to, be the same for keywords and for terms. A reference corpus can also be used with n-grams to identify n-grams typical of the focus corpus in comparison with the reference corpus. In the word sketch, a reference corpus is used to identify key collocations when the AS A LIST option is selected.

    see also: term, keyword, term extraction (definition), term extraction (quick start guide)
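    To make the focus/reference comparison concrete, the sketch below ranks words of a focus corpus against a reference corpus. It assumes the "simple maths" keyness score associated with Kilgarriff, (fpm_focus + n) / (fpm_ref + n), where fpm is frequency per million and n is a smoothing constant (set to 1 here). The function names and sample counts are hypothetical, and the tool's actual scoring and defaults may differ.

```python
def keyness(focus_hits, focus_size, ref_hits, ref_size, n=1.0):
    """Assumed 'simple maths' keyness: (fpm_focus + n) / (fpm_ref + n).

    focus_size and ref_size are corpus sizes in tokens; n is a smoothing
    constant that also avoids division by zero when ref_hits is 0.
    """
    fpm_focus = focus_hits / focus_size * 1_000_000
    fpm_ref = ref_hits / ref_size * 1_000_000
    return (fpm_focus + n) / (fpm_ref + n)


def top_keywords(focus_counts, focus_size, ref_counts, ref_size, k=10, n=1.0):
    """Rank words of the focus corpus by keyness against the reference corpus.

    Words missing from the reference corpus are treated as having 0 hits there.
    """
    scores = {
        word: keyness(hits, focus_size, ref_counts.get(word, 0), ref_size, n)
        for word, hits in focus_counts.items()
    }
    # Highest score first: these words are most typical of the focus corpus.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical counts: a 1M-token focus corpus vs a 100M-token reference corpus.
focus_counts = {"protein": 500, "the": 60_000}
ref_counts = {"protein": 2_000, "the": 6_000_000}
ranked = top_keywords(focus_counts, 1_000_000, ref_counts, 100_000_000)
print([word for word, _ in ranked])  # ['protein', 'the']
```

    Here "the" has the same frequency per million in both corpora and scores 1.0, while "protein" is 25 times more frequent per million in the focus corpus and scores about 23.9, so it ranks as a keyword.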
  • regular expressions

    a collection of special symbols that can be used to search for patterns rather than specific characters, e.g. to find all words starting with, containing or ending in a specific sequence of characters. For example, .*tion will find all words ending in tion, with any number of characters before it.
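    The sketch below tries the .*tion pattern with Python's re module on a hypothetical word list. Because the definition says the pattern finds words ending in tion, it assumes the pattern must match the whole word, so re.fullmatch is used.

```python
import re

words = ["nation", "tion", "station", "national", "action", "tone"]

# ".*" = any character, repeated zero or more times; then the literal "tion".
# fullmatch requires the pattern to cover the entire word, not just part of it.
pattern = re.compile(r".*tion")
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['nation', 'tion', 'station', 'action']
```

    Note that "tion" itself matches, because .* also allows zero characters. With re.search instead of re.fullmatch, "national" would match too, since it merely contains tion.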
  • relative frequency, frequency per million [ statistics ]

    (also called freq/mill in the interface) is the number of occurrences of an item per million tokens, also known as i.p.m. (instances per million). It is used to compare frequencies between corpora (or datasets) of different sizes.

    Formula

    frequency per million = number of hits / corpus size in millions of tokens

    An alternative calculation producing the same result:

    frequency per million = raw frequency / corpus size in tokens × 1,000,000
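    Both calculations can be sketched in Python as follows; the function names and corpus sizes are hypothetical. The example shows why normalisation matters: 1,500 hits in a 250-million-token corpus is a lower relative frequency than 40 hits in a 5-million-token corpus.

```python
def per_million_from_hits(hits, corpus_size_in_millions):
    """number of hits / corpus size in millions of tokens"""
    return hits / corpus_size_in_millions


def per_million_from_tokens(raw_frequency, corpus_size_in_tokens):
    """raw frequency / corpus size in tokens x 1,000,000"""
    return raw_frequency / corpus_size_in_tokens * 1_000_000


# Same item, two corpora of different sizes (hypothetical figures).
print(per_million_from_hits(1_500, 250))                      # 6.0
print(round(per_million_from_tokens(1_500, 250_000_000), 2))  # 6.0
print(per_million_from_hits(40, 5))                           # 8.0
```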
  • relative text type frequency

    (also called Relative density in the interface) Relative text type frequency compares the frequency of an item in a specific text type with its frequency in the whole corpus. It shows how typical the word or phrase is of that text type, e.g. of the spoken part of the corpus or of a particular website from which the texts were downloaded.
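    A minimal sketch, assuming relative text type frequency is the frequency per million in the text type divided by the frequency per million in the whole corpus, reported as a percentage, so that 100 means the item is exactly as frequent in the text type as in the corpus overall. The function names and counts are hypothetical.

```python
def fpm(hits, tokens):
    """Frequency per million tokens."""
    return hits / tokens * 1_000_000


def relative_text_type_frequency(hits_in_type, type_size, hits_in_corpus, corpus_size):
    """Assumed definition: fpm in the text type / fpm in the whole corpus x 100."""
    return fpm(hits_in_type, type_size) / fpm(hits_in_corpus, corpus_size) * 100


# Hypothetical: 300 hits in a 10M-token spoken part, 1,000 hits in a 100M-token corpus.
rel = relative_text_type_frequency(300, 10_000_000, 1_000, 100_000_000)
print(round(rel, 1))  # 300.0
```

    Under this assumption, a value of 300 means the item is three times more frequent (per million tokens) in the spoken part than in the corpus as a whole; values below 100 would indicate it is less typical of that text type.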