• salience [ statistics ]

    a statistical measure of how significant a specific token is in a given context. Salience is measured with logDice; for more information, see section 3 of Statistics used in Sketch Engine.
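
    As a rough illustration, the sketch below computes logDice from raw frequencies. It assumes the commonly published definition, logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the function name and parameter names are hypothetical and are not part of any Sketch Engine API.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Return logDice for a pair of tokens (assumed standard definition).

    f_xy: number of co-occurrences of the two tokens
    f_x:  frequency of the first token
    f_y:  frequency of the second token
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    # 14 is the theoretical maximum, reached when the two tokens
    # only ever occur together (f_xy == f_x == f_y).
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(log_dice(f_xy=10, f_x=10, f_y=10))
    print(log_dice(f_xy=1, f_x=2, f_y=2))
```

    Because the score depends only on the frequencies of the two tokens and their co-occurrence, it does not change with the overall corpus size.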
  • search attribute

    the attribute that is searched and used to create a word list. A word list can be built from words, lemmas, tags, etc.
  • search span

    the number of tokens on either side of the node that are checked when filtering a concordance. For example, a search span from -5 to 5 means that the filter matches all concordance lines in which the filter criterion occurs within 5 tokens to the left or right of the node.
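
    The idea of a span can be sketched in a few lines. This is a minimal illustration only: the function `within_span`, its parameters, and the choice to skip the node position itself are assumptions made for the example, not a description of how Sketch Engine implements filtering.

```python
from typing import Sequence


def within_span(tokens: Sequence[str], node_index: int, filter_word: str,
                left: int = -5, right: int = 5) -> bool:
    """Return True if filter_word occurs inside the span around the node.

    left and right are offsets relative to the node, e.g. -5 and 5.
    The node position itself is skipped (an assumption of this sketch).
    """
    start = max(0, node_index + left)
    end = min(len(tokens), node_index + right + 1)
    return any(
        i != node_index and tokens[i] == filter_word
        for i in range(start, end)
    )


if __name__ == "__main__":
    line = "the quick brown fox jumps over the lazy dog".split()
    node = line.index("fox")
    # "dog" is 5 tokens to the right of "fox".
    print(within_span(line, node, "dog", left=-5, right=5))
    print(within_span(line, node, "dog", left=-5, right=4))
```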
  • simple math [ statistics ]

    the simple formula used to compute scores for identifying terms and keywords. See Simple math.
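
    A minimal sketch of such a score follows. It assumes the commonly cited form of the formula, in which a smoothing parameter N is added to the frequencies per million of the focus corpus and the reference corpus before dividing; the function names and the default N = 1 are assumptions of this example.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def simple_math(fpm_focus: float, fpm_ref: float, n: float = 1.0) -> float:
    """Keyness score: (fpm_focus + n) / (fpm_ref + n).

    n is the smoothing parameter; adding it avoids division by zero
    when the item is absent from the reference corpus.
    """
    return (fpm_focus + n) / (fpm_ref + n)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    focus = per_million(count=90, corpus_size=10_000_000)
    ref = per_million(count=10, corpus_size=10_000_000)
    print(simple_math(focus, ref))
```

    In this form, a larger N reduces the influence of rare items, while a smaller N lets them score higher.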
  • stemming

    the process of stripping affixes (suffixes, prefixes, etc.) from a word so that only the stem remains. The stem is the word root, which does not change with case, number or tense, so stemming is used to detect related words that share the same stem. Word stems are available in the Portuguese corpus ptTenTen. Stemming is performed by tools called stemmers.
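
    To show the principle, here is a deliberately naive suffix-stripping sketch for English. It is a toy written for this example, not the stemmer used for ptTenTen or any other corpus; the suffix list and the minimum stem length are arbitrary assumptions.

```python
from collections import defaultdict

# Toy suffix list, checked in order (an assumption of this sketch).
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if a long enough stem remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Group words that share the same toy stem."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_stem(["walk", "walks", "walked", "walking", "is"]))
```

    Real stemmers apply language-specific rules and handle many more cases than this sketch.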
  • structure

    a corpus structure refers to the segments or parts into which a corpus can be divided. Typically, a corpus is divided into sentences, paragraphs and documents, but corpora can use various other structures depending on the type of corpus. For a list of common corpus structures, see Dividing a corpus into smaller parts and annotating them.
  • subcorpus

    a corpus can be subdivided into an unlimited number of parts called subcorpora. Subcorpora can be used to divide the corpus by text type (fiction, newspaper), medium (spoken, written), time (e.g. by year) or any other criterion. A subcorpus can also be created from a concordance by including all concordance lines and the documents they come from. See How to create a subcorpus.