• n-gram

    is a sequence of items (bigram = 2 items, trigram = 3 items, ... n-gram = n items). An item can be anything (a letter, digit, syllable, token, word or other unit). In the context of corpora and corpus linguistics, n-grams typically refer to tokens (or words). In linguistics, n-grams are sometimes referred to as MWEs, i.e. multiword expressions.
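As a minimal sketch of the definition above, contiguous n-grams can be produced by sliding a window of n items over a sequence. The function name `ngrams` and the sample sentence are illustrative assumptions, not taken from any particular tool, and the whitespace split stands in for real tokenization.

```python
from typing import List, Sequence, Tuple


def ngrams(items: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n items, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length k holds k - n + 1 windows of size n
    # (none at all when the sequence is shorter than n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hypothetical example: tokens obtained by naive whitespace splitting.
tokens = "the strong wind blew all night".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Because an item can be anything, the same function also works on letters: passing the string "wind" with n = 2 yields letter bigrams.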
  • node

    (talking about collocations) the central word in a collocation, e.g. strong wind consists of the collocate strong and the node wind. (talking about concordances) the search word or phrase, sometimes called a query, which appears in the centre of a KWIC concordance or is highlighted in other types of concordances.
  • non-word

    Non-words (also spelt nonwords) are tokens which do not start with a letter of the alphabet. Examples of non-words are numbers and punctuation, but also tokens such as 25-hour, 16-year-old, !mportant, 3D. Tokens such as post-1945, mp3 or CO2 are words because they start with a letter. The regular expression Sketch Engine uses to identify non-words is [^[:alpha:]].* Compare word.
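
The non-word rule above can be sketched in a few lines. Python's built-in `re` module does not support POSIX bracket classes such as `[:alpha:]`, so this sketch approximates the pattern by checking the first character with `str.isalpha()`. The function name `is_non_word` is hypothetical, and Python's notion of a letter may not match Sketch Engine's exactly.

```python
def is_non_word(token: str) -> bool:
    """Approximate the pattern [^[:alpha:]].* for a single token.

    Assumption: str.isalpha() stands in for the POSIX [:alpha:] class.
    An empty string is not a non-word, because the pattern needs
    at least one character to match.
    """
    return bool(token) and not token[0].isalpha()


# The examples given in the definition above.
non_words = ["25-hour", "16-year-old", "!mportant", "3D"]
words = ["post-1945", "mp3", "CO2"]

non_word_flags = [is_non_word(t) for t in non_words]
word_flags = [is_non_word(t) for t in words]
```

With these inputs, every token in `non_words` is flagged and none in `words` is, which matches the classification described in the entry.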