• n-gram

    is a sequence of a number of items (bigram = 2 items, trigram = 3 items, ... n-gram = n items). An item can be anything (a letter, digit, syllable, token, word or other unit). In the context of corpora and corpus linguistics, n-grams typically consist of tokens (or words). In linguistics, n-grams are sometimes referred to as MWEs, i.e. multiword expressions. Generating a list of the most frequent n-grams helps us identify linguistic phenomena that might go unnoticed when using other tools. N-grams can reveal discourse markers or chunks of language which should be taught/learnt as fixed phrases in language teaching. The tool for generating n-grams is the N-gram tool in Sketch Engine.
  • node

    (talking about collocations) the central word in a collocation, e.g. strong wind consists of the collocate strong and the node wind

    (talking about concordances) the search word or phrase, sometimes called a query, which appears in the centre of a KWIC concordance or is highlighted in other types of concordances
  • non-word

    Non-words (also spelt nonwords) are tokens which do not start with a letter of the alphabet. Examples of non-words are numbers and punctuation, but also tokens such as 25-hour, 16-year-old, !mportant, 3D. Tokens such as post-1945, mp3 or CO2 are words because they start with a letter. (There might be rare cases where the corpus author used a different definition of non-words in their corpus. The definition is part of the corpus configuration file.) Compare word.
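
The n-gram and non-word definitions above can be illustrated with a minimal Python sketch. This is not how Sketch Engine implements either concept. The function names (is_nonword, ngrams, most_frequent_ngrams) are hypothetical, the input is assumed to be an already tokenized list of strings, and "letter of the alphabet" is approximated with Python's str.isalpha, which accepts any Unicode letter.

```python
from collections import Counter


def is_nonword(token: str) -> bool:
    """Return True if the token does not start with a letter.

    Assumption: str.isalpha stands in for "letter of the alphabet".
    An empty token is treated as a non-word.
    """
    return not token[:1].isalpha()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def most_frequent_ngrams(tokens, n, top=10):
    """Count the n-grams in a token list and return the most frequent ones."""
    return Counter(ngrams(tokens, n)).most_common(top)


if __name__ == "__main__":
    # Examples taken from the non-word entry
    for token in ["25-hour", "3D", "!mportant", "post-1945", "mp3", "CO2"]:
        print(token, "non-word" if is_nonword(token) else "word")

    # A toy token list; a real corpus would supply far more tokens
    tokens = "on the other hand on the other side".split()
    print(most_frequent_ngrams(tokens, 2, top=2))
```

In the toy token list, the bigrams "on the" and "the other" each occur twice and every other bigram occurs once, which is the kind of recurring chunk a frequency list of n-grams brings to the surface.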