Glossary | Sketch Engine

[Illustration: collocations of wind, e.g. strong wind, icy wind, cold wind]

T-score
T-score expresses the certainty with which we can argue that there is an association between the words, i.e., that their co-occurrence is not random. The value is affected by the frequency of the whole collocation, which is why very frequent word combinations tend to reach a high T-score despite [...]
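The frequency effect described above is easy to see in a commonly used formulation of the T-score, (f_AB − f_A·f_B/N) / √f_AB, where f_AB is the frequency of the collocation, f_A and f_B are the frequencies of the two words and N is the corpus size. The sketch below assumes this formulation; whether Sketch Engine implements exactly this variant is not stated here, and all counts are hypothetical.

```python
import math

def t_score(f_ab: int, f_a: int, f_b: int, corpus_size: int) -> float:
    """T-score for the word pair (A, B) using the common formulation
    (f_AB - f_A * f_B / N) / sqrt(f_AB). Assumed, not confirmed, to match
    the exact variant implemented in Sketch Engine."""
    if f_ab <= 0:
        raise ValueError("the collocation must occur at least once")
    expected = f_a * f_b / corpus_size  # co-occurrences expected if A and B were independent
    return (f_ab - expected) / math.sqrt(f_ab)

# Hypothetical counts in a corpus of 1,000,000 tokens.
exclusive = t_score(f_ab=10, f_a=10, f_b=10, corpus_size=1_000_000)  # A and B only ever occur together
frequent = t_score(f_ab=1_000, f_a=5_000, f_b=20_000, corpus_size=1_000_000)
print(f"{exclusive:.2f} {frequent:.2f}")  # → 3.16 28.46
```

The frequent pair scores far higher even though the rare pair is perfectly exclusive, because the square root of f_AB in the denominator grows more slowly than f_AB itself.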

tag
A tag (also called a part-of-speech tag, POS tag or morphological tag) is a label assigned to each token in an annotated corpus to indicate its part of speech and, often, also grammatical categories and other morphological information. The tool used to annotate a corpus is called a tagger. A collection of [...]
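In the vertical format used for Sketch Engine corpora, each token sits on its own line and its attributes, such as the tag, are separated by tabs. The sketch below reads such lines into tagged tokens; the column order word, tag, lemma and the Penn Treebank-style tags are assumptions for illustration, since real corpora may use other columns and tagsets.

```python
from typing import NamedTuple

class TaggedToken(NamedTuple):
    word: str   # word form as it appears in the text
    tag: str    # part-of-speech / morphological tag assigned by the tagger
    lemma: str  # base form of the word

def parse_vertical(lines: list[str]) -> list[TaggedToken]:
    """Parse token lines with the assumed column order word, tag, lemma.
    Structure lines such as <s> or </doc> and blank lines are skipped."""
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or (line.startswith("<") and line.endswith(">")):
            continue  # not a token line
        word, tag, lemma = line.split("\t")[:3]
        tokens.append(TaggedToken(word, tag, lemma))
    return tokens

sample = ["<s>", "The\tDT\tthe", "dogs\tNNS\tdog", "barked\tVBD\tbark", ".\t.\t.", "</s>"]
for token in parse_vertical(sample):
    print(token.word, token.tag, token.lemma)
```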

tagset
A tagset (also called a tag set) is the list of part-of-speech tags used in a corpus. In Sketch Engine, corpora in the same language tend to use the same tagset, but exceptions exist. To check which tagset a corpus uses, open Corpus statistics and details. See our blog about POS tags.
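Because corpora in the same language may still differ in their tagsets, it can help to check which tags in a corpus are not defined by the tagset you expect. The sketch below uses a small, hypothetical excerpt of a Penn Treebank-style tagset; it is not the full tagset of any Sketch Engine corpus.

```python
# Hypothetical excerpt of a Penn Treebank-style tagset (tag -> description).
TAGSET_EXCERPT = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "VBD": "verb, past tense",
}

def unknown_tags(corpus_tags: list[str], tagset: dict[str, str]) -> list[str]:
    """Return, sorted, the tags used in the corpus that the tagset does not define."""
    return sorted(set(corpus_tags) - set(tagset))

tags_in_corpus = ["DT", "JJ", "NNS", "VBD", "NNP"]
print(unknown_tags(tags_in_corpus, TAGSET_EXCERPT))  # → ['NNP']
```

Here NNP (proper noun) is reported only because it is missing from the excerpt, not because it is an invalid tag.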

TBL
TBL (Tick Box Lexicography) is an application in Sketch Engine for collecting usage-example sentences for building dictionaries. Find more on the Tick Box Lexicography page.

term
Term is a concept used in connection with the Keywords & Terms tool. A term is a multi-word expression (consisting of several tokens) which appears more frequently in one corpus (the focus corpus) than in another corpus (the reference corpus) and, at the same time, the expression has a format of [...]
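The first two conditions of the definition (several tokens, higher frequency in the focus corpus) can be sketched as a filter over candidate expressions. Since the corpora differ in size, the sketch compares frequencies per million tokens, which is an assumption; the additional format condition in the definition is not modelled, and all counts are hypothetical.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size

def term_candidates(focus_counts: dict[str, int], focus_size: int,
                    ref_counts: dict[str, int], ref_size: int) -> list[str]:
    """Keep multi-word expressions that are relatively more frequent
    in the focus corpus than in the reference corpus."""
    result = []
    for expr, count in focus_counts.items():
        if len(expr.split()) < 2:
            continue  # single-token items are not multi-word expressions
        focus_fpm = per_million(count, focus_size)
        ref_fpm = per_million(ref_counts.get(expr, 0), ref_size)
        if focus_fpm > ref_fpm:
            result.append(expr)
    return result

focus = {"neural network": 120, "hidden layer": 40, "of the": 300, "network": 300}
reference = {"neural network": 15, "hidden layer": 2, "of the": 5_000_000, "network": 5_000}
print(term_candidates(focus, 100_000, reference, 1_000_000_000))
# → ['neural network', 'hidden layer']
```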

term base
In connection with CAT tools, a term base is a database of subject-specific terminology and other lexical items which need to be translated consistently. The CAT tool uses the term base to check the consistency of translation, to look for untranslated segments, and to suggest (or [...]
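The two checks mentioned above (consistency and untranslated segments) can be sketched for a single source/target segment pair. The term base entries, the language pair and the message texts are hypothetical, and the plain substring matching ignores inflection, which real CAT tools would need to handle.

```python
# Hypothetical term base: source term -> approved target-language translation.
TERM_BASE = {"hard drive": "disco duro", "keyboard": "teclado"}

def check_segment(source: str, target: str, term_base: dict[str, str]) -> list[str]:
    """Report problems in one source/target segment pair: an empty
    (untranslated) target, or a source term whose approved translation
    does not appear in the target. Plain substring matching, so
    inflected forms are not recognized."""
    if not target.strip():
        return ["untranslated segment"]
    src, tgt = source.lower(), target.lower()
    return [
        f"'{term}' should be translated as '{translation}'"
        for term, translation in term_base.items()
        if term in src and translation not in tgt
    ]

print(check_segment("Replace the hard drive.", "Sustituya el disco duro.", TERM_BASE))  # → []
print(check_segment("Clean the keyboard.", "Limpie las teclas.", TERM_BASE))
print(check_segment("Open the lid.", "", TERM_BASE))  # → ['untranslated segment']
```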

term extraction
Term extraction is the process of identifying subject-specific vocabulary in a subject-specific text, usually using specialized software. The identification of one-word and multi-word terms in Sketch Engine is based on the comparison of the frequency of such words and phrases between the reference corpus and the [...]
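One way to turn the frequency comparison into a ranking, and the one assumed here, is the "simple maths" score (fpm_focus + N) / (fpm_ref + N), where fpm is the frequency per million tokens and N is a smoothing constant. Keywords in Sketch Engine are usually described with this score; treating terms the same way, and all the counts below, are assumptions for illustration.

```python
def simple_maths(focus_count: int, focus_size: int, ref_count: int, ref_size: int,
                 n: float = 1.0) -> float:
    """Ratio of smoothed frequencies per million in the focus and reference corpus."""
    focus_fpm = focus_count * 1_000_000 / focus_size
    ref_fpm = ref_count * 1_000_000 / ref_size
    return (focus_fpm + n) / (ref_fpm + n)

def rank_terms(focus_counts: dict[str, int], focus_size: int,
               ref_counts: dict[str, int], ref_size: int, n: float = 1.0) -> list[str]:
    """Candidate expressions sorted from most to least typical of the focus corpus."""
    scores = {
        expr: simple_maths(count, focus_size, ref_counts.get(expr, 0), ref_size, n)
        for expr, count in focus_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

focus = {"carbon dioxide": 50, "climate change": 80, "last year": 30}
reference = {"carbon dioxide": 2_000, "climate change": 5_000, "last year": 90_000}
print(rank_terms(focus, 100_000, reference, 100_000_000))
# → ['carbon dioxide', 'climate change', 'last year']
print(rank_terms(focus, 100_000, reference, 100_000_000, n=1_000))
# → ['climate change', 'carbon dioxide', 'last year']
```

A larger N dampens the ratio for rare items, so the ranking shifts towards more frequent expressions, as the second call shows.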

term grammar
A term grammar is a set of rules written in CQL which define the lexical structures, typically noun phrases, that should be included in term extraction. The lexical structures are defined using POS tags and CQL. The use of a term grammar ensures a clean term extraction result which requires [...]
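A term grammar rule is, in essence, a pattern over POS tags. Real term grammars are written in CQL; the Python sketch below only emulates one hypothetical rule (any number of adjectives followed by one or more nouns, with Penn Treebank-style tags) to show how such a pattern selects noun phrases from tagged text.

```python
import re

def tag_class(tag: str) -> str:
    """Map a Penn-style tag to one symbol: A = adjective, N = noun, x = other."""
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    return "x"

# Hypothetical rule: any number of adjectives followed by one or more nouns
# (roughly what a CQL pattern like [tag="JJ.*"]*[tag="NN.*"]+ expresses).
RULE = re.compile(r"A*N+")

def extract_candidates(tagged: list[tuple[str, str]]) -> list[str]:
    """Return the word sequences in `tagged` [(word, tag), ...] that match RULE."""
    symbols = "".join(tag_class(tag) for _, tag in tagged)  # one symbol per token
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in RULE.finditer(symbols)
    ]

sentence = [("The", "DT"), ("renewable", "JJ"), ("energy", "NN"), ("sector", "NN"),
            ("grew", "VBD"), ("in", "IN"), ("rural", "JJ"), ("areas", "NNS")]
print(extract_candidates(sentence))  # → ['renewable energy sector', 'rural areas']
```

Encoding each token as a single character lets an ordinary regular expression do the matching while keeping token positions aligned with the symbol string.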

text analysis
Text analysis (also content analysis or text analytics) is a method for analyzing (usually unstructured) text in order to extract information. The result of text analysis is structured data. In addition to the traditional tools, Sketch Engine also offers some unique features. The [...]

text mining
Text mining is an automatic process of extracting information from text, such as the keywords of a text or its source(s). The corresponding tools in Sketch Engine are WebBootCaT, for creating corpora from the web, and keyword and term extraction, which finds terminology in your texts. Read about [...]

text type
[We follow Biber (1989) in using text type as a generic term for the many ways in which a text might be classified.] A text type refers to values assigned to structures (e.g. documents, paragraphs, sentences or others) inside a corpus. Text types can refer to the source (newspaper, book, [...]
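In a vertical corpus file, text types are typically stored as attribute values of structures such as `<doc>`. The sketch below reads those values and counts documents per value; the attribute names (`id`, `genre`, `year`) and the double-quoted attribute syntax are assumptions for illustration.

```python
import re
from collections import Counter

# Attribute="value" pairs inside an opening structure tag such as <doc ...>.
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

def text_types(structure_line: str) -> dict[str, str]:
    """Return the text-type attributes of one opening structure tag."""
    return dict(ATTRIBUTE.findall(structure_line))

def documents_per_value(lines: list[str], attribute: str) -> Counter:
    """Count documents for each value of one text-type attribute."""
    counts = Counter()
    for line in lines:
        if line.startswith("<doc "):
            value = text_types(line).get(attribute)
            if value is not None:
                counts[value] += 1
    return counts

# Hypothetical vertical file; token lines are omitted for brevity.
vertical = [
    '<doc id="d1" genre="newspaper" year="2010">', "</doc>",
    '<doc id="d2" genre="book" year="2012">', "</doc>",
    '<doc id="d3" genre="newspaper" year="2015">', "</doc>",
]
print(text_types(vertical[0]))  # → {'id': 'd1', 'genre': 'newspaper', 'year': '2010'}
print(documents_per_value(vertical, "genre"))  # → Counter({'newspaper': 2, 'book': 1})
```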

text type selector
Any search in Sketch Engine can be limited to certain text types only. The results will be taken only from documents annotated with the selected text type(s). Users can include metadata in their corpora. If the metadata are in the required format, they will be converted to text types and will [...]
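Restricting a search to selected text types amounts to filtering documents by their metadata before counting hits. The sketch below is a simplified, hypothetical model of that idea (documents held in memory, one or more allowed values per attribute), not Sketch Engine's query engine.

```python
# Hypothetical documents: text-type metadata plus the document's tokens.
DOCS = [
    {"genre": "newspaper", "year": "2010", "tokens": ["the", "wind", "was", "strong"]},
    {"genre": "book", "year": "2010", "tokens": ["an", "icy", "wind", "blew"]},
    {"genre": "newspaper", "year": "2015", "tokens": ["cold", "wind", "and", "rain"]},
]

def count_hits(docs: list[dict], word: str, **selected: set[str]) -> int:
    """Count occurrences of `word`, looking only at documents whose metadata
    value for every selected attribute is one of the selected values."""
    hits = 0
    for doc in docs:
        if all(doc.get(attr) in values for attr, values in selected.items()):
            hits += doc["tokens"].count(word)
    return hits

print(count_hits(DOCS, "wind"))                                              # → 3
print(count_hits(DOCS, "wind", genre={"newspaper"}))                         # → 2
print(count_hits(DOCS, "wind", genre={"newspaper", "book"}, year={"2010"}))  # → 2
```

Values selected for the same attribute are combined with OR, while different attributes are combined with AND.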

timeline
The timeline function displays how the use of a word or phrase changes over time. Timelines are not a standalone tool; they are included in the Concordance and Wordlist tools. Timelines are computed in the same way as the graphs in Trends (a diachronic analysis of word usage); however, they can [...]
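The exact computation shared by timelines and Trends is not described here, so the sketch below only shows the basic idea under that caveat: count a word in each time period and normalize by the size of the period, so that periods with different amounts of text can be compared. The data and the per-million normalization are assumptions.

```python
def timeline(tokens_by_period: dict[int, list[str]], word: str) -> dict[int, float]:
    """Frequency per million tokens of `word` in each period, in chronological order."""
    result = {}
    for period in sorted(tokens_by_period):
        tokens = tokens_by_period[period]
        result[period] = tokens.count(word) * 1_000_000 / len(tokens) if tokens else 0.0
    return result

# Hypothetical corpus split by year (tiny, for illustration only).
corpus = {
    2019: ["a", "selfie", "b", "c"],
    2018: ["x", "y", "z", "selfie", "w", "v", "u", "t"],
}
print(timeline(corpus, "selfie"))  # → {2018: 125000.0, 2019: 250000.0}
```

Both years contain the word once, but the relative frequency doubles in 2019 because that period is half the size.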

TMX - Translation Memory eXchange format
Translation Memory eXchange (TMX) is an XML format used for creating parallel corpora in Sketch Engine. This format is commonly used for translation memories (TMs). See more about Setting up parallel corpora in Sketch Engine. An example of a TMX document (from Wikipedia), the [...]
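A TMX file stores aligned segments as translation units (`<tu>`), each holding one variant (`<tuv>`) per language with the text in `<seg>`. The sample below is our own minimal TMX 1.4 document, not the Wikipedia example mentioned above, and the sketch uses Python's standard `xml.etree.ElementTree` to read the aligned pairs.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A minimal TMX 1.4 document with one translation unit (our own sample).
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world!</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde !</seg></tuv>
    </tu>
  </body>
</tmx>"""

def aligned_segments(tmx_text: str) -> list[dict[str, str]]:
    """Return one {language: segment text} mapping per translation unit <tu>."""
    root = ET.fromstring(tmx_text)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        for tu in root.iter("tu")
    ]

print(aligned_segments(TMX))  # → [{'en': 'Hello world!', 'fr': 'Bonjour le monde !'}]
```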

token
A token is the smallest unit that a corpus consists of. A token normally refers to:

a word form: going, trees, Mary, twenty-five…
punctuation: comma, dot, question mark, quotes…
number: 50,000…
abbreviations*, product names: 3M, i600, XP, e.g., etc., FB …
anything else between [...]
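The kinds of tokens listed above can be approximated with a small regular-expression tokenizer. This is a simplified sketch, not Sketch Engine's own tokenization, which is language-specific; the pattern and its single listed abbreviation are hypothetical.

```python
import re

# Simplified, hypothetical token pattern; alternatives are tried in order.
TOKEN = re.compile(r"""
    (?:[A-Za-z]\.){2,}      # abbreviations written with dots: e.g., i.e.
  | etc\.                   # one listed abbreviation ending in a dot
  | \d{1,3}(?:,\d{3})+      # numbers with thousands separators: 50,000
  | \w+(?:-\w+)*            # word forms, incl. hyphenated ones and names like 3M, i600
  | [^\w\s]                 # any other single non-space character: punctuation
""", re.VERBOSE)

print(TOKEN.findall("Mary sold 50,000 trees, e.g. oaks, to 3M."))
# → ['Mary', 'sold', '50,000', 'trees', ',', 'e.g.', 'oaks', ',', 'to', '3M', '.']
```

The order of the alternatives matters: trying abbreviations and numbers first keeps "e.g." and "50,000" from being split at their internal punctuation.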
