• web mining [ text-analysis ]

    Web mining is the application of data mining to the web: it focuses on extracting information and metadata from web pages and the texts they contain. For this task, Sketch Engine uses WebBootCaT, a fully automated tool for creating corpora from the web, which also stores metadata about the processed websites. Read about other text analysis tools.
  • Word

    Note: This entry is for the type of token. For the positional attribute, see word form. A word is a type of token. All tokens in a corpus are divided into two groups: words and nonwords. Words are tokens which begin with a letter of the alphabet. Tokens such as book, working, Mary, T-shirt, post-1945, mp3 or CO2 are words because they start with a letter. The regular expression Sketch Engine uses to identify words is [[:alpha:]].* Compare to nonword.
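    The word/nonword rule above can be sketched in a few lines of Python. This is only an illustration, not Sketch Engine's implementation: Python's re module does not support the POSIX class [[:alpha:]], so the sketch assumes that testing the first character with str.isalpha() is a close enough stand-in. The function name is_word is hypothetical.

```python
def is_word(token: str) -> bool:
    """Approximate the pattern [[:alpha:]].* : the token must start with a letter.

    Assumption: str.isalpha() (Unicode-aware) stands in for [[:alpha:]].
    """
    return bool(token) and token[0].isalpha()


# The examples from the entry all start with a letter, so they count as words.
examples = ["book", "working", "Mary", "T-shirt", "post-1945", "mp3", "CO2"]
words = [t for t in examples if is_word(t)]

# Tokens starting with a digit or punctuation would be nonwords under this rule.
nonwords = [t for t in ["2019", ",", "3D"] if not is_word(t)]
```

    Only the first character matters; digits, hyphens or other symbols later in the token do not change the classification.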
  • word form [ attribute ]

    Note: This entry is for the positional attribute (word form, lemma, lowercase, tag…). For the type of token, the opposite of nonword, see word. The word form (often shortened to word in the interface) is a positional attribute. It refers to one of the word forms that a lemma can take, e.g. the lemma go can take the word forms go, went, gone, goes, going. A list of word forms is a list where each of go, went, gone, goes, going is listed separately and their frequencies are also calculated separately. A search using word forms will only find the word form(s) typed into the input form. It will not find the other word forms belonging to the same lemma. The word form is case-sensitive: apple and Apple are two different word forms. Compare: word_lc (lowercase), lemma, lemma_lc (lowercase). See also: list of attributes, token.
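    To make the difference between counting by word form, by lowercased word form and by lemma concrete, here is a small Python sketch. The token list and the LEMMAS lookup table are made up for illustration; a real corpus gets its lemmas from a lemmatizer or tagger, not from a hand-written dictionary.

```python
from collections import Counter

# Hypothetical toy data: a tokenized text and a hand-written lemma lookup.
tokens = ["go", "went", "goes", "go", "Apple", "apple", "going"]
LEMMAS = {"go": "go", "went": "go", "gone": "go", "goes": "go", "going": "go"}

# Word forms: every form is counted separately and case matters.
word_form_freq = Counter(tokens)

# Lowercased word forms: Apple and apple are merged.
lowercase_freq = Counter(t.lower() for t in tokens)

# Lemmas: all forms of "go" are merged; unknown tokens are kept as they are.
lemma_freq = Counter(LEMMAS.get(t, t) for t in tokens)
```

    In this toy data the word form go has a frequency of 2, while the lemma go has a frequency of 5 because went, goes and going are counted with it.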
  • word list

    A word list is a generic name for various types of lists, such as lists of words, lemmas, POS tags or other attributes, together with their frequency (hit counts, document counts or others).
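    The two frequency types mentioned above differ in what they count: a hit count is the total number of occurrences, while a document count is the number of documents containing the item at least once. A minimal Python sketch, using a made-up two-document corpus:

```python
from collections import Counter

# Hypothetical toy corpus: each inner list is one tokenized document.
docs = [
    ["the", "cat", "sat"],
    ["the", "the", "dog"],
]

# Hit count: every occurrence in every document is counted.
hit_counts = Counter(tok for doc in docs for tok in doc)

# Document count: each document contributes at most 1 per item.
doc_counts = Counter(tok for doc in docs for tok in set(doc))

# A simple word list sorted by hit count, highest first.
word_list = hit_counts.most_common()
```

    Here "the" has a hit count of 3 but a document count of 2.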
  • word sketch

    The word sketch is a tool to display collocations (=word combinations) in a compact, easy-to-understand way. The word sketch makes it easy to understand how a word behaves, which contexts it typically appears in and which words it can be used together with. The word sketch can typically display collocations of only nouns, adjectives, verbs and adverbs. This may differ between languages and corpora. The supported parts of speech are determined by the word sketch grammar applied to the corpus. Users can develop their own word sketch grammars to customize the collocation analysis to their own needs. See also: Word sketch & collocations and word combinations; Word sketch difference & compare words; Word sketch grammar.
  • Word Sketch grammar

    Word Sketch grammar (WSG) is a set of rules defining the grammatical relations (=columns/categories) in a Word Sketch. In other words, WSG tells Sketch Engine which words should be regarded as collocations of the search word and also what type of collocation they are. WSG defines the criteria using POS tags, distance between words, and other criteria. The criteria are written using CQL. WSG is language-dependent: the same WSG cannot be shared across languages. Typically, corpora in the same language use the same WSG, but exceptions exist. Users can write their own WSG to match their specific needs. Corpora in unsupported languages can make use of a universal WSG, which provides only basic statistics of words surrounding the keyword, ignoring the grammar of the language. The universal WSG can also be modified by the user. See also: Term grammar; Word sketch; Word sketch & collocations and word combinations.