• web mining [ text-analysis ]

    Web mining is the application of data mining to extracting information from texts on the web. It focuses on gaining information and metadata from the web. For this task, Sketch Engine uses WebBootCaT, a fully automated tool for creating corpora from the web, which also stores metadata of the processed websites. Read about other text analysis tools.
  • Word

    Note: This entry is for the type of token. For the positional attribute, see word form. A word is a type of token: words are tokens which begin with a letter of the alphabet. All tokens in a corpus are divided into two groups: words and nonwords. The regular expression Sketch Engine uses to identify words is [[:alpha:]].* Compare to nonword.
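
    The word/nonword split described above can be sketched in Python. This is only an illustration, not Sketch Engine's own code: Python's re module has no POSIX [[:alpha:]] class, so the pattern below approximates it with [^\W\d_], and the names is_word and split_tokens are hypothetical.

```python
import re

# Approximation of the POSIX pattern [[:alpha:]].* (assumption: Python's re
# module does not support [[:alpha:]], so [^\W\d_] stands in for "a letter").
WORD_RE = re.compile(r"[^\W\d_].*", re.DOTALL)


def is_word(token: str) -> bool:
    """Return True if the whole token matches the word pattern."""
    return WORD_RE.fullmatch(token) is not None


def split_tokens(tokens):
    """Divide tokens into the two groups: words and nonwords."""
    words = [t for t in tokens if is_word(t)]
    nonwords = [t for t in tokens if not is_word(t)]
    return words, nonwords


# Hypothetical sample tokens.
tokens = ["Apple", "apple", "3D", "x2", ",", "2024", "don't"]
words, nonwords = split_tokens(tokens)
```

    With this sketch, a token such as x2 counts as a word because it begins with a letter, while 3D counts as a nonword because it begins with a digit.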
  • word form [ attribute ]

    This entry is for the positional attribute (word form, lemma, lowercase, tag…). For the type of token, the opposite of nonword, see word. The word form (often shortened to word in the interface) is a positional attribute. It refers to one of the word forms that a lemma can take, e.g. the lemma go can take the word forms go, went, gone, goes, going. A list of word forms is a list where each of go, went, gone, goes, going is listed separately and their frequencies are also calculated separately. A search using word forms will only find the word form(s) typed into the input form. It will not find the other word forms belonging to the same lemma. The word form is case-sensitive: apple and Apple are two different word forms. Compare: word_lc (lowercase), lemma, lemma_lc (lowercase). See also: list of attributes, token.
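
    The difference between counting by word form, by lowercased word form and by lemma can be illustrated with a small Python sketch. The annotated tokens and the helper search_word_form are hypothetical; a real corpus stores these attributes per token.

```python
from collections import Counter

# Hypothetical (word form, lemma) pairs standing in for an annotated corpus.
corpus = [
    ("go", "go"), ("went", "go"), ("gone", "go"), ("goes", "go"),
    ("going", "go"), ("went", "go"), ("Went", "go"),
]

# Word forms are case-sensitive, so "went" and "Went" are counted separately.
word_freq = Counter(word for word, _ in corpus)

# Lowercasing the word form merges forms that differ only in case.
word_lc_freq = Counter(word.lower() for word, _ in corpus)

# Counting by lemma merges all word forms of the same lemma.
lemma_freq = Counter(lemma for _, lemma in corpus)


def search_word_form(query: str):
    """Find only tokens whose word form equals the query exactly."""
    return [word for word, _ in corpus if word == query]
```

    In this sketch a search for went finds the two lowercase occurrences only; it finds neither Went nor the other word forms of the lemma go.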
  • word list

    A word list is a generic name for various types of lists, such as lists of words, lemmas, POS tags or other attributes, together with their frequencies (hit counts, document counts or others).
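
    As a rough illustration of the two frequency types mentioned above, the following Python sketch builds a word list with hit counts (total occurrences) and document counts (number of documents containing the item). The documents and variable names are hypothetical, and the sort order is an assumption made for the example.

```python
from collections import Counter

# Hypothetical tokenized documents.
docs = {
    "doc1": ["the", "cat", "sat", "the"],
    "doc2": ["the", "dog"],
    "doc3": ["cat", "cat"],
}

hit_counts = Counter()  # total number of occurrences of each item
doc_counts = Counter()  # number of documents in which each item occurs

for tokens in docs.values():
    hit_counts.update(tokens)
    doc_counts.update(set(tokens))  # count each item once per document

# One row per item: (item, hit count, document count), sorted by descending
# hit count and then alphabetically.
rows = [
    (item, hit_counts[item], doc_counts[item])
    for item in sorted(hit_counts, key=lambda w: (-hit_counts[w], w))
]
```

    The same approach applies to lemmas, POS tags or any other attribute: only the values being counted change.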
  • word sketch

    A word sketch is a tool to display collocations (=word combinations) in a compact, easy-to-understand way. The word sketch makes it easy to understand how a word behaves, which contexts it typically appears in and which words it can be used together with.
  • Word Sketch grammar

    Word Sketch grammar (WSG) is a set of rules defining the grammatical relations (=columns/categories) in a Word Sketch. WSG is language-dependent: the same WSG cannot be shared across languages. Different corpora in the same language can use the same or different WSGs. Users can write their own WSG to match their specific needs. Corpora in unsupported languages can make use of a universal WSG, which provides only basic statistics of the words surrounding the keywords and ignores the grammar of the language. The universal WSG can also be modified by the user. See also Term grammar.