• web mining [ text-analysis ]

    Web mining is an application of data mining that extracts information from texts. It focuses on gathering information and metadata from the web. For this task, Sketch Engine uses WebBootCaT, a fully automated tool for creating corpora from the web, which also stores metadata of the processed websites. Read about other text analysis tools.
  • word form [ attribute ]

    A word form (often shortened to word in Sketch Engine) is one of the forms that a lemma can take, e.g. the lemma go can take the word forms go, went, gone, goes, going. A list of words is a list in which each of go, went, gone, goes, going is listed separately and their frequencies are also calculated separately. A search using words finds only exactly what was typed; the other word forms are not included. Word is case sensitive: apple and Apple are two different word forms. Compare word_lc (lowercase), lemma, lemma_lc (lowercase).
  • word list

    A word list is a generic name for various types of lists, such as lists of words, lemmas, POS tags or other attributes, together with their frequency (hit counts, document counts or others).
  • word sketch

    A word sketch is a tool that displays collocations (=word combinations) in a compact, easy-to-understand way. The word sketch makes it easy to understand how a word behaves, which contexts it typically appears in and which words it can be used together with.
  • Word Sketch grammar

    Word Sketch grammar (WSG) is a set of rules defining the grammatical relations (=columns/categories) in a Word Sketch. WSG is language dependent: the same WSG cannot be shared across languages. Different corpora in the same language can use the same or different WSGs. Users can write their own WSG to match their specific needs. Corpora in unsupported languages can make use of a universal WSG, which provides only basic statistics of the words surrounding the keyword and ignores the grammar of the language. The universal WSG can also be modified by the user.
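
The difference between the word, word_lc and lemma attributes, and between hit counts and document counts in a word list, can be illustrated with a minimal Python sketch. It does not use Sketch Engine or its API. The toy corpus, the hand-written `LEMMAS` table and the function names `attribute_value` and `word_list` are all hypothetical and serve only to show how the counts differ.

```python
from collections import Counter

# Hypothetical toy corpus: two already-tokenized "documents".
documents = [
    ["Apple", "went", "home", "and", "ate", "an", "apple"],
    ["They", "go", "home", "when", "she", "goes"],
]

# Hypothetical hand-written lemma table; a real corpus is lemmatized by a tagger.
LEMMAS = {"went": "go", "goes": "go", "gone": "go", "going": "go", "ate": "eat"}


def attribute_value(token, attribute):
    """Return the value of one attribute for a token."""
    if attribute == "word":      # exact word form, case sensitive
        return token
    if attribute == "word_lc":   # lowercased word form
        return token.lower()
    if attribute == "lemma":     # base form; unknown tokens keep their case
        return LEMMAS.get(token.lower(), token)
    if attribute == "lemma_lc":  # lowercased base form
        return LEMMAS.get(token.lower(), token).lower()
    raise ValueError(f"unknown attribute: {attribute}")


def word_list(docs, attribute):
    """Build a word list: hit counts and document counts per attribute value."""
    hits = Counter()       # total number of occurrences
    doc_counts = Counter() # number of documents containing the value
    for doc in docs:
        values = [attribute_value(token, attribute) for token in doc]
        hits.update(values)
        doc_counts.update(set(values))  # each value counted once per document
    return hits, doc_counts


if __name__ == "__main__":
    for attribute in ("word", "word_lc", "lemma"):
        hits, doc_counts = word_list(documents, attribute)
        print(attribute, hits.most_common(3))
```

In this sketch a word list by word keeps Apple and apple apart, a list by word_lc merges them, and a list by lemma groups went, go and goes under go, with a hit count that can exceed the document count.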