Text analytics with Sketch Engine
The Sketch Engine software is a comprehensive suite of text analysis tools designed to handle texts in many languages and scripts, at sizes of up to billions of words. The analysis takes into account the linguistic features of each language, such as morphology and grammar, and supports a range of text analysis techniques.
Text analysis API
All functionality is also available via the Sketch Engine text analysis API, which is included with every Sketch Engine account and supports the complete Sketch Engine functionality. To test the different functionalities, register a free trial account.
Keyword frequency, term extraction and term frequency are useful for topic modelling because they identify the words and phrases typical of the content of a text. Our API supports this kind of topic modelling.
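As a rough illustration of how keywords typical of a text can be identified, the sketch below ranks the words of a small focus text against a reference text using a smoothed ratio of relative frequencies. This is a local, standard-library example and not a call to the Sketch Engine API; the function names (tokenize, keyness), the smoothing value and the sample texts are all assumptions made for the illustration, and the scoring may differ from what Sketch Engine computes.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyness(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank focus-text words by a smoothed ratio of per-million frequencies.

    Hypothetical helper: the smoothing constant avoids division by zero
    for words that never occur in the reference text.
    """
    focus = Counter(focus_tokens)
    reference = Counter(reference_tokens)
    n_focus, n_reference = len(focus_tokens), len(reference_tokens)
    scores = {}
    for word, count in focus.items():
        focus_pm = count / n_focus * 1_000_000
        reference_pm = reference[word] / n_reference * 1_000_000
        scores[word] = (focus_pm + smoothing) / (reference_pm + smoothing)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented sample texts, used only to exercise the function.
focus_text = "the parser tags each token and the parser lemmatizes each token"
reference_text = "the cat sat on the mat and the dog sat on the rug"

ranked = keyness(tokenize(focus_text), tokenize(reference_text))
top_words = [word for word, _ in ranked[:3]]
```

Words that are frequent in the focus text but rare or absent in the reference text rise to the top, while function words such as "the" score below 1 because they are at least as frequent in the reference text.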
Calculating word frequency is a common task in text analytics. Sketch Engine contains tools to calculate the frequencies of words, phrases and n-grams, as well as of grammatical or lexical structures, e.g. the frequency of verbs in the past tense compared with the present tense. Word frequency is included in our API.
The wordlist tool calculates word frequency with a wide range of filtering options, such as selecting words that start with, contain or end in a particular string, or restricting the list to nouns, verbs or other parts of speech. The criteria can be combined, and regular expressions are supported.
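The kind of filtering described above can be sketched locally in a few lines. The example below counts word frequencies and keeps only the words that fully match a regular expression; it is a standard-library illustration under stated assumptions, not the wordlist tool or its API. The function name word_frequencies, its parameters and the sample sentence are hypothetical, and filtering by part of speech is left out because it would need a tagger.

```python
from collections import Counter
import re


def word_frequencies(text, pattern=None, min_count=1):
    """Count word frequencies, optionally keeping only words matching a regex.

    Hypothetical helper: `pattern` must match the whole word, so "te.*"
    selects words starting with "te" and ".*ing" words ending in "ing".
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    if pattern is not None:
        regex = re.compile(pattern)
        counts = Counter(
            {word: count for word, count in counts.items() if regex.fullmatch(word)}
        )
    return [(word, count) for word, count in counts.most_common() if count >= min_count]


# Invented sample sentence, used only to exercise the function.
sample = "Testing tools test texts; tested tools keep testing texts and tokens."

starts_with_te = word_frequencies(sample, pattern=r"te.*")
ends_in_ing = word_frequencies(sample, pattern=r".*ing")
frequent_only = word_frequencies(sample, min_count=2)
```

Criteria are combined here simply by passing both a pattern and a minimum count in the same call.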
To support the analysis of multiword expressions, Sketch Engine computes the frequency of n-grams of different sizes. Texts with a size of billions of words are supported.
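To make the idea of n-gram frequency concrete, the sketch below counts contiguous sequences of n tokens in a short tokenised text. It is a minimal in-memory example with a hypothetical function name (ngram_frequencies) and an invented sample; it says nothing about how Sketch Engine computes n-grams at the scale of billions of words.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every contiguous sequence of n tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields each n-gram once.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample, split on whitespace for simplicity.
tokens = "to be or not to be that is the question".split()

bigrams = ngram_frequencies(tokens, 2)
trigrams = ngram_frequencies(tokens, 3)
```

A text of N tokens yields N - n + 1 n-grams, so the ten tokens above produce nine bigrams and eight trigrams, with "to be" the only bigram that repeats.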
Co-occurrence analysis (web or API)
Co-occurrence analysis reveals information about the contexts in which words appear and helps us understand how context modifies the core meaning of a word. Co-occurrence analysis is supported by our text analytics API. This type of text analysis can be done with the following tools:
A word sketch gives an at-a-glance one-page overview of the context in which the word appears. The context can be clearly understood from the collocations the word keeps.
Word sketches support the clustering of collocations to group similar collocations and reveal topics these collocations represent.
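A simplified view of how collocations can be scored is sketched below: it counts the words that occur within a fixed window around a node word and ranks them by the logDice association score, 14 + log2(2 * f_xy / (f_x + f_y)). This is an assumption-laden illustration and not the word sketch implementation: it uses a plain token window instead of grammatical relations, and the function name collocations, its parameters and the sample text are hypothetical.

```python
from collections import Counter
import math


def collocations(tokens, node, window=1, min_count=1):
    """Rank words co-occurring with `node` inside a token window by logDice.

    Hypothetical helper; a window-based approximation, not a grammar-based
    word sketch. Returns (collocate, co-occurrence count, logDice) rows.
    """
    freq = Counter(tokens)
    pair_counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pair_counts[tokens[j]] += 1
    rows = []
    for collocate, f_xy in pair_counts.items():
        if f_xy < min_count:
            continue
        log_dice = 14 + math.log2(2 * f_xy / (freq[node] + freq[collocate]))
        rows.append((collocate, f_xy, round(log_dice, 2)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented sample text, used only to exercise the function.
text = "she drinks strong tea and he drinks strong tea but they brew weak coffee"
rows = collocations(text.split(), node="tea", window=1)
```

In this sample "strong" ranks first because it appears next to "tea" on both occasions, which is the kind of at-a-glance pattern a collocation overview is meant to surface.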
Automatic synonym identification produces a thesaurus entry for every word in the language. The algorithm relies on the theory of distributional semantics, which holds that words similar in meaning tend to appear in similar contexts.
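The distributional idea can be illustrated with a toy example: each word is represented by the counts of the words around it, and two words are considered similar when those context vectors point in a similar direction. The sketch below uses window-based contexts and cosine similarity, which is one common way to apply the idea and not necessarily the method Sketch Engine uses; the function names, parameters and sample sentences are assumptions made for the illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            start, end = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_words(vectors, target, top_n=3):
    """Return the words whose contexts are most similar to the target's."""
    target_vector = vectors.get(target, Counter())
    scores = [
        (word, cosine(target_vector, vector))
        for word, vector in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Invented, pre-tokenised sample sentences.
sentences = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "cat", "chases", "mice"],
    ["the", "dog", "chases", "cars"],
]
vectors = context_vectors(sentences)
neighbours = similar_words(vectors, "cat")
```

Here "dog" comes out as the nearest neighbour of "cat" because both words share the contexts "the", "drinks" and "chases". On a corpus this small the other neighbours are noisy, which is why the approach depends on large amounts of text.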