Words, tags, lemmas, lemposes, lowercase

Sketch Engine users will now and then come across the word "attribute" and its values: words, tags, lemmas, lempos, lowercase and some others, depending on the corpus and language. This blog post explains how these positional attributes, to use the correct terminology, work in Sketch Engine and how the user can benefit […]
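As a rough illustration of the idea that each token carries several attribute values at once, here is a minimal Python sketch. The `Token` class, the `POS_CODES` mapping and the `lemma-code` form of lempos are assumptions made for this example only; they are not taken from Sketch Engine's actual data format or API.

```python
from dataclasses import dataclass

# Hypothetical mapping from tag prefixes to one-letter codes (an assumption).
POS_CODES = {"NN": "n", "VB": "v", "JJ": "j", "RB": "a"}


@dataclass(frozen=True)
class Token:
    """One corpus position with its stored attribute values."""
    word: str
    tag: str
    lemma: str

    @property
    def lc(self) -> str:
        # Lowercase form of the word, derived from the word attribute.
        return self.word.lower()

    @property
    def lempos(self) -> str:
        # Lemma combined with a one-letter part-of-speech code;
        # tags without a mapping fall back to "x" (assumed convention).
        return f"{self.lemma}-{POS_CODES.get(self.tag[:2], 'x')}"


def values(tokens, attribute):
    """Read the same attribute from every token in sequence."""
    return [getattr(token, attribute) for token in tokens]


tokens = [
    Token("The", "DT", "the"),
    Token("Dogs", "NNS", "dog"),
    Token("barked", "VBD", "bark"),
]
```

Calling `values(tokens, "lemma")` or `values(tokens, "lempos")` reads the same three positions through a different attribute, which is the sense in which the attributes are positional.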

Build a corpus from the web

The web is a great source of readily available textual data, but it is also a bottomless warehouse of spam, machine-generated content and duplicated content that is unsuitable for linguistic analysis. This can raise doubts about the quality of the language in corpora built from the web. At Sketch Engine, we are very well aware of the […]

POS tags

This blog post defines what POS tags are, explains manual and automatic tagging and points readers to Sketch Engine where they can have their texts tagged automatically in many languages.

What is a POS tag?

A POS tag (or part-of-speech tag) is a special label assigned to each token (word) in a text corpus to indicate […]
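To make the idea of one label per token concrete, here is a toy Python sketch of dictionary-based tagging. The `LEXICON` entries, the Penn Treebank-style tag names and the `UNK` fallback are assumptions chosen for illustration; this is not how Sketch Engine's taggers work, and real automatic taggers also use context to resolve ambiguous words.

```python
# Hypothetical mini-lexicon mapping lowercase word forms to tags.
LEXICON = {
    "the": "DT",
    "cat": "NN",
    "sat": "VBD",
    "on": "IN",
    "mat": "NN",
}


def tag_tokens(text):
    """Split text on whitespace and pair each token with a tag."""
    tagged = []
    for token in text.split():
        # Look up the lowercase form; unknown words get a placeholder tag.
        tagged.append((token, LEXICON.get(token.lower(), "UNK")))
    return tagged


tagged_sentence = tag_tokens("The cat sat on the mat")
```

Each element of `tagged_sentence` is a `(token, tag)` pair, so the first token `"The"` is paired with `"DT"` and any word missing from the lexicon is paired with `"UNK"`.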

Screenshot from OneClick Terms – term extraction tool

The best term extraction

Term extraction, or terminology extraction, is an automatic method of analysing text to identify phrases that fulfil the criteria for terms. It is used in translation and terminology management, and also in text analytics, where it supports topic modelling, data mining and information retrieval from unstructured text.