Entries by Ondřej Matuška

Spoken British National Corpus 2014

The 11-million-word Spoken British National Corpus 2014 is now available in Sketch Engine. Large spoken corpora are extremely demanding to build, so we are really excited about this new addition to our corpus portfolio.

CQL builder

The CQL builder helps the user construct CQL queries without worrying about the correct use of brackets, quotes and other characters. The user-friendly interface guides the user towards a valid query for advanced corpus searches.
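
To give a rough idea of what such a builder takes care of, here is a minimal Python sketch that assembles a query from attribute-value pairs and adds the brackets and quotes itself. The helper names cql_token and cql_query are hypothetical and have nothing to do with how the actual CQL builder is implemented; the attribute names (lemma, tag) and the tag pattern N.* are only examples and depend on the corpus.

```python
def cql_token(**attrs):
    """Build one CQL token, e.g. [lemma="make" & tag="V.*"].

    Hypothetical helper for illustration only.
    """
    if not attrs:
        return "[]"  # empty brackets stand for any single token
    parts = []
    for name, value in attrs.items():
        # Escape double quotes so the value cannot break the quoting.
        escaped = value.replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "[" + " & ".join(parts) + "]"


def cql_query(*tokens):
    """Join tokens into a sequence: one token after another."""
    return " ".join(tokens)


# The lemma "make", then any token, then a token whose tag starts with N.
query = cql_query(cql_token(lemma="make"), cql_token(), cql_token(tag="N.*"))
print(query)  # [lemma="make"] [] [tag="N.*"]
```

The point of the sketch is only that the user supplies attributes and values, while the brackets, quotes and separators are generated consistently.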

Case sensitive and insensitive corpus analysis

This blog post explains how to analyse corpora and either take into account or ignore the difference between lowercase and uppercase. In other words, it shows how to use Sketch Engine to either type wifi and find wifi, WIFI, WiFi and Wifi, or type WiFi and find only WiFi but not the other variants.
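
The difference between the two modes can be sketched in a few lines of Python. This is only a conceptual illustration, not Sketch Engine's implementation: the function find and the small token list are made up for the example, and the case-insensitive mode is simulated by comparing lowercased forms.

```python
def find(query, tokens, case_sensitive):
    """Return the tokens that match the query in the chosen mode."""
    if case_sensitive:
        # Exact comparison: only the typed spelling is found.
        return [t for t in tokens if t == query]
    # Compare lowercased forms so all case variants are found.
    return [t for t in tokens if t.lower() == query.lower()]


tokens = ["wifi", "WIFI", "WiFi", "Wifi", "router"]

print(find("wifi", tokens, case_sensitive=False))
# ['wifi', 'WIFI', 'WiFi', 'Wifi']
print(find("WiFi", tokens, case_sensitive=True))
# ['WiFi']
```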

Fit more text on the screen

Each user can select the line density that suits them best. A high density fits more text on the screen but is less comfortable to the eye. A low density is more pleasant to work with but does not fit as much data on the same screen.

Display and hide statistics and counts

Only display the numbers you really need to see. Counts and statistics are hidden in most places when the user logs in for the first time. Watch the video to learn how they can be displayed.

Words, tags, lemmas, lemposes, lowercase

When using Sketch Engine, every now and then the user comes across the word attribute and its values: words, tags, lemmas, lempos, lowercase and some others depending on the corpus and language. This blog post explains how these positional attributes, to use the correct terminology, work in Sketch Engine and how the user can benefit […]
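
As a rough mental model, each token in a corpus can be pictured as a small record with one value per attribute. The Python sketch below is an assumption-laden illustration: the Token class, the Penn-style tags and the rule that derives the lempos suffix from the first letter of the tag are all hypothetical, since tagsets and attribute sets differ between corpora and languages.

```python
from dataclasses import dataclass

# Hypothetical mapping from the first letter of a tag to a lempos suffix.
POS_SUFFIX = {"N": "n", "V": "v", "J": "j"}


@dataclass(frozen=True)
class Token:
    word: str   # the form as it appears in the text
    tag: str    # part-of-speech tag (example tagset, corpus-dependent)
    lemma: str  # base form

    @property
    def lc(self):
        """Lowercased word form."""
        return self.word.lower()

    @property
    def lempos(self):
        """Lemma plus a part-of-speech suffix, e.g. 'bark-v'."""
        return f"{self.lemma}-{POS_SUFFIX.get(self.tag[:1], 'x')}"


def values(tokens, attribute):
    """Read one attribute from every token, like one column of a table."""
    return [getattr(t, attribute) for t in tokens]


sentence = [Token("Dogs", "NNS", "dog"), Token("barked", "VBD", "bark")]

print(values(sentence, "word"))    # ['Dogs', 'barked']
print(values(sentence, "lc"))      # ['dogs', 'barked']
print(values(sentence, "lemma"))   # ['dog', 'bark']
print(values(sentence, "lempos"))  # ['dog-n', 'bark-v']
```

Searching or counting by a different attribute then simply means reading a different column of the same tokens.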

Parallel corpus – how to search

If you have 4 minutes, why not spend them learning how to find parallel corpora in Sketch Engine and how to use them? Watch the new video on our YouTube channel.

Search words and phrases in one step

The pipe (|) makes it possible to include individual words as well as phrases in the same search.
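
The idea can be illustrated with a small Python sketch that splits a query on the pipe and looks for each alternative, whether it is a single word or a phrase, in a tokenised text. The functions parse and find_matches and the sample sentence are made up for this example; this is not how Sketch Engine processes the query internally.

```python
def parse(query):
    """Split the query on | and split each alternative into its words."""
    return [alt.split() for alt in query.split("|") if alt.strip()]


def find_matches(query, tokens):
    """Return (position, matched text) for every alternative found."""
    alternatives = parse(query)
    hits = []
    for i in range(len(tokens)):
        for alt in alternatives:
            # A one-word alternative compares one token, a phrase several.
            if tokens[i:i + len(alt)] == alt:
                hits.append((i, " ".join(alt)))
    return hits


text = "we decided to carry on and continue the work".split()
print(find_matches("continue|carry on", text))
# [(3, 'carry on'), (6, 'continue')]
```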

Searching for hyphenated, non-hyphenated and space-separated words in one step

Some compounds can be spelt with a hyphen, without it or as two words separated by a space. Sketch Engine now allows searching for all three in one simple step. Type two hyphens (--) like this: multi--million, and Sketch Engine will find multi-million, multimillion and multi million.
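
What the two hyphens stand for can be shown with a short Python sketch that expands such a query into the three spellings. The function expand is a hypothetical helper written for this illustration only, and it handles a single pair of hyphens.

```python
def expand(query):
    """Expand 'left--right' into hyphenated, joined and spaced variants."""
    if "--" not in query:
        return [query]  # nothing to expand
    left, right = query.split("--", 1)
    return [f"{left}-{right}", f"{left}{right}", f"{left} {right}"]


print(expand("multi--million"))
# ['multi-million', 'multimillion', 'multi million']
```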

Old interface closes down

The old interface closes down on 20 January 2020. (Please ignore this message if you only use the new interface.) Sketch Engine decided not to maintain two interfaces. For this reason, the old interface will no longer be available after 20 January 2020. This does not affect user data in any way.

Build a corpus from the web

The web is a great source of readily available textual data but also a bottomless warehouse of spam, machine-generated content and duplicated content unsuitable for linguistic analysis. This may generate some uncertainty about the quality of the language included in the corpora from the web. At Sketch Engine, we are very well aware of the […]

Lexicom 2018, Jesus College, Cambridge, UK

Pencil the dates in your diary. The next Lexicom, a workshop in lexicography and lexical computing, takes place at Jesus College, Cambridge, UK, from 11 to 15 September 2018.

Corpus annotation and structures

A corpus is a collection of a very large amount of text that is used, together with suitable corpus management software such as Sketch Engine, to learn about how language is used. It has become an indispensable tool for all modern linguists and lexicographers. A text corpus can consist of only one very long […]

Most frequent or most typical collocations?

Word sketches in Sketch Engine are one-page summaries of word combinations (called collocations) that the word prefers. These summaries are computed automatically based on a sample of language of billions of words called a text corpus.
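
The contrast in the title can be made concrete with a small Python sketch that ranks the same collocates once by raw co-occurrence frequency and once by an association score. The score used here is logDice, 14 + log2(2 * f_xy / (f_x + f_y)), which is an assumption of this example because the excerpt does not name a measure. All counts are invented for illustration and do not come from any corpus.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score for a word pair.

    f_xy: how often the two words occur together
    f_x, f_y: how often each word occurs overall
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


NODE_FREQ = 10_000  # invented frequency of the node word "tea"

# collocate: (co-occurrences with "tea", total frequency), all invented
collocates = {
    "the": (3_000, 5_000_000),
    "green": (900, 40_000),
    "herbal": (400, 1_500),
}

by_frequency = sorted(collocates, key=lambda c: collocates[c][0], reverse=True)
by_score = sorted(
    collocates,
    key=lambda c: log_dice(collocates[c][0], NODE_FREQ, collocates[c][1]),
    reverse=True,
)

print(by_frequency)  # ['the', 'green', 'herbal']
print(by_score)      # ['herbal', 'green', 'the']
```

With these made-up numbers, the most frequent collocate is a very common word that combines with almost anything, while the highest-scoring one is rarer overall but occurs with the node word a large share of the time.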