Entries by Ondřej Matuška

Search words and phrases in one step

The pipe (|) makes it possible to include individual words as well as phrases in the same search.
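As a rough illustration of the idea, the Python sketch below treats a pipe-separated query as a list of alternatives. Each alternative is either a single word or a multi-word phrase, and all of them are found in one pass over a tokenized text. The function names `parse_query` and `find_matches`, the whitespace tokenization and the case-insensitive matching are assumptions made for this example. This is not how Sketch Engine implements its search.

```python
def parse_query(query: str) -> list[list[str]]:
    """Split a pipe-separated query into alternatives (hypothetical helper).

    Each alternative becomes a list of tokens, so "take part" -> ["take", "part"].
    Empty alternatives (e.g. from "a||b") are ignored.
    """
    return [alt.split() for alt in query.split("|") if alt.strip()]


def find_matches(tokens: list[str], query: str) -> list[tuple[int, str]]:
    """Return (start index, matched text) for every alternative found in tokens.

    Words and phrases are handled the same way: a phrase simply spans
    several consecutive tokens. Matching is case-insensitive.
    """
    alternatives = [[w.lower() for w in alt] for alt in parse_query(query)]
    lowered = [t.lower() for t in tokens]
    hits = []
    for i in range(len(lowered)):
        for alt in alternatives:
            if lowered[i:i + len(alt)] == alt:
                hits.append((i, " ".join(tokens[i:i + len(alt)])))
    return hits


tokens = "They take part in the debate and participate actively".split()
print(find_matches(tokens, "take part|participate"))
```

The phrase "take part" and the single word "participate" are both found by the same query.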

Searching for hyphenated, non-hyphenated and space-separated words in one step

Some compounds can be spelt with a hyphen, without one, or as two words separated by a space. Sketch Engine now allows searching for all three in one simple step. Type two hyphens (--), as in multi--million, and Sketch Engine will find multi-million, multimillion and multi million.
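The following is a minimal sketch of what the double-hyphen query covers. It assumes that the two hyphens simply stand for "a hyphen, a space or nothing" between the two parts of the compound. The helpers `expand_double_hyphen` and `find_spellings` are hypothetical and only illustrate the behaviour described above. They are not Sketch Engine code.

```python
import re


def expand_double_hyphen(query: str) -> list[str]:
    """Hypothetical helper: turn "multi--million" into its three spellings."""
    if "--" not in query:
        return [query]
    left, right = query.split("--", 1)
    return [f"{left}-{right}", f"{left}{right}", f"{left} {right}"]


def find_spellings(text: str, query: str) -> list[str]:
    """Find every spelling variant of query in text with a single regex.

    The middle part (?:-| |) accepts a hyphen, a space or nothing, and
    \\b keeps the match from starting or ending inside a longer word.
    """
    if "--" not in query:
        pattern = r"\b" + re.escape(query) + r"\b"
    else:
        left, right = query.split("--", 1)
        pattern = r"\b" + re.escape(left) + r"(?:-| |)" + re.escape(right) + r"\b"
    return re.findall(pattern, text, flags=re.IGNORECASE)


text = "A multi-million deal, a multimillion budget and a multi million loss."
print(expand_double_hyphen("multi--million"))
print(find_spellings(text, "multi--million"))
```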

Old interface closes down

On 20 January 2020, the old interface will close down. (Please ignore this message if you only use the new interface.) Sketch Engine has decided not to maintain two interfaces, so the old interface will no longer be available after 20 January 2020. This does not affect user data in any way.

Build a corpus from the web

The web is a great source of readily available textual data, but also a bottomless warehouse of spam, machine-generated text and duplicated content unsuitable for linguistic analysis. This may raise doubts about the quality of the language in corpora built from the web. At Sketch Engine, we are very well aware of the […]

Lexicom 2018, Jesus College, Cambridge, UK

Pencil the dates in your diary. The next Lexicom, a workshop in lexicography and lexical computing, takes place at Jesus College, Cambridge, UK, from 11 to 15 September 2018.

Corpus annotation and structures

A corpus is a very large collection of text that is used, together with suitable corpus management software such as Sketch Engine, to study how language is used. It has become an indispensable tool for all modern linguists and lexicographers. A text corpus can consist of only one very long […]

Most frequent or most typical collocations?

Word sketches in Sketch Engine are one-page summaries of the word combinations (called collocations) that a word prefers. These summaries are computed automatically from a text corpus, a sample of language that can contain billions of words.
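Frequency alone tends to favour very common words. Sketch Engine's documentation describes logDice as the score used to rank word sketch collocations by typicality: 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the frequency of the word pair and f_x, f_y are the frequencies of the two words. The sketch below simply computes that formula. The counts in the example are invented for illustration and do not come from any corpus.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: how often the two words occur together
    f_x, f_y: how often each word occurs in the corpus
    The maximum is 14, reached when the two words only ever occur together.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts for a headword occurring 1,000 times (not real corpus data)
frequent = log_dice(f_xy=200, f_x=1_000, f_y=1_000_000)  # common pair, very common collocate
typical = log_dice(f_xy=50, f_x=1_000, f_y=100)          # rarer pair, collocate rarely seen elsewhere

print(f"frequent collocate: {frequent:.2f}")
print(f"typical collocate:  {typical:.2f}")
```

Although the second pair occurs four times less often, it gets the higher score, because its collocate appears almost only with the headword. This is the difference between a frequent and a typical collocation.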

EUR-Lex Judgements Corpus

A new multilingual corpus in 23 languages of the European Union is now available in Sketch Engine. It was compiled from the Judgements of the EU Court of Justice and is useful for anyone interested in looking up translations in these languages. The total size of the corpus is 608 million words.

Timestamped corpus in 18 languages

Seventeen more languages of the unique diachronic web corpus, the Timestamped JSI Corpus, have been added to Sketch Engine. This brings the total to 18 languages, including the previously announced English corpus. The corpus can be used as:

Build a parallel corpus

Building parallel corpora is now easier! We have rewritten and reorganized our web manual on this topic. There are three options to build a parallel corpus:

A new corpus of Tibetan

A corpus of the Tibetan language has now been added to Sketch Engine. This corpus of 80 million words contains only Classical Tibetan, but there are plans to add a corpus of Standard Tibetan too. The corpus was built

Meet Sketch Engine in Madrid

Come and meet Sketch Engine at the Cursos de Verano del Escorial 2017, held from 17 to 21 July in Madrid and organized by the LexiCon research group (Universidad de Granada) and the LEETHI research group (Universidad Complutense). This training school focusses on the acquisition of new techniques and tools widely used today by the language industry to […]

Free Sketch Engine for Learner Corpus Association members

Members of the Learner Corpus Association have free access to Sketch Engine to upload, analyse and share their own learner corpora with other LCA members. The free access lasts for the duration of the LCA membership. To gain free access to Sketch Engine and to see the complete conditions, please read the LCA member benefits description.

English Preposition Corpus

Our new, unique English Preposition Corpus uncovers how prepositions behave and what senses they have. The corpus features special annotation for the sense of each preposition and for the semantic class of the words that precede and follow it. The user can

Spanish: NEW rich collocations and NEW clitics handling

Our new Spanish Word Sketches give much better coverage of Spanish-specific phenomena such as compound verb tenses, verb constructions, ser/estar and el subjuntivo. Spanish collocation information has never been so rich. Verbs with clitics, such as decirnos, descargárselo and comérselo, pose a problem when searching. Sketch Engine can now handle these much better, searching for decir […]