Entries by Michal Cukr

Better tools for Portuguese corpora

We have improved our tools for processing corpora in Brazilian Portuguese and European Portuguese. Our tools can now recognise words from both varieties of Portuguese. We have also improved the word sketches for Portuguese.

Sketch Engine calendar 2018 – June

Check out the next page of our Sketch Engine calendar 2018. This time, you will learn how to search inside a corpus structure using the within operator of the Corpus Query Language (CQL).

Find good examples in German with Sketch Engine

Are you looking for good German examples in context? Do you need German collocations or a German thesaurus for your work? Our tool deSkELL, a free simplified interface of Sketch Engine, is the right choice for these types of tasks. Try it at https://deskell.sketchengine.co.uk/

New English corpus from the Web

Check out our new 15-billion-word English corpus (enTenTen), composed of texts collected from the Web up to the end of 2015.

POS tags

This blog post defines what POS tags are, explains manual and automatic tagging and points readers to Sketch Engine where they can have their texts tagged automatically in many languages. What are POS tags? POS tags (or part-of-speech tags) are special labels assigned to each token (word) in a text corpus to indicate the part […]

Sketch Engine calendar 2018 – April

Are you ready for April Fools’ Day this year? How about some April Fools’ Day CQL? Download the April page from our calendar with an example of a punctuation search. The example does work! No joking. Sketch Engine is a serious tool after all.

New version of Danish corpus from the web

After improving our tools for processing Danish, we are releasing a new version of the Danish corpus from the web. The texts in this 2-billion-word Danish corpus were downloaded in December 2017.

A new Belarusian corpus (beTenTen)

We are pleased to inform you that the list of Sketch Engine corpora has been extended with a new Belarusian corpus, a 63-million-word corpus of texts collected from the web.

Sketch Engine calendar 2018 – March

As we did last year, we are making the Sketch Engine calendar, with useful CQL examples, available online. Please download the page for March, which has handy examples of using an optional character and repetitions.

New French word sketches

Find more and better collocations in French. We have improved our collocation search (the word sketch feature), which automatically identifies collocations and patterns specific to French.

A new Amharic corpus

A new 25-million-word Amharic corpus has been added to Sketch Engine.

The best term extraction

Term extraction or terminology extraction is an automatic method of analysing text in order to identify phrases which fulfil the criteria for terms. Terminology extraction has its use in translation and terminology management but also in text analytics where it is used for topic modelling, data mining and information retrieval from unstructured text.

New Italian word sketches

Sketch Engine can now find more and better collocations in Italian. The collocation search (the word sketch feature) automatically identifies collocations and patterns specific to Italian.

Bigger ACL Anthology Reference Corpus

The ACL Anthology Reference Corpus, made up of papers from the Digital Archive of Research Papers in Computational Linguistics, is now almost twice as large. The corpus is freely accessible even without a Sketch Engine account.

Automatic thesaurus

By definition, a thesaurus (plural thesauri, pronounced [-rai]) is a type of dictionary which lists synonyms or words from the same semantic category, e.g. animals or furniture.