Word sketches in Sketch Engine are one-page summaries of the word combinations (called collocations) that a word prefers. These summaries are computed automatically from a text corpus, a sample of language comprising billions of words.
Author Archive for: ondrejm
About Ondřej Matuška
Ondřej Matuška has contributed 39 entries so far.
Entries by Ondřej Matuška
A new multilingual corpus in 23 languages of the European Union is now available in Sketch Engine. It was compiled from the Judgements of the EU Court of Justice and is useful for anyone interested in looking up translations in these languages. The total size of the corpus is 608 million words.
17 more languages of the unique diachronic web corpus, the Timestamped JSI Corpus, have been added to Sketch Engine. This makes a total of 18 languages together with the previously announced English corpus. The corpus can be used as:
Building parallel corpora is now easier! We have rewritten and reorganized our web manual on this topic. There are three options to build a parallel corpus:
A corpus of the Tibetan language has now been added to Sketch Engine. This corpus of 80 million words contains only Classical Tibetan, but there are plans to add a corpus of Standard Tibetan too. The corpus was built
Come and meet Sketch Engine at the Cursos de Verano del Escorial 2017, held between 17 and 21 July in Madrid and organized by the LexiCon research group (Universidad de Granada) and the LEETHI research group (Universidad Complutense). This training school focusses on the acquisition of new techniques and tools widely used today by the language industry to […]
Members of the Learner Corpus Association have free access to Sketch Engine to upload, analyse and share their own learner corpora with other LCA members. The free access lasts for the duration of the LCA membership. To gain free access to Sketch Engine and to review the complete conditions, please see the LCA member benefit description.
Our new unique English Preposition Corpus uncovers how prepositions behave and what senses they have. The corpus features special annotation for the sense of the preposition and also for the semantic class of the word that precedes and follows the preposition. The user can
Our new Spanish Word Sketches give much better coverage of Spanish-specific phenomena such as compound verb tenses, verb constructions, ser/estar or el subjuntivo. Spanish collocation information has never been so rich. Verbs with clitics, such as decirnos, descargárselo and comérselo, pose a problem when searching. Sketch Engine can now handle these much better, searching for decir […]
The Word Sketches for Simplified Chinese have been completely rewritten and provide much more complete information about Chinese collocations.
Complex concordance searches are now easier in Sketch Engine thanks to the new CQL manual. The content has been completely rewritten and reorganized into logical sections.
The World Organization for the Development of N’ko has decided to host their N’ko corpus with Sketch Engine. The corpus is hosted as an open corpus and is freely accessible even without a Sketch Engine account.
Meet Sketch Engine at Europhras 2017
Sketch Engine will be attending Europhras 2017 in London to give talks and also a workshop where you can learn directly from the Sketch Engine experts about all the ins and outs of the system.
Diachronic corpus of English
A unique diachronic corpus of English newsfeeds, the Jozef Stefan Institute Timestamped web Corpus, has been added to Sketch Engine. It can also serve as an excellent contemporary web corpus.
XLIFF support in Sketch Engine
Sketch Engine users can now create their user corpora from texts in the XLIFF file format, a format used during localization processes and a standard for CAT tools.