Adam Kilgarriff’s talk The Long Road from Text to Meaning is a nice introduction to dictionaries, lexicography and collocations.
Revolutionize the dictionary-building process
Sketch Engine offers tools to significantly speed up the process of dictionary building while making it more accurate, efficient, complete and consistent. Sketch Engine’s suite of lexicographic tools is designed for lexicographers striving to conform to modern standards. Sketch Engine focusses lexicographers on what is typical in a language while ensuring that any new usage is brought to their attention as soon as it starts entering common use.
OneClick Dictionary – a fully automated dictionary drafting process encompassing the following functionality, which is also available as stand-alone tools:
- Word list and n-grams
- Concordance (and frequency count)
- Word Sketch and Word Sketch difference
- Parallel corpora
The existence of a suitable corpus is a prerequisite for any serious lexicographic work. Sketch Engine already comes with hundreds of corpora but also offers all the tools needed to build a text corpus. The user can create a corpus from their own materials, have Sketch Engine download suitable texts from the web or combine both methods.
Sketch Engine strives to develop highly usable tools to respond to lexicographers’ needs.
We believe that lexicographers’ time should not be wasted on tasks that can be completed by machines. We also believe strongly that post-editing existing content is more economical, in terms of both time and money, than writing entries from scratch. This is why we developed OneClick Dictionary.
OneClick Dictionary is a feature of Sketch Engine combining all lexicography-related functionality into a one-click process that produces a machine-generated dictionary draft to be post-edited by lexicographers.
The process involves generating a headword list, assigning part-of-speech and usage labels, and generating candidates for example sentences, collocations, synonyms and thesaurus entries, definitions and/or translations.
The output is pushed into the Lexonomy dictionary writing system from where lexicographers can communicate with Sketch Engine during the post-editing phase. Export to other DWSs is also possible.
Headword list development — word list
In the past, the most reliable way of developing a headword list was to copy it from an existing dictionary. This meant that neologisms took a long time to enter a dictionary and words that had gone out of use stayed in dictionaries much longer than was desirable, because gathering sufficient evidence to justify their inclusion or removal was a long process.
Sketch Engine’s word list feature can generate a list of headwords, or even word forms, including any neologisms, each directly supported by evidence of how widely it is used.
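The idea behind a frequency-based headword list can be illustrated with a minimal Python sketch. This is not Sketch Engine's implementation: the function name headword_candidates, the min_freq threshold and the naive tokenisation are assumptions made for illustration, and a real pipeline would lemmatise and work on a far larger corpus.

```python
from collections import Counter
import re

def headword_candidates(texts, min_freq=2):
    """Return (word, frequency) pairs seen at least min_freq times,
    most frequent first. Hypothetical helper, for illustration only."""
    counts = Counter()
    for text in texts:
        # Naive tokenisation: runs of lowercase letters. No lemmatisation,
        # so "word" and "words" would be counted separately.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    kept = [(word, freq) for word, freq in counts.items() if freq >= min_freq]
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Tiny made-up sample standing in for a corpus.
sample = [
    "The selfie became a common word.",
    "A selfie is a photo; the word selfie is new.",
]
word_list = headword_candidates(sample)
```

Each candidate comes with its frequency, so the evidence for including (or dropping) a headword travels with the list.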
Writing entries — concordance, frequency, n-grams and Word Sketches
Discovering word senses and other lexical units (fixed phrases, compounds, multiword expressions etc.) is easy with an advanced concordance search aided by a vast number of search options including CQL. The frequency count can shed light on the typical preference of a word in terms of text type or subject area.
The most frequently used multiword expressions can be identified with n-grams.
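As a rough illustration of how n-gram counting surfaces recurring multiword expressions, here is a minimal Python sketch. The function name ngram_counts and the sample sentence are assumptions for this example and do not reflect how Sketch Engine computes n-grams internally.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample text, already split into tokens.
tokens = "as a matter of fact it is a matter of fact".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# The most frequent trigrams are candidates for multiword expressions.
top_trigrams = trigrams.most_common(2)
```

On a real corpus, the candidates would normally be filtered by a minimum frequency before a lexicographer reviews them.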
Word Sketches, Sketch Engine’s hallmark feature, shed light on a word’s syntactic and collocational behaviour by summarising information from thousands of concordance hits on an easy-to-understand screen with direct access to the underlying evidence. Close synonyms can be analysed further with Word Sketch Difference.
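Summarising collocational behaviour requires ranking collocates by how strongly they are associated with the headword rather than by raw frequency. The Python sketch below uses the logDice score, 14 + log2(2·f_xy / (f_x + f_y)), as one possible association measure. The text above does not say which score is used, so treat the choice as an assumption. The counts are invented, and a real Word Sketch groups collocates by grammatical relation, which is omitted here.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice association score from co-occurrence and individual frequencies."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

# Hypothetical counts for the headword "tea" (frequency 40 in a toy corpus):
# collocate -> (co-occurrence frequency, collocate frequency).
headword_freq = 40
collocates = {
    "strong": (16, 24),
    "green": (16, 88),
    "the": (32, 984),
}

scores = {
    word: log_dice(f_xy, headword_freq, f_y)
    for word, (f_xy, f_y) in collocates.items()
}
# Rank collocates from most to least strongly associated.
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these invented counts, "the" co-occurs with the headword most often but ranks last, because it is frequent everywhere. This is the kind of filtering that makes a collocation summary useful.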
Parallel corpora are an invaluable resource for looking up translation candidates, including the less obvious ones and indirect translations, i.e. cases where a short expression is translated using a longer phrase or a sentence. Sketch Engine offers the search feature and also a selection of parallel corpora.
Building a thesaurus
The thesaurus feature provides suggestions of similar words based on distributional semantics, i.e. by identifying words which tend to appear surrounded by the same words as the search word. This yields surprisingly accurate results, especially when used together with the large corpora provided by Sketch Engine.
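The distributional idea can be shown with a minimal Python sketch: each word is represented by counts of the words around it, and words with similar context counts are suggested as similar. This is a simplification under stated assumptions and not Sketch Engine's algorithm. The function names, the window size and the use of raw counts with cosine similarity are choices made for this example, and the five sentences are invented.

```python
from collections import Counter, defaultdict
import math

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_words(word, vectors, top=3):
    """Return the `top` words whose contexts most resemble those of `word`."""
    target = vectors.get(word, Counter())
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top]

# Tiny invented corpus, already tokenised.
sentences = [s.split() for s in [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "milk is white",
]]
vectors = context_vectors(sentences)
suggestions = similar_words("cat", vectors)
```

Even in this toy corpus, "dog" comes out as the closest word to "cat" because the two share most of their contexts. Larger corpora give each word many more contexts to compare, which is why corpus size matters for the quality of the suggestions.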
Adam Kilgarriff (2013). Using corpora as data sources for dictionaries. In Howard Jackson (ed.) The Bloomsbury Companion to Lexicography, Bloomsbury, London. Chapter 4.1, pp. 77–96.
Adam Kilgarriff (2009). Putting the corpus into the dictionary. In Perspectives in Lexicography: Asia and Beyond, Israel, pp. 239–247.