Sketch Engine is corpus management and analysis software that has been developed by Lexical Computing since 2003. It consists of three main components, which together make it possible to search and build text corpora.
- Bonito – a graphical user interface to the maintained corpora
- Manatee – a corpus management tool including corpus building and indexing, fast querying and providing basic statistical measures, see the changelog of Manatee
- FinLib – fast indexing library, see the changelog of FinLib
A brief overview of the main changes in Bonito is listed below.
Current stable version: 3.99.3
| Version | New feature | 
|---|---|
| 3.101 | multiword sketches work even for general queries of three or more words (like “very -> young -> man”), using the combination of ws() and ccoll() queries | 
| 3.100.1 | trends: sort alphabetically | 
| 3.100 | filtering trends by trend (positive / negative) | 
| 3.99 | added total frequency to word lists and a link to the keywords and terms function in the left menu | 
| 3.98 | speed up SkELL; faster loading of corpus info | 
| 3.97 | automatic PoS for wsdiff (Sketch difference); up to 150 items shown in text types by default | 
| 3.96 | Minimum frequency for computing n-grams depends on corpus size; Polish locale added | 
| 3.95 | Slovak locale added | 
| 3.93 | Corpus info page shows subcorpora statistics. | 
| 3.92 | API: hide wordcloud with show_wordcloud=0; hide details with disable_show_detail=1; use embed instead of show_only_content; show or hide logo with ske_logo; URLTEMPLATE configuration settings for generating links from structure attributes. | 
| 3.91 | API (simple_n must be float), embedding HTML style, parameter show_only_content=1; show_only_first_refs=1 in parallel concordances. | 
| 3.90 | WSPOSLIST is taken from sketch grammar instead of registry file | 
| 3.89 | saving all bilingual term candidates for user corpora | 
| 3.88 | CQL builder | 
| 3.87 | total number of items in wordlists | 
| 3.86 | longest-commonest matches shown in word sketches by default | 
| 3.84 | AdSense for trial users and for anonymous users browsing open corpora | 
| 3.82 | font awesome icons, French localization | 
| 3.81 | nested n-grams | 
| 3.80 | faster term extraction, requires manatee 2.130 | 
| 3.79 | Spanish localization | 
| 3.78 | various bug fixes | 
| 3.77 | Corpus definition file can be shown in corpus info page | 
| 3.76 | Word sketch lemma coverage | 
| 3.75 | notification about obsolete sub-corpora, automatic rebuilding | 
| 3.73 | store sub-corpus definition for user sub-corpora so it can be rebuilt | 
| 3.72 | headword can be included in thesaurus word cloud, many minor fixes | 
| 3.71 | interface language settable for anonymous users | 
| 3.70 | trends visualization | 
| 3.69 | time analysis of corpora: trends | 
| 3.68 | save concordance result as subcorpus by structures | 
| 3.67 | Bonito returns HTTP error status codes | 
| 3.66 | reset settings, show the last corpcheck on corp info page if available | 
| 3.65 | header layout changed, system menu enriched | 
| 3.64 | show bilingual term candidates if available (.biterm file); save bilingual terms as TBX/TXT; Arabic locale | 
| 3.62 | ske menu under gear icon | 
| 3.61 | export terms/keywords to TBX/CSV | 
| 3.60 | showing longest-commonest match for word sketches if compiled | 
| 3.59 | Format long numbers (12345678 => 12,345,678) | 
| 3.58 | Concordance: select and filter lines on multiple pages; Word Sketch new parameter: minimum frequency for multiword links (in boldface); Forms: explain what is wrong in a form when validating | 
| 3.57 | JobRunner in Bonito: background processes, long job management | 
| 3.55 | Thesaurus new parameter: minimum score (default 0.1) | 
| 3.54 | Concordance: display attributes as tooltips; SkELL for mobile/touch devices (autodetected) | 
| 3.53 | FREQTTATTRS corpus configuration option for separate settings of text support for concordance examples in freqs | 
| 3.52 | show INFO from corpus configuration on corpus info page | 
| 3.51 | feedback gadget with social icons, link for printing, permalink, feedback form | 
| 3.50 | Concordance: save concordance as subcorpus | 
| 3.47 | select gramrels for bilingual sketches; bilingual word sketches with the Translate button | 
| 3.45 | “Save & Change Options” button instead of two separate buttons; support for filtering candidate definitions in concordance | 
| 3.44 | Corpus info page: detailed structure/attribute info | 
| 3.43 | Concordance: filter unselected lines | 
| 3.42 | keywords for subcorpus vs. the rest of the corpus; save username in ske.li shortener (and list links for a user); keyword and term extraction API | 
| 3.41 | do not show “Match case” for languages not using upper/lower-cased letters; cluster words in thesaurus word cloud | 
| 3.40 | RE support for free text type fields; error type select box for Learner corpora | 
| 3.39 | Frequency form: top error lists for error corpora | 
| 3.38 | comma separated thousands for better legibility | 
| 3.37 | auto PoS in Thesaurus and Word Sketch (no need to specify PoS in advance) | 
| 3.35 | show only one hit (1st one) per document in Concordance results, use Filter > 1st hit | 
| 3.34 | resizable box with WS relations in WS form | 
| 3.33 | filtering out overlapping KWICs, use Filter > Overlaps | 
| 3.32 | reciprocal BIM (click on a collocate A, put its translation B and see BIM for B and A) | 
| 3.29 | shortening long references (hover mouse over it to see full reference) | 
| 3.28 | * and ? wildcards in a simple query | 
| 3.27 | corpus info link in the main left menu | 
| 3.26 | icon for printing, printer-friendly layout; URL shortening service for SkE (ske.li) | 
| 3.20 | terms can be generated in word list | 
| 3.19 | Terms in word list | 
| 3.17 | slider for simple math parameter in word list form | 
| 3.16 | BIM: bilingual manual sketches | 
| 3.14 | Show line numbers in concordance | 
| 3.13 | Highlight line for which context box is being shown | 
| 3.12 | Check input boxes before submit | 
| 3.11 | One click copy without Flash | 
| 3.8 | Symmetric multi-word sketches | 
| 3.7 | Deleting labels in annotation | 
| 3.6 | Wordcloud in Thesaurus | 
| 3.5 | Frequency distribution graph; Annotation pie chart | 
| 3.3 | BIC word sketches | 
| 3.2 | Corpus info page; Filter concordance with selected lines | 
| 3.1 | n-grams in word list; combined sketches (merged word sketch for multiple words); focus and reference corpus switch on keywords page | 
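The number formatting introduced in versions 3.38 and 3.59 (12345678 => 12,345,678) can be illustrated with a minimal Python sketch. The function name `format_count` is hypothetical and is not taken from the Bonito code base; it only shows the kind of transformation the changelog entry describes.

```python
def format_count(n: int) -> str:
    """Return n with commas as thousands separators (illustrative helper)."""
    # The "," format specifier inserts a comma between every group of three digits.
    return f"{n:,}"


if __name__ == "__main__":
    print(format_count(12345678))
```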
Older
- grey color for word sketch collocates with low frequency
- breadcrumbs menu in concordance (showing the history of concordance)
- maximum frequency filter in word list
- “more data”, “less data” and “gramrels” switches in bilingual word sketches
2013-02-04 bonito2 v2.98
- a new design of the first form
- bilingual word sketches for parallel corpora
- asynchronous query processing
- parallel concordances — more querying options, saving parallel concordances
- “last sample” link in left menu
- word sketch highlights (display in word sketch that the word is somewhat special — where “somewhat” can be configured)
- button for clearing inputs in word list form
- low-freq collocations in word sketch are grey
- new “References up” option of concordance display
2012-07-01 bonito2 v2.91.9
- interface language switch (Czech, English, Irish, Chinese)
- keywords based on word sketches
- shortcut for switching salience/frequencies sorting in Word Sketches
- shortcut switch between Word Sketches in “flat” format and structured according to grammatical relations
- WS keyword can be part of a gramrel name — use %w.
- multiword sketches
- parallel corpora concordancing
- multilevel frequencies have a fourth level and automatically select the higher level
- toggle “select/deselect all” in Text Types section of Concordance form
- full corpus names in word list results
2011-09-01 bonito2 v2.80.3
- commonest match feature added to wseval
2011-08-03 bonito2 v2.80
- Selection list for gramrels in word sketch form
2011-08-01 bonito2 v2.79
- word sketch as word list (unstructured ws)
2011-07-31 bonito2 v2.78
- *FIXORDER directive in sketch grammar files
- Sketch Diff by subcorpora
- Sketch Diff by word form
- “Char” field added to the concordance form
- the first version of subcorpus hypotheses testing
- new comprehensible word list form:
  - more possible option combinations
  - blacklists
  - filtering non-words
  - document counts statistic added
- WRAPDETAIL configuration option
- new error corpus functionality
- per-million figures in the concordance view
- “jump to” function on sorted concordance
- wide context frame displays structures
2011-01-31 bonito2 v2.59.1
- new interface design
- keywords across different corpora
- Chinese localisation added
- simple search feature (all in one)
- Tickbox Lexicography for clustered word sketches
- text type multiselect for fields with many values and free input (using the | character)
- support for documentation link for text types (ATTRDOC registry option)
- support for sorting text types according to their numeric value (NUMERIC registry option)
- support for accessing whole document text in the wide context window (STRUCTCX registry option)
2010-02-11 bonito2 v1.43.6
- added support for Constructions
- hierarchical header fields in the interface
- option for displaying empty text types in frequency
- header fields added as the option for sorting and word lists
- header fields added to the filter form
- Irish localization added
2009-07-01 bonito2 v1.35
- SimpleMaths for keywords implemented
- added support for created values in word sketches (new COLLOC directive in sketch grammar)
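As an illustration of the SimpleMaths keyword score mentioned for v1.35, the sketch below follows the published "simple maths" idea: normalise both frequencies to per-million figures, add a smoothing constant to each, and divide. The function and parameter names are assumptions made for this example (the smoothing constant `n` plays the role of the simple maths parameter), and the code is not taken from Bonito itself.

```python
def simple_maths_score(focus_freq: int, focus_size: int,
                       ref_freq: int, ref_size: int,
                       n: float = 1.0) -> float:
    """Keyness score of a word in a focus corpus relative to a reference corpus."""
    # Normalise raw frequencies to occurrences per million tokens.
    fpm_focus = focus_freq * 1_000_000 / focus_size
    fpm_ref = ref_freq * 1_000_000 / ref_size
    # Adding n to both sides avoids division by zero and damps rare words;
    # larger n favours more frequent words.
    return (fpm_focus + n) / (fpm_ref + n)


if __name__ == "__main__":
    # A word seen 100 times in a 1M-token focus corpus and never in the reference.
    print(simple_maths_score(100, 1_000_000, 0, 1_000_000, n=1.0))
```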
2009-06-22 bonito2 v1.34
- added ARF as an option for word list output
Selected features in previous versions
- Enhanced word list functions: multilevel word list, input from file
- ‘Select All’ buttons added to text types boxes
- special handling of ‘–’ in query box: sand–box -> (sandbox | sand box | sand-box)
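The hyphen handling in the last item can be sketched as follows. This is only an illustration of the expansion described above, under the assumption that a query with exactly one hyphen or en dash is expanded into the joined, spaced, and hyphenated variants; the function name `expand_hyphen` is hypothetical and does not come from Bonito.

```python
import re
from typing import List


def expand_hyphen(term: str) -> List[str]:
    """Expand 'sand-box' into its joined, spaced, and hyphenated variants."""
    # Split on an ASCII hyphen or an en dash.
    parts = re.split(r"[-\u2013]", term)
    # Only handle the simple case of exactly two non-empty parts.
    if len(parts) != 2 or not all(parts):
        return [term]
    left, right = parts
    return [left + right, f"{left} {right}", f"{left}-{right}"]


if __name__ == "__main__":
    # Join the variants the way the changelog entry displays them.
    print("(" + " | ".join(expand_hyphen("sand-box")) + ")")
```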