webcorpora.org will run on Sketch Engine technology
Sketch Engine technology has found its way to webcorpora.org, where NoSketch Engine, an open-source version of Sketch Engine, will replace the currently used Colibri². Users of webcorpora.org will benefit from the additional features that NoSketch Engine offers. As Roland Schäfer from the Freie Universität Berlin said: ‘Since there is *nothing* you can do with Colibri² that you cannot do with NoSketch Engine (but a lot which [you] can do with NoSketch Engine that you cannot do with Colibri²), we hope you are as excited as we are about this improvement of COW-related services.’
webcorpora.org will also benefit from the continuous development of Sketch Engine, which strives for ever faster and more efficient data processing.
About NoSketch Engine
NoSketch Engine is an open-source version of Sketch Engine with certain limitations in functionality and without support for automated corpus building. Corpora for NoSketch Engine have to be prepared with external tools, and users are expected to have adequate technical skills to set up and maintain the system.
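Because NoSketch Engine expects corpora that have already been tokenized and annotated elsewhere, external preparation typically ends with a file in Sketch Engine's vertical format: one token per line with tab-separated positional attributes (for example word, tag, lemma), and structures such as documents and sentences marked by XML-like tags on lines of their own. The following is a minimal sketch, assuming the annotation has already been produced by other tools. The function name `to_vertical`, the input shape and the word/tag/lemma attribute order are hypothetical choices for illustration, not a prescribed layout. Only attribute values are escaped here; how special characters inside tokens should be handled is left to the actual corpus setup.

```python
from html import escape


def to_vertical(docs: list[dict]) -> str:
    """Serialize pre-annotated documents into vertical-style text.

    Hypothetical input shape: each document is a dict with an "id" and a
    list of "sentences"; each sentence is a list of (word, tag, lemma)
    tuples produced by external tools (tokenizer, tagger, lemmatizer).
    """
    lines: list[str] = []
    for doc in docs:
        # Structure tags go on their own lines; attribute values are escaped.
        lines.append(f'<doc id="{escape(doc["id"], quote=True)}">')
        for sentence in doc["sentences"]:
            lines.append("<s>")
            for word, tag, lemma in sentence:
                # One token per line, positional attributes separated by tabs.
                lines.append(f"{word}\t{tag}\t{lemma}")
            lines.append("</s>")
        lines.append("</doc>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{
        "id": "d1",
        "sentences": [[("Corpora", "NNS", "corpus"),
                       ("grow", "VBP", "grow"),
                       (".", ".", ".")]],
    }]
    print(to_vertical(sample), end="")
```

Running the script prints a three-token sentence wrapped in `<doc id="d1">` and `<s>` tags. Whatever order is chosen for the tab-separated columns has to match the positional attributes declared in the corpus configuration before the corpus is compiled.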