Our new Spanish Word Sketches give much better coverage of Spanish-specific phenomena such as compound verb tenses, verb constructions, ser/estar and the subjunctive (el subjuntivo). Spanish collocation information has never been so rich.
Verbs with clitics, such as decirnos, descargárselo and comérselo, pose a problem when searching. Sketch Engine can now handle these much better: searching for decir will find instances with and without the pronouns, e.g. decir and decirle, diciendo and diciéndole.
This is available in the European Spanish Web 2011 (eseuTenTen11) or any newly created user corpora.
New Spanish Word Sketches
The new word sketch grammar now analyses Spanish-specific phenomena, for example:
- adjectives preceded by ser or estar
- statistics of verb constructions (perífrasis verbales) in which a verb tends to appear
- noun phrases using de
- statistics of the subjunctive compared to the indicative
and many others.
See an example of the word sketch for apoyar (v) and claro (adj).
Availability
Available in the European Spanish Web 2011 (eseuTenTen11) or any newly created user corpora.
Upgrading your corpus
Previously created user corpora need to be upgraded and re-compiled to bring in the new functionality. Start the re-compilation and you will be invited to upgrade the corpus during the process – watch out for a yellow message.
New clitics handling
Using the simple search option and typing a verb (dar, poner etc.) or a pronoun in its object form (me, le, nos, se…) will find instances of the verb with and without the pronouns (dar and also darse, dárselo…) or pronouns on their own as well as attached to a verb (se and also ponerse, ponérselo…). This is the default behaviour in simple search.
In the CQL search, use:
[lemma="dar"]
[morphemes="se"]
to replicate the former and the latter example respectively.
To find verbs with attached pronouns se and lo, use:
[morphemes="se" & morphemes="lo"]
To find verbs with any attached pronouns, use:
[tags="V.*" & tags="PP.*"]
Note the use of tags, not tag.
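As a rough illustration, the queries above can be assembled programmatically. The sketch below only builds CQL strings; the function name morphemes_query is hypothetical and not part of Sketch Engine, and the code does not send anything to a corpus.

```python
def morphemes_query(*pronouns: str) -> str:
    """Build a one-token CQL query from the patterns shown above.

    Hypothetical helper for illustration only; it is not a Sketch Engine API.
    """
    if not pronouns:
        # No specific pronoun given: use the tag-based query for
        # "a verb with any attached pronoun" (note: tags, not tag).
        return '[tags="V.*" & tags="PP.*"]'
    # Every listed pronoun must appear among the token's morphemes.
    conditions = " & ".join(f'morphemes="{p}"' for p in pronouns)
    return f"[{conditions}]"


print(morphemes_query("se", "lo"))
print(morphemes_query())
```

With a single pronoun, for example morphemes_query("se"), the resulting query is the same one the text gives for finding se on its own as well as attached to a verb.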
New attributes for clitics handling
To enable this functionality, Sketch Engine uses two new multi-value attributes for Spanish:
morphemes – lists the morphemes which make up the token
tags – lists the tags related to the morphemes within the token
word form | lemma | tag | morphemes | tags | notes |
---|---|---|---|---|---|
digo | decir | VMIP1S0 | decir | VMIP1S0 | 1 token, 1 morpheme, 1 tag |
decirle | decir | MN0000 | decir le | MN0000 PP3CSD0 | 1 token, 2 morphemes, 2 tags |
díselo | decir | VMM02S0 | decir se lo | VMM02S0 PP3CN00 PP3MSA0 | 1 token, 3 morphemes, 3 tags |
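To make the idea of a multi-value attribute more concrete, here is a minimal sketch that models two rows of the table as Python dictionaries and emulates the query [tags="V.*" & tags="PP.*"]. It assumes that a condition on a multi-value attribute holds when at least one of the token's values matches the whole pattern; the data structure and function names are illustrative only and say nothing about how Sketch Engine stores or evaluates these attributes internally.

```python
import re

# Two rows from the table above, with the multi-value attributes as lists.
TOKENS = [
    {"word": "digo", "lemma": "decir",
     "morphemes": ["decir"], "tags": ["VMIP1S0"]},
    {"word": "díselo", "lemma": "decir",
     "morphemes": ["decir", "se", "lo"],
     "tags": ["VMM02S0", "PP3CN00", "PP3MSA0"]},
]


def any_value_matches(values, pattern):
    """Assumed semantics: true if at least one value fully matches the pattern."""
    return any(re.fullmatch(pattern, value) for value in values)


def has_attached_pronoun(token):
    """Emulate [tags="V.*" & tags="PP.*"]: a verb tag and a pronoun tag in one token."""
    return (any_value_matches(token["tags"], r"V.*")
            and any_value_matches(token["tags"], r"PP.*"))


matches = [t["word"] for t in TOKENS if has_attached_pronoun(t)]
print(matches)  # ['díselo']
```

The two conditions can be satisfied by different values of the same token, which is why the query needs the multi-value tags attribute rather than the single-value tag.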