Sketch Engine has a dedicated interface for working with learner corpora. The interface allows the user to search by the error itself, by the type of error, by the correction or by any combination of these criteria. This kind of error analysis is useful for the study of second language acquisition and foreign language learning.

In addition, any metadata included in the corpus can be used in the search and analysed to get information about how learner mistakes are distributed across age groups, proficiency levels, mother tongue, types of test tasks etc.

A correctly constructed learner corpus can provide answers to global questions such as:

  • what is the most frequent type of error?
  • which age group makes the most mistakes?

as well as very specific questions:

  • are mistakes related to verb tenses more frequent at B2 or C1 level?

Learner corpus search interface example

The interface for error analysis is available through the Concordance tool. Text types are generated from the metadata included in the corpus.

Search a learner corpus

When Error query is selected, you can search on the error code or the correction code on the main panel by selecting your choice in the drop-down box.

When you click make concordance, the text marked as having that error (or correction) code is selected as the KWIC. The error codes are listed in the error codes link. You can use a wildcard search, e.g. .* or #M.*, to match several codes at once. You can also search on the actual text marked with the error or correction codes using the Incorrect word/s or Corrected word/s boxes respectively.
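The wildcards are regular expressions. As a rough illustration of which codes a pattern such as #M.* would select, the sketch below applies the pattern to a made-up list of error codes with Python's re module. The codes and the helper name matching_codes are hypothetical, and the sketch assumes that a pattern has to match the whole code rather than a part of it, which should be checked against your own corpus.

```python
import re

# Hypothetical error codes; the real ones are listed under the error codes link.
codes = ["#MV", "#MD", "#RV", "#S", "#UM"]

def matching_codes(pattern, codes):
    """Return the codes that the pattern matches in full (assumed behaviour)."""
    compiled = re.compile(pattern)
    return [code for code in codes if compiled.fullmatch(code)]

print(matching_codes("#M.*", codes))  # codes starting with #M
print(matching_codes(".*", codes))    # every code
```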

Creating and setting up a learner corpus

Creating learner corpora with error annotations is a process for advanced users which requires experience with creating common corpora.

A learner corpus can be created from an error-annotated text or an error-annotated vertical file.

Please contact us if the data are in a different format. Our team will inspect your data and will advise or assist in converting and/or uploading your data.

Create a learner corpus from

Common text formats

For the learner corpus search interface to work correctly, annotate the errors and corrections with err and corr structures as in the following example, then upload the data in the usual way:

We attended a <err type="typo">cnoference</err><corr type="typo">conference</corr> in Rio last week. The weather <err type="tense">has been</err><corr type="tense">was</corr> very nice.
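Before uploading, it can help to check that every error is followed by a correction of the same type. The following is a minimal sketch of such a check, not part of Sketch Engine: the function name extract_pairs and the regular expression are assumptions made for illustration, and the sketch assumes that each err element is immediately followed by its corr element, as in the example above.

```python
import re

# Matches an <err> element immediately followed by its <corr> element.
PAIR = re.compile(
    r'<err type="([^"]*)">(.*?)</err>\s*<corr type="([^"]*)">(.*?)</corr>',
    re.DOTALL,
)

def extract_pairs(text):
    """Return (type, error, correction) tuples; raise if the two types differ."""
    pairs = []
    for err_type, error, corr_type, correction in PAIR.findall(text):
        if err_type != corr_type:
            raise ValueError(f"type mismatch: {err_type!r} vs {corr_type!r}")
        pairs.append((err_type, error, correction))
    return pairs

sample = (
    'We attended a <err type="typo">cnoference</err>'
    '<corr type="typo">conference</corr> in Rio last week. '
    'The weather <err type="tense">has been</err>'
    '<corr type="tense">was</corr> very nice.'
)

for err_type, error, correction in extract_pairs(sample):
    print(f"{err_type}: {error} -> {correction}")
```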

A vertical file

If the corpus is to be uploaded as a vertical file, it should follow these specifications:

<err type="Typo">
cnoference      NN      cnoference-n
</err>
<corr type="Typo">
conference      NN      conference-n
</corr>

Set up a learner corpus in a nutshell

  1. create a learner corpus from a common text format or a vertical file
  2. open your corpus and click Manage corpus on the Dashboard
  3. select the Configure tab
  4. confirm “I am an expert”
  5. in the text area, find STRUCTURE "corr" and STRUCTURE "err"
  6. edit the structure settings by adding the directives DISPLAYBEGIN, DISPLAYEND and DISPLAYCLASS as follows:
STRUCTURE "corr" {
    DISPLAYCLASS "concgreen"
}

You can also define the color in RGB notation, e.g. DISPLAYCLASS "#FF0000" for red.

  7. finally, make a concordance and, in View options, set the err and corr structures as visible

Common requirements

The following structures, together with their proper closing tags, are mandatory in error corpora:

<err type="..."> ... </err>
<corr type="..."> ... </corr>

The ‘type’ must be the same in both the error and the respective correction.

Either the error or the correction can be empty, indicating that the corrector inserted or deleted a word. In that case, a special ===NONE=== token must be inserted in the empty structure. For example:

<err type="DeletedWord">
cnoference      NN      cnoference-n
</err>
<corr type="DeletedWord">
===NONE===      ===NONE===      ===NONE===
</corr>

This means that the word “cnoference” was deleted by the corrector.
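The requirements above can be checked mechanically before a vertical file is uploaded. The sketch below is one possible checker, not a Sketch Engine tool: the function name check_vertical and the wording of the messages are invented for illustration. It assumes that the input contains only err and corr tag lines and tab-separated token lines, and that each err structure is directly followed by its corr structure.

```python
import re

OPEN = re.compile(r'<(err|corr) type="([^"]*)">')
CLOSE = re.compile(r'</(err|corr)>')

def check_vertical(lines):
    """Return a list of problems found in the err/corr annotation."""
    problems = []
    current = None      # (name, type) of the structure that is currently open
    tokens = 0          # token lines seen inside the open structure
    pending_err = None  # type of a finished <err> still waiting for its <corr>
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        opened, closed = OPEN.fullmatch(line), CLOSE.fullmatch(line)
        if opened:
            name, kind = opened.groups()
            if current:
                problems.append(f"line {number}: <{current[0]}> not closed")
                if current[0] == "err":
                    pending_err = current[1]
            if name == "err" and pending_err is not None:
                problems.append(f"line {number}: previous <err> has no <corr>")
            elif name == "corr" and pending_err is None:
                problems.append(f"line {number}: <corr> without preceding <err>")
            elif name == "corr" and pending_err != kind:
                problems.append(
                    f"line {number}: type {kind!r} differs from {pending_err!r}")
            pending_err = None
            current, tokens = (name, kind), 0
        elif closed:
            name = closed.group(1)
            if not current or current[0] != name:
                problems.append(f"line {number}: unexpected </{name}>")
            else:
                if tokens == 0:
                    problems.append(f"line {number}: empty <{name}>, use ===NONE===")
                if name == "err":
                    pending_err = current[1]
            current = None
        elif current:
            tokens += 1  # an ordinary token line inside err or corr
        elif pending_err is not None:
            problems.append(f"line {number}: <err> not followed by <corr>")
            pending_err = None
    if current:
        problems.append(f"end of input: <{current[0]}> not closed")
    if pending_err is not None:
        problems.append("end of input: last <err> has no <corr>")
    return problems

good = [
    '<err type="DeletedWord">', "cnoference\tNN\tcnoference-n", "</err>",
    '<corr type="DeletedWord">', "===NONE===\t===NONE===\t===NONE===", "</corr>",
]
print(check_vertical(good))
```

An empty list means that no problems were found. A correction with a different type, or a structure left without any token, is reported together with its line number.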