Sketch Engine has a dedicated interface for working with learner corpora. The interface allows the user to search by the error itself, by the type of error, by the correction, or by any combination of these criteria.

In addition, any metadata included in the corpus can be used in the search and analysed to show how learner mistakes are distributed across age groups, proficiency levels, mother tongues, types of test tasks, etc.

A correctly constructed learner corpus can provide answers to global questions such as:

  • what is the most frequent type of error?
  • which age group makes the most mistakes?

as well as very specific questions:

  • are mistakes related to verb tenses more frequent at B2 or C1 level?

Figure: Search options available in the learner corpus search interface (example).

Text types are generated from the metadata included in the corpus.

Searching a learner corpus

When Error query is selected, you can search by error code or by correction code in the main panel; choose between the two in the drop-down box.

When you click Make concordance, the text marked with that error (or correction) code becomes the KWIC (the middle column of the concordance). The available error codes are listed under the error codes link. You can use wildcards in the code, for example .* (any code) or #M.* (any code starting with #M). You can also search the actual text marked with the error or correction codes using the Incorrect word/s or Corrected word/s boxes respectively.
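The effect of such wildcard patterns can be tried out offline. The following is a minimal Python sketch, assuming the pattern is a regular expression that must match the whole code; the code list and the helper name matching_codes are hypothetical and only serve as illustration.

```python
import re

def matching_codes(pattern, codes):
    """Return the codes that the pattern matches in full."""
    return [code for code in codes if re.fullmatch(pattern, code)]

# Hypothetical error codes, not taken from any real corpus.
codes = ["#MV", "#MT", "#RV", "#S"]

print(matching_codes(".*", codes))    # every code
print(matching_codes("#M.*", codes))  # only codes starting with #M
```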

Creating and setting up a learner corpus

Creating a learner corpus with error annotations is a task for advanced users and requires experience with creating ordinary corpora.

A learner corpus can be created from an error-annotated text or an error-annotated vertical file.

Please contact us if the data are in a different format. Our team will inspect your data and will advise or assist in converting and/or uploading your data.

Create a learner corpus from

Common text formats

For the learner corpus search interface to work correctly, annotate the errors and corrections with err and corr structures as in the example below, then upload the data in the usual way.

We attended a <err type="typo">cnoference</err><corr type="typo">conference</corr> in Rio last week. The weather <err type="tense">has been</err><corr type="tense">was</corr> very nice.
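Before uploading, it can help to check that the annotation parses as intended. The following Python sketch is an illustration only, not part of Sketch Engine: it assumes that each err element is immediately followed by its corr element and that the elements are not nested. The names PAIR and extract_pairs are hypothetical.

```python
import re
from collections import Counter

# One <err ...>...</err> directly followed by its <corr ...>...</corr>.
PAIR = re.compile(
    r'<err type="(?P<etype>[^"]*)">(?P<error>.*?)</err>'
    r'\s*<corr type="(?P<ctype>[^"]*)">(?P<correction>.*?)</corr>',
    re.DOTALL,
)

def extract_pairs(text):
    """Return (type, error, correction) tuples; raise if the two types differ."""
    pairs = []
    for m in PAIR.finditer(text):
        if m["etype"] != m["ctype"]:
            raise ValueError(f"type mismatch: {m['etype']!r} vs {m['ctype']!r}")
        pairs.append((m["etype"], m["error"], m["correction"]))
    return pairs

sample = (
    'We attended a <err type="typo">cnoference</err>'
    '<corr type="typo">conference</corr> in Rio last week. '
    'The weather <err type="tense">has been</err>'
    '<corr type="tense">was</corr> very nice.'
)

pairs = extract_pairs(sample)
print(pairs)
# Count how many errors of each type were annotated.
print(Counter(error_type for error_type, _, _ in pairs))
```

Nested ("double") errors would need a real parser rather than a regular expression.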

A vertical file

If the corpus is to be uploaded as a vertical file, it should follow these specifications:

<err type="Typo">
cnoference      NN      cnoference-n
</err>
<corr type="Typo">
conference      NN      conference-n
</corr>

This means that “cnoference” was corrected to “conference”.

Set up a learner corpus in a nutshell

To set up a learner corpus:

  1. create a learner corpus from common text format or vertical format
  2. open your corpus and click Manage corpus in the left menu
  3. select Configure corpus in the left menu
  4. switch to expert mode by clicking on the link at the top
  5. in the textarea, find STRUCTURE "corr" and STRUCTURE "err"
  6. edit the structure settings by adding the directives DISPLAYTAG, DISPLAYBEGIN, DISPLAYEND and DISPLAYCLASS as follows:
STRUCTURE "err" {
...
DISPLAYTAG 0 
DISPLAYBEGIN "<err type=%(type)>" 
DISPLAYEND "|" 
DISPLAYCLASS "concred" 
...
}
STRUCTURE "corr" {
...
DISPLAYTAG 0 
DISPLAYBEGIN "" 
DISPLAYEND "</err>" 
DISPLAYCLASS "concgreen" 
...
}

  7. finally, make a concordance and, in View options, set the err and corr structures as visible

Common requirements

The following structures are mandatory in error corpora and must always be properly closed, because the structures can be nested:

<err></err>

and

<corr></corr>

The ‘type’ must be the same in both the error and the respective correction.
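These requirements can be checked mechanically before upload. The following Python sketch is one possible check, not a Sketch Engine tool: it assumes that every tag sits on its own line, that each err is directly followed by its corr at the same nesting level, and that the type attribute is written with double quotes. The names TAG and check_vertical are hypothetical.

```python
import re

# An opening or closing err/corr tag alone on a line.
TAG = re.compile(r'^<(/?)(err|corr)(?:\s+type="([^"]*)")?\s*>$')

def check_vertical(lines):
    """Return a list of problems found in the err/corr markup."""
    problems = []
    stack = []    # currently open structures as (name, type)
    pending = {}  # nesting depth -> type of the err just closed there
    for number, line in enumerate(lines, start=1):
        match = TAG.match(line.strip())
        if not match:
            continue  # a token line or an unrelated structure
        closing, name, tag_type = match.groups()
        if closing:
            if not stack or stack[-1][0] != name:
                problems.append(f"line {number}: unexpected </{name}>")
                continue
            _, closed_type = stack.pop()
            if name == "err":
                pending[len(stack)] = closed_type
            continue
        depth = len(stack)
        if name == "err" and depth in pending:
            problems.append(f"line {number}: previous <err> has no <corr>")
        if name == "corr":
            if depth not in pending:
                problems.append(f"line {number}: <corr> without a preceding <err>")
            elif pending.pop(depth) != tag_type:
                problems.append(f"line {number}: type differs from the <err> type")
        stack.append((name, tag_type))
    problems.extend(f"unclosed <{name}>" for name, _ in stack)
    if pending:
        problems.append("last <err> has no <corr>")
    return problems

good = [
    '<err type="Typo">', "cnoference\tNN\tcnoference-n", "</err>",
    '<corr type="Typo">', "conference\tNN\tconference-n", "</corr>",
]
print(check_vertical(good))
```

An empty list means that no problem was found under the stated assumptions.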

Either the error or the correction can be empty, indicating that a word was inserted or deleted by the corrector. In that case, a special ===NONE=== token must be inserted in the empty structure. For example:

cnoference NN cnoference-n
===NONE=== ===NONE=== ===NONE===

This means that the word “cnoference” was deleted by the corrector.
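Inserting the placeholder by hand is error-prone, so it can be scripted. The following Python sketch assumes a tab-separated vertical file with three columns and one tag per line; the function name fill_empty_structures and the type value "deletion" are hypothetical.

```python
NONE_TOKEN = "===NONE==="

def fill_empty_structures(lines, columns=3):
    """Insert a ===NONE=== token line into every empty err or corr structure."""
    placeholder = "\t".join([NONE_TOKEN] * columns)
    filled = []
    for line in lines:
        stripped = line.strip()
        for name in ("err", "corr"):
            is_close = stripped == f"</{name}>"
            # The structure is empty if the previous line is its opening tag.
            follows_open = bool(filled) and filled[-1].strip().startswith(
                (f"<{name} ", f"<{name}>")
            )
            if is_close and follows_open:
                filled.append(placeholder)
        filled.append(line)
    return filled

vertical = [
    '<err type="deletion">',  # hypothetical type value
    "cnoference\tNN\tcnoference-n",
    "</err>",
    '<corr type="deletion">',
    "</corr>",
]
for line in fill_empty_structures(vertical):
    print(line)
```

Structures that already contain tokens are left unchanged.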

‘Double errors’ can be indicated by nesting the structures. For example:

international JJ international-j
conference NN conference-n
cnoference NN cnoference-n
conference NN conference-n
===NONE=== ===NONE=== ===NONE===

In the example above, the word “conference” was misspelt as “cnoference” and also repeated, so the corrector first corrected the spelling and then marked the word as a deletion.

Visual style for error and correction

In learner corpora, errors are usually rendered in red and corrections in green.

Please bear in mind that changing the colour of error annotation is intended for expert users; incorrect settings may stop the corpus search from working. To change the colour:

  1. Open your learner corpus.
  2. Click Manage corpus.
  3. Click Configure corpus.
  4. Switch to Expert mode by clicking on the link at the top.
  5. Define DISPLAYCLASS in the configuration file for the two relevant structure definitions, for example:
STRUCTURE err {
    DISPLAYCLASS "errclass"
}
STRUCTURE corr {
    DISPLAYCLASS "corrclass"
}

  6. In the CSS file (view.css), define the classes and add the styles you want:

.errclass {
    background-color: red;
    color: white;
    font-weight: bold;
}
.corrclass {
    background-color: green;
    color: white;
    font-weight: bold;
}