MAPTO directive | Sketch Engine

Adding MAPTO directives

The MAPTO directive in an attribute definition defines a mapping between that attribute and another attribute. The mapping is computed from the vertical data with the mknormattr tool.

First, define which attribute should be mapped to which in the corpus configuration file. Let us say we want a mapping between word and the WSATTR of a given corpus (the attribute used for Word Sketches; in this example, lempos). Then we specify:

ATTRIBUTE word {
        MAPTO lempos
}

Now let Manatee compute the mapping by running this command:

mknormattr CORPUS SOURCE_ATTRIBUTE TARGET_ATTRIBUTE

Let us say we work with BNC, so the command will look like this:

mknormattr bnc2_tt2 word lempos
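Conceptually, the mapping relates each value of the source attribute to the target values that occur with it on the same token in the vertical data. The Python sketch below is only an illustration of that idea, not mknormattr's implementation or output format: it reads a tiny, made-up vertical fragment (one token per line, tab-separated attributes assumed to be word, lemma, lempos) and records, for each word form, the lempos values seen with it.

```python
from collections import defaultdict

# Hypothetical vertical fragment: one token per line, tab-separated
# positional attributes in the assumed order word, lemma, lempos.
VERTICAL = """<s>
mice\tmouse\tmouse-n
eat\teat\teat-v
books\tbook\tbook-n
</s>
<s>
she\tshe\tshe-d
books\tbook\tbook-v
</s>"""

COLUMNS = ["word", "lemma", "lempos"]


def build_map(vertical: str, source: str, target: str) -> dict[str, set[str]]:
    """Map each value of `source` to every `target` value seen on the same token."""
    src, tgt = COLUMNS.index(source), COLUMNS.index(target)
    mapping: dict[str, set[str]] = defaultdict(set)
    for line in vertical.splitlines():
        if not line or line.startswith("<"):  # skip structure tags such as <s>
            continue
        fields = line.split("\t")
        mapping[fields[src]].add(fields[tgt])
    return dict(mapping)


word_to_lempos = build_map(VERTICAL, "word", "lempos")
print(sorted(word_to_lempos["books"]))  # ['book-n', 'book-v']
```

A word form that is ambiguous in the data (here "books" as a noun and a verb) maps to more than one target value.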

A single attribute can be mapped to several attributes: list them in one MAPTO directive, separated by commas, e.g.:

ATTRIBUTE word {
        MAPTO "lempos,word_diac"
}

Computing a mapping with mknormattr takes a few seconds for a 100-million-word corpus.
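Since each mknormattr call takes one source and one target attribute, a comma-separated MAPTO value presumably needs one call per target (an assumption, not stated explicitly above). The Python sketch below derives those calls from registry text; mknormattr_commands is a hypothetical helper that handles only the simple brace-delimited layout used on this page, not the full registry syntax, and the corpus name is a placeholder.

```python
import re
import shlex

REGISTRY = """
ATTRIBUTE word {
        MAPTO "lempos,word_diac"
}
ATTRIBUTE lemma
ATTRIBUTE lempos
"""

# An ATTRIBUTE followed by a { ... } block; attributes without a block are skipped.
ATTR_BLOCK = re.compile(r"ATTRIBUTE\s+(\S+)\s*\{([^}]*)\}")
# A MAPTO value, optionally quoted, possibly a comma-separated list.
MAPTO = re.compile(r'MAPTO\s+"?([^"\s]+)"?')


def mknormattr_commands(registry: str, corpus: str) -> list[str]:
    """Return one mknormattr command line per (source, target) pair."""
    commands = []
    for source, body in ATTR_BLOCK.findall(registry):
        match = MAPTO.search(body)
        if not match:
            continue
        for target in match.group(1).split(","):
            commands.append(shlex.join(["mknormattr", corpus, source, target.strip()]))
    return commands


for cmd in mknormattr_commands(REGISTRY, "bnc2_tt2"):
    print(cmd)
# mknormattr bnc2_tt2 word lempos
# mknormattr bnc2_tt2 word word_diac
```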

Mapped attributes in concordance queries

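After the computation, we can query the corpus by a word form even when we do not know its lemma. With a mapping from word to lemma (MAPTO lemma in the word attribute), the CQL query [word@lemma="mice"] matches all tokens that share a lemma with the word form "mice", so it also returns hits for "mouse":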

$ corpquery bnc2_tt2 '[word@lemma="mice"]' -n
2501
$ corpquery bnc2_tt2 '[lemma="mouse"]' -n
2501
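The evaluation of [word@lemma="mice"] can be pictured as two steps: look up the lemmas that the word form "mice" maps to, then return every token carrying one of those lemmas. A minimal Python sketch over a made-up token list; mapped_query is a hypothetical name, it uses exact string matching instead of CQL regular expressions, and Manatee itself works on its own indexes.

```python
# Hypothetical toy corpus: one (word, lemma) pair per token position.
TOKENS = [
    ("a", "a"), ("mouse", "mouse"), ("and", "and"),
    ("two", "two"), ("mice", "mouse"), ("Mice", "mouse"),
]


def mapped_query(tokens, word_value):
    """Evaluate [word@lemma="<word_value>"] with exact (non-regex) matching."""
    # Step 1: the lemmas that the given word form maps to (the MAPTO mapping).
    lemmas = {lemma for word, lemma in tokens if word == word_value}
    # Step 2: every position whose lemma is one of those lemmas.
    return [i for i, (_, lemma) in enumerate(tokens) if lemma in lemmas]


print(mapped_query(TOKENS, "mice"))  # [1, 4, 5]
```

Querying with "mouse" instead of "mice" yields the same positions, which mirrors the identical counts in the corpquery output.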

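The -n option tells corpquery to print only the number of hits instead of the hits themselves. Similarly, with the mapping between word and lempos, we can run the following queries. The mapped query matches every token whose lempos is any of the lempos values of "books" (e.g. both the noun and the verb book, in any form), hence the higher count: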

$ corpquery bnc2_tt2 '[word@lempos="books"]' -n
36562
$ corpquery bnc2_tt2 '[word="books"]' -n
11597
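Given the lempos values that "books" maps to, the mapped query can be spelled out as an ordinary CQL query with an | disjunction. The helper below is a hypothetical Python sketch; the value set is made up rather than read from the BNC, and the values are assumed to contain no regular-expression metacharacters, because CQL attribute values are regular expressions.

```python
def expand_mapped_query(attr: str, values: set[str]) -> str:
    """Spell out a mapped query as a CQL disjunction over `attr`.

    Assumes the values contain no regex metacharacters, because CQL
    attribute values are interpreted as regular expressions.
    """
    if not values:
        raise ValueError("no mapped values: '[]' would match every token")
    conditions = " | ".join(f'{attr}="{v}"' for v in sorted(values))
    return f"[{conditions}]"


# Hypothetical mapping result for the word form "books".
print(expand_mapped_query("lempos", {"book-v", "book-n"}))
# prints: [lempos="book-n" | lempos="book-v"]
```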

Mapped attributes in Word Sketch queries

To enable Word Sketch queries in both Chinese ideograms (kanji) and a Japanese syllabic script (kana), provided the source vertical contains both kinds of information:

1. Add the MAPTO directive to the respective attributes in the corpus registry file:

ATTRIBUTE word
ATTRIBUTE word_kana {
    MAPTO lemma
}
ATTRIBUTE lemma
ATTRIBUTE lemma_kana {
    MAPTO lemma
}

2. Build the mappings:

mknormattr CORPUS word_kana lemma
mknormattr CORPUS lemma_kana lemma

3. Enjoy querying Word Sketches. For example, entering either "都市" ("city", Chinese ideograms) or "シティ" ("city", Katakana script) leads to the Word Sketch for "都市" ("city", Chinese ideograms).
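A rough Python sketch of the lookup this enables: input that is already a lemma is used as is; otherwise the mapped kana attributes are consulted. The mapping contents are hypothetical stand-ins for what the two mknormattr runs would compute, reusing the example above, and the fallback order is an assumption rather than documented Sketch Engine logic.

```python
# Hypothetical results of the two mknormattr runs (attribute value -> lemmas).
WORD_KANA_TO_LEMMA = {"シティ": {"都市"}}
LEMMA_KANA_TO_LEMMA = {"シティ": {"都市"}}
KNOWN_LEMMAS = {"都市"}


def resolve_headword(query: str) -> set[str]:
    """Return the lemma(s) whose Word Sketch should be shown for `query`."""
    if query in KNOWN_LEMMAS:
        return {query}
    # Assumed fallback order: lemma_kana first, then word_kana.
    for mapping in (LEMMA_KANA_TO_LEMMA, WORD_KANA_TO_LEMMA):
        if query in mapping:
            return set(mapping[query])
    return set()


print(resolve_headword("都市"))    # {'都市'}
print(resolve_headword("シティ"))  # {'都市'}
```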
