Adding MAPTO directives

The MAPTO directive in an attribute definition defines a mapping between that attribute and another attribute. The mapping is computed from the vertical data using the mknormattr tool.
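
To picture what such a mapping contains, the following minimal sketch in shell and awk lists every distinct pair of values of two attributes that occur on the same token of a vertical file. It is a hypothetical illustration only, not how mknormattr is implemented. The function name build_mapping and the column numbers are made up for the example, and the sketch assumes a vertical file with one token per line, tab-separated attributes and structure tags such as <doc> or <s> on lines of their own.

```shell
#!/bin/sh
# Hypothetical illustration of what a MAPTO mapping relates; this is NOT
# how mknormattr is implemented. Assumes a vertical file with one token
# per line, tab-separated attributes, and structure tags (<doc>, <s>, ...)
# on lines of their own starting with "<".

# build_mapping VERT SRC_COL DST_COL
# Prints every distinct "source<TAB>target" pair found on the same token,
# e.g. SRC_COL=1 (word) and DST_COL=2 (lempos) pairs each word form with
# every lempos value it occurs with.
build_mapping() {
    awk -F'\t' -v s="$2" -v d="$3" '
        /^</ { next }                        # skip structure tags
        NF >= s && NF >= d { print $s "\t" $d }
    ' "$1" | LC_ALL=C sort -u
}
```

For instance, build_mapping corpus.vert 1 2 would list the word/lempos pairs of a hypothetical file corpus.vert whose first column is word and second column is lempos.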

First, define which attribute should be mapped to which one. This is done in the corpus configuration file. MAPTO can take multiple values separated by commas. Let us say we want a mapping between word and the Word Sketch attribute (WSATTR) of a given corpus; in this example, WSATTR is lempos. Then we specify in the definition of the word attribute:

        MAPTO lempos

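Now we let manatee compute the mapping by running mknormattr with the corpus name, the attribute that contains the MAPTO directive and the target attribute:

mknormattr CORPUS word lempos
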
Let us say we work with the BNC; the command then looks like this:

mknormattr bnc2_tt21 word lempos

The computation takes a few seconds for a 100-million-word corpus.

Mapped attributes in concordance queries

After the computation, we can, for example, query the corpus with a word form whose lemma we do not know. If the word attribute is mapped to lemma (MAPTO word=>lemma), the CQL query [word@lemma="mice"] also returns hits for “mouse”:

$ corpquery bnc2_tt21 '[word@lemma="mice"]' -n
$ corpquery bnc2_tt21 '[lemma="mouse"]' -n

The -n parameter tells corpquery to only count the results, without printing the hits. Similarly, with the mapping between word and lempos, we can run these queries:

$ corpquery bnc2_tt21 '[word@lempos="books"]' -n
$ corpquery bnc2_tt21 '[word="books"]' -n
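
Conceptually, a query such as [word@lemma="mice"] or [word@lempos="books"] can be read in two steps: look up the values of the target attribute that the given word form maps to, then match every token that carries one of those values. The sketch below approximates that count on a vertical file with awk. It is a hypothetical illustration of the behaviour described above, not how manatee evaluates queries (manatee uses its compiled indexes), and the name count_mapped and the column numbers are made up for the example.

```shell
#!/bin/sh
# Hypothetical approximation of [SRC@DST="VALUE"] on a vertical file;
# manatee answers such queries from its compiled indexes instead.
# Assumes one token per line, tab-separated attributes, and structure
# tags on lines starting with "<".

# count_mapped VERT SRC_COL DST_COL VALUE
# Prints the number of tokens whose DST value is one that VALUE maps to.
count_mapped() {
    awk -F'\t' -v s="$2" -v d="$3" -v v="$4" '
        /^</ { next }                   # ignore structure tags in both passes
        FNR == NR {                     # pass 1: collect DST values of VALUE
            if ($s == v) want[$d] = 1
            next
        }
        ($d in want) { n++ }            # pass 2: count tokens with such a value
        END { print n + 0 }
    ' "$1" "$1"
}
```

The file is read twice: the first pass builds the set of mapped values, the second counts matching tokens, which mirrors the two steps described above.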

Mapped attributes in Word Sketch queries

To enable Word Sketch queries in both Chinese ideograms and a Japanese syllabic script (provided the source vertical file contains both kinds of information):

1. Add the MAPTO directive to the respective attributes in the corpus registry file:

ATTRIBUTE word_kana {
    MAPTO lemma
}
ATTRIBUTE lemma_kana {
    MAPTO lemma
}

2. Build the mappings:

mknormattr CORPUS word_kana lemma
mknormattr CORPUS lemma_kana lemma

3. Enjoy querying Word Sketches. For example, entering either “都市” (“city”, Chinese ideograms) or “シティ” (“city”, Katakana script) leads to the Word Sketch for “都市” (“city”, Chinese ideograms).
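
The headword lookup in step 3 can be pictured like this: an entered string is accepted if it is a lemma itself, or it is translated to lemmas through the word_kana and lemma_kana mappings. The sketch below is a hypothetical illustration on a vertical file whose column order (word, word_kana, lemma, lemma_kana) is assumed for the example; resolve_headword is a made-up name, and this is not how Sketch Engine resolves headwords internally.

```shell
#!/bin/sh
# Hypothetical headword resolution through MAPTO-style mappings.
# Assumed column order of the vertical file (illustration only):
#   1 word   2 word_kana   3 lemma   4 lemma_kana
# Structure tags (<doc>, <s>, ...) are on lines starting with "<".

# resolve_headword VERT INPUT
# Prints each distinct lemma that INPUT either is, or maps to through
# word_kana or lemma_kana.
resolve_headword() {
    awk -F'\t' -v q="$2" '
        /^</ { next }                               # skip structure tags
        $3 == q || $2 == q || $4 == q { print $3 }  # lemma, word_kana, lemma_kana
    ' "$1" | LC_ALL=C sort -u
}
```

Under these assumptions, resolve_headword corpus.vert シティ would print every lemma whose word_kana or lemma_kana value is シティ in the hypothetical file corpus.vert.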