Before reading this page, please see the corpus configuration file documentation, which explains what dynamic attributes are and how they are set up.

Internal dynamic functions

The following list gives an overview of the existing built-in dynamic functions, followed by examples of their usage in a corpus configuration file:

- striplastn         (str, n) - returns str with its last n characters stripped
- lowercase     (str, locale) - returns str in lowercase (for any single-byte encoding and the corresponding locale)
- utf8lowercase         (str) - returns str in lowercase (for any utf-8 encoded string str)
- utf8uppercase         (str) - returns str in uppercase (for any utf-8 encoded string str)
- utf8capital           (str) - returns str with first character capitalized (for any utf-8 encoded string str)
- getfirstn          (str, n) - returns first n characters of str
- getlastn           (str, n) - returns last n characters of str (for any single-byte encoding)
- utf8getlastn       (str, n) - returns last n characters of str (for any utf-8 encoded string)
- getfirstbysep      (str, c) - returns the prefix of str up to the character c (c itself is not included)
- getnbysep       (str, c, n) - returns the n-th component of str when split by the delimiter c (delimiters are not included)
- getnchar           (str, n) - returns n-th character of str 
- getnextchars    (str, c, n) - returns n characters after character c
- getnextchar        (str, c) - returns the character after character c 
- url2domain         (str, n) - returns n-th component of the URL (0 = web domain, 1 = top level domain, 2 = second level domain) 
- ascii    (str, enc, locale) - returns ASCII transliteration of the string according to the given encoding and locale

ATTRIBUTE   lemma {
          DYNAMIC    striplastn
          DYNLIB     internal
          ARG1       "2"
          FUNTYPE    i
          FROMATTR   lempos
          DYNTYPE    index
}
ATTRIBUTE "lemma2" {
          ARG1 "-"
          ARG2 "1"
          DYNAMIC "getnbysep"
          DYNLIB "internal"
          DYNTYPE "index"
          FROMATTR "lempos2"
          FUNTYPE "ci"
}
ATTRIBUTE   lc {
          DYNAMIC    lowercase
          DYNLIB     internal
          ARG1       "C"
          FUNTYPE    s
          FROMATTR   word
          DYNTYPE    index
          TRANSQUERY yes
}
ATTRIBUTE   tag {
         DYNAMIC     getfirstn
         DYNLIB      internal
         ARG1        "3"
         FUNTYPE     i
         FROMATTR    ambtag
         DYNTYPE     index
}
ATTRIBUTE   k {
         DYNAMIC     getnchar
         DYNLIB      internal
         ARG1        1
         FUNTYPE     i
         FROMATTR    tag
         DYNTYPE     index
}
ATTRIBUTE   g {
         DYNAMIC     getnextchar
         DYNLIB      internal
         ARG1        "g"
         FUNTYPE     c
         FROMATTR    tag
         DYNTYPE     index
}
ATTRIBUTE   g3 {
         DYNAMIC     getnextchars
         DYNLIB      internal
         ARG1        "g"
         ARG2        3
         FUNTYPE     ci
         FROMATTR    tag
         DYNTYPE     index
}
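
To make the effect of some of these functions concrete, the following Python sketch mimics striplastn, getfirstn, getfirstbysep and the two getlastn variants as they are described in the list above. It only illustrates the documented behaviour and is not the actual implementation; the sample values (house-n, k1gMnSc1, café), the helper name getlastn_bytes and the handling of edge cases such as a missing separator are assumptions made for this example.

```python
# Illustrative stand-ins only; NOT the real built-in implementations.


def striplastn(s: str, n: int) -> str:
    """Drop the last n characters of s."""
    return s[:-n] if n > 0 else s


def getfirstn(s: str, n: int) -> str:
    """Keep the first n characters of s."""
    return s[:n]


def getfirstbysep(s: str, c: str) -> str:
    """Prefix of s before the first c (assumption: whole s if c is absent)."""
    return s.split(c, 1)[0]


def utf8getlastn(s: str, n: int) -> str:
    """Keep the last n characters (code points) of s."""
    return s[-n:] if n > 0 else ""


def getlastn_bytes(raw: bytes, n: int) -> bytes:
    """Keep the last n bytes, as a single-byte-encoding function would."""
    return raw[-n:] if n > 0 else b""


if __name__ == "__main__":
    print(striplastn("house-n", 2))                   # house
    print(getfirstn("k1gMnSc1", 3))                   # k1g
    print(getfirstbysep("house-n", "-"))              # house
    print(utf8getlastn("café", 2))                    # fé
    print(getlastn_bytes("café".encode("utf-8"), 2))  # b'\xc3\xa9'
```

The last two calls hint at why separate utf8 variants are listed: counting bytes in a UTF-8 string yields only the single character é here, whereas counting characters yields fé.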

Dynamic functions from a shared library

A shared library function must return const char*.

The following example function takes the year in which a document was published and returns the epoch to which the document belongs.

  • the source code (epoch.c):
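    #include <stdio.h>
    
    /* Map a publication year (given as a string) to an epoch label. */
    const char * epoch (const char* year)
    {
        int y = 0;  /* stays 0 if year cannot be parsed as a number */
        if (sscanf(year, "%d", &y) != 1)
            y = 0;
        /* String literals have static storage, so returning them is safe. */
        if (y < 1990) return "before 1990";
        if (y < 2001) return "1990-2000";
        if (y < 2005) return "2001-2004";
        if (y < 2009) return "2005-2008";
        return "2009 and later";
    }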
    
  • to compile the library use:
    gcc -Wall -fPIC -DPIC -shared -o epoch.so epoch.c
    
  • the relevant part of the corpus configuration file:
    STRUCTURE doc {
        ATTRIBUTE year
        ATTRIBUTE time {
            DYNAMIC    epoch
            DYNLIB     "/corpora/vert/greek/epoch.so"
            FUNTYPE    0
            FROMATTR   year
            DYNTYPE    index
            TRANSQUERY yes
        }
    }
    

Dynamic functions from a shell script

In this case the dynamic function is implemented as a shell pipe, e.g.:
ATTRIBUTE "case" {
          DYNAMIC "/somewhere/somescript.py"
          DYNLIB "pipe"
          DYNTYPE "freq"
          FROMATTR "tag"
          LABEL "grammatical case"
}

Here somescript.py must read values line by line from standard input, transform each one, and write the result to standard output, e.g.:

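#!/usr/bin/python3
import sys

def do_something(tag):
    # Placeholder: replace with the actual transformation.
    return tag

for tag in sys.stdin:
    new_tag = do_something(tag.strip())
    print(new_tag, flush=True)  # flush so each result is written immediately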
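
As a more concrete illustration of the read-transform-write loop, the epoch mapping from the shared library section could be sketched as a pipe script. This is only a sketch under assumptions: the function names epoch and main, the regular expression that imitates how sscanf with "%d" picks up a leading integer, and the "unknown" label for unparsable input are all choices made for this example and do not come from the C original.

```python
#!/usr/bin/python3
import re
import sys

# Leading integer with optional whitespace and sign, roughly what
# sscanf("%d") would accept at the start of the string.
_YEAR = re.compile(r"\s*([+-]?\d+)")


def epoch(year: str) -> str:
    """Map a publication year string to an epoch label."""
    match = _YEAR.match(year)
    if match is None:
        # Assumption: this label is hypothetical, not part of the C example.
        return "unknown"
    y = int(match.group(1))
    if y < 1990:
        return "before 1990"
    if y < 2001:
        return "1990-2000"
    if y < 2005:
        return "2001-2004"
    if y < 2009:
        return "2005-2008"
    return "2009 and later"


def main() -> None:
    # Write one output line per input line, flushing after each one.
    for line in sys.stdin:
        print(epoch(line.strip()), flush=True)


if __name__ == "__main__":
    main()
```

Keeping the mapping in its own function, separate from the input loop, makes it easy to check the boundary years (1990, 2001, 2005, 2009) before the script is wired into a corpus configuration.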