• vertical file

    A vertical file is a text file in which each token (typically a word or punctuation mark) is on a separate line. The format is typically used for text corpora and may carry additional meta-information (annotation). Vertical files are usually created from a prevertical format. The first column contains the tokens; further columns may contain part-of-speech tags, lemmas or other positional attributes. Structures such as documents, paragraphs and sentences appear as XML-like tags on lines of their own.
    An example of a vertical file:
    <doc genre="fiction" title="1984" author="G. Orwell">
    <chapter no="1.1">
    <p>
    <s>
    It       PP   it-d
    was      VBD  be-v
    a        DT   a-x
    bright   JJ   bright-j
    cold     JJ   cold-j
    day      NN   day-n
    in       IN   in-i
    April    NP   April-n
    <g/>
    ,        ,    ,-x
    and      CC   and-c
    the      DT   the-x
    clocks   NNS  clock-n
    were     VBD  be-v
    striking VVG  strike-v
    thirteen CD   thirteen-m
    <g/>
    .        SENT .-x
    </s>
    ...
    </p>
    ...
    </chapter>
    </doc>
    column 1: tokens and structures
    column 2: part-of-speech tags
    column 3: lempos attribute

    See more details on how to prepare a text for the vertical format.
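
    The layout above can be read with a few lines of code. The following is a minimal sketch, not an official tool: it assumes the three columns are tab-separated (falling back to any whitespace, as in the aligned listing above), that any line enclosed in angle brackets is a structure tag, and that <g/> means "no space before the next token". The names Token, parse_vertical and detokenize are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    tag: str = ""
    lempos: str = ""
    glued: bool = False  # True if a <g/> line directly preceded this token


def parse_vertical(text):
    """Return a list of ("structure", str) and ("token", Token) events."""
    events = []
    glue_next = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and elision markers
        if line == "<g/>":
            glue_next = True  # assumed meaning: no space before next token
            continue
        # Simplification: a token that itself starts with "<" would be
        # misread as a structure here.
        if line.startswith("<") and line.endswith(">"):
            events.append(("structure", line))
            continue
        # Assumption: columns are tab-separated; otherwise split on spaces.
        fields = line.split("\t") if "\t" in line else line.split()
        fields += [""] * (3 - len(fields))  # pad missing attributes
        events.append(("token", Token(fields[0], fields[1], fields[2], glue_next)))
        glue_next = False
    return events


def detokenize(tokens):
    """Rebuild running text, omitting the space before glued tokens."""
    parts = []
    for tok in tokens:
        if parts and not tok.glued:
            parts.append(" ")
        parts.append(tok.word)
    return "".join(parts)


sample = "\n".join([
    "<s>",
    "in\tIN\tin-i",
    "April\tNP\tApril-n",
    "<g/>",
    ",\t,\t,-x",
    "and\tCC\tand-c",
    "</s>",
])

events = parse_vertical(sample)
tokens = [item for kind, item in events if kind == "token"]
print(detokenize(tokens))
```

    Keeping structures and tokens in one event list preserves their order, so sentence or document boundaries can still be recovered after parsing.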