vertical file

A vertical file is a text file where each token (or word) is on a separate line. This format is typically used for text corpora and may contain additional metainformation (annotation). Vertical files are usually created from a prevertical format.

The first column contains the tokens, while structures (such as <doc>, <s> or <g/>) each occupy a line of their own. The other columns may contain part-of-speech tags, lemmas or other positional attributes. An example of a vertical file:

<doc genre="fiction" title="1984" author="G. Orwell">
<chapter no="1.1">
<p>
<s>
It       PP   it-d
was      VBD  be-v
a        DT   a-x
bright   JJ   bright-j
cold     JJ   cold-j
day      NN   day-n
in       IN   in-i
April    NP   April-n
<g/>
,        ,    ,-x
and      CC   and-c
the      DT   the-x
clocks   NNS  clock-n
were     VBD  be-v
striking JJ   striking-j
thirteen CD   thirteen-m
<g/>
.        SENT .-x
</s>
...
</p>
...
</chapter>
</doc>

column 1: tokens and structures
column 2: part of speech tags
column 3: lempos attribute (lemma plus a part-of-speech suffix, e.g. be-v)
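
To illustrate the column layout, here is a minimal Python sketch that reads a vertical file line by line and separates structure lines from token lines. It is not part of any official tool. The names parse_vertical, Structure and Token are hypothetical, and the sketch assumes that columns are separated by a tab character and that any line starting with "<" and ending with ">" is a structure.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple, Union


class Structure(NamedTuple):
    """A structure line such as <doc ...>, </s> or <g/> (hypothetical type)."""
    raw: str


class Token(NamedTuple):
    """A token line: column 1 plus the remaining positional attributes."""
    word: str
    attrs: Tuple[str, ...]


def parse_vertical(lines: Iterable[str], sep: str = "\t") -> Iterator[Union[Structure, Token]]:
    """Yield a Structure or Token for each non-empty line.

    Assumption: columns are separated by `sep` (a tab by default), and a
    line is a structure if it starts with "<" and ends with ">".
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("<") and line.endswith(">"):
            yield Structure(line)
        else:
            word, *attrs = line.split(sep)
            yield Token(word, tuple(attrs))


# Usage with a few lines taken from the example above (tab-separated).
sample = [
    "<s>",
    "It\tPP\tit-d",
    "was\tVBD\tbe-v",
    "<g/>",
    ".\tSENT\t.-x",
    "</s>",
]
words = [item.word for item in parse_vertical(sample) if isinstance(item, Token)]
print(words)  # ['It', 'was', '.']
```

The structure test is only a heuristic: a token line whose columns happen to begin with "<" and end with ">" would be misclassified, so a real corpus pipeline may need a stricter check.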

See more details on how to prepare a text for the vertical format.