CoNLL format

CoNLL format is a variant of the vertical file format that represents a syntactic parse tree. Compared with a plain vertical file, it has extra columns describing the syntactic structure of the sentence, namely id, head and deprel. The number and position of these extra columns may vary depending on the specific CoNLL format.

  • id is the position of the current word in the sentence (the 1st column)
  • head is the id of the parent node of the current word; 0 marks the root of the sentence (the 5th column)
  • deprel is the dependency relation that connects the current node to its parent node (the 6th column)

An example sentence in this format:

1    Dropping    drop-v    VBG    14    advcl
2    down    down-x    RP    1    prt
3    abaft    abaft-i    IN    1    prep
4    the    the-x    DT    5    det
5    bridge    bridge-n    NN    3    pcomp
6    ,    ,-x    ,    14    punct
7    the    the-x    DT    9    det
8    first    first-j    JJ    9    amod
9    thing    thing-n    NN    14    subj
10    to    to-x    TO    11    infmark
11    come    come-v    VB    9    infmod
12    into    into-i    IN    11    prep
13    view    view-n    NN    12    pcomp
14    was    be-v    VBD    0    ROOT
15    the    the-x    DT    16    det
16    funnel    funnel-n    NN    14    arg1
17    .    .-x    .    14    punct
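
As a rough illustration of how the id, head and deprel columns encode the tree, the following Python sketch reads the sentence above and groups words by their parent. The names `Token`, `parse_conll` and `children_of` are hypothetical helpers written for this example, not part of any tool. The column positions are assumptions taken from this particular example and are exposed as parameters because other CoNLL variants place the columns differently.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int       # position of the word in the sentence
    word: str     # word form
    head: int     # id of the parent node, 0 for the root
    deprel: str   # relation to the parent node


def parse_conll(lines, id_col=0, word_col=1, head_col=4, deprel_col=5):
    """Parse whitespace-separated lines into Token tuples.

    Column indices are 0-based and default to the layout of the
    example above (assumption); adjust them for other CoNLL variants.
    """
    tokens = []
    for line in lines:
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        tokens.append(Token(int(cols[id_col]), cols[word_col],
                            int(cols[head_col]), cols[deprel_col]))
    return tokens


def children_of(tokens):
    """Map each head id to the list of ids of its direct dependents."""
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok.id)
    return children


sample = """\
1    Dropping    drop-v    VBG    14    advcl
2    down    down-x    RP    1    prt
3    abaft    abaft-i    IN    1    prep
4    the    the-x    DT    5    det
5    bridge    bridge-n    NN    3    pcomp
6    ,    ,-x    ,    14    punct
7    the    the-x    DT    9    det
8    first    first-j    JJ    9    amod
9    thing    thing-n    NN    14    subj
10    to    to-x    TO    11    infmark
11    come    come-v    VB    9    infmod
12    into    into-i    IN    11    prep
13    view    view-n    NN    12    pcomp
14    was    be-v    VBD    0    ROOT
15    the    the-x    DT    16    det
16    funnel    funnel-n    NN    14    arg1
17    .    .-x    .    14    punct
"""

tokens = parse_conll(sample.splitlines())
children = children_of(tokens)

# The root is the only token whose head is 0.
root = next(tok for tok in tokens if tok.head == 0)
print(root.id, root.word, root.deprel)

# Direct dependents of the root, with their relations.
for child_id in children[root.id]:
    tok = tokens[child_id - 1]  # ids are 1-based and consecutive here
    print(tok.id, tok.word, tok.deprel)
```

Walking the `children` mapping from the root downwards reconstructs the whole parse tree, since every word except the root points to exactly one parent.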

See also


building word sketches from parsed corpora