The English Preposition Corpus (TPP) is a text corpus developed with the purpose of showing how prepositions are used in English. Each preposition in the corpus was annotated with additional information about the sense and context. For example, the user can compare the use of the preposition “in” followed by a place and followed by time or only find examples of in followed by a word describing a group of objects or people, e.g. in the team.
The Preposition Corpus comprises 3 corpora annotated according to the Pattern Dictionary of English Prepositions (PDEP) which describes the behavior of prepositions.
Using the corpus
A full set of Sketch Engine features is available to search the corpus.
The easiest way to observe the behavior of a preposition in English is to generate Word Sketch for the preposition. The corpus was compiled with a specific word sketch grammar to exploit the unique annotation.
note: Word Sketches for prepositions are only supported in this corpus. Generating a Word Sketch for a preposition in other corpora will produce empty results.
All standard searches can be used with this corpus. Complex searches are also available using CQL.
Use the visibility view options (above the concordance results) where you can select which Attributes will be displayed. Click the info line details (at the beginning of any concordance line), to display required References.
See the attached document to get a full understanding of the attributes and values used.
Annotation in detail
Each sentence contains only one annotated preposition, i.e. each sentence serves as an example of the use of one preposition. Other prepositions in the sentence are not annotated. Individual sentences with the sense of the preposition (linked to the PDEP sense description), the PDEP class and the PDEP subclass, and supersense tags for preposition complements and governors.
Structures and attributes in the corpus
Apart from the standard attributes as a POS tag or sentence tag, there are a few special attributes and structures related to prepositions.
- deprel – attribute with the distinct dependency relations, see the list of syntactic tags
- s.sense_label (sense label) – label restricting search to those sentences with a specific sense tag (note that if this restriction is used, it should be with a specified preposition, since numbering is generally the same for each preposition; also, all sense labels include parentheses and these need to be preceded by backslashes), e.g.
[lemma_lc="aboard"] within <s sense_label="1(1)" ></s>
- s.sense_desc (sense description) – contains a link to the PDEP pattern for the given sense
- compl.sst and gov.sst (supersense tags) – can be used to examine the occurrence of the various WordNet lexicographer file names within the corpora (values of the supersenses are generally examined after a search for complements or governors).
More information about corpus attributes can be found in the manual The Preposition Corpus in the Sketch Engine by Ken Litkowski.
Sentences were parsed using the Tratz parser (A Fast, Accurate, Non-Projective, Semantically-Enriched Parser, 2011) with output in the CoNLL-X format. This format consists of a line for each token in a sentence and a blank line is used to separate the sentences. Words in the corpus are lemmatized including the assigned lempos attribute.
As POS tagset was used the Penn Treebank tags with one special tag for proper adjectives:
|JJPROP||United, North, Royal|
Search the English Preposition corpus
Sketch Engine offers a range of tools to work with the English Preposition corpus.
Use Sketch Engine in minutes
Generating collocations, frequency lists, examples in contexts, n-grams or extracting terms is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.