Terms of Service (English) corpus

The Terms of Service (English) corpus is a legal corpus comprising terms and conditions from web hosting service providers in British English and Italian. It is designed to support legal translation research and training by addressing the scarcity of corpora focusing on private legal documents such as contracts and agreements. The corpus serves as a resource for legal translators, translation students, and scholars interested in comparative legal language and system-specific terminology. Developed with Sketch Engine, the corpus enables users to explore legal phraseology, detect system-based discrepancies, and extract native-like equivalents across languages. The corpus is particularly relevant in the post-Covid-19 digital economy, where legal clarity in online service agreements is increasingly vital. Empirical validation through classroom experiments confirms the corpus’s reliability in enhancing legal translation accuracy, raising awareness of legal system variation, and fostering advanced legal language proficiency.

Note: Sketch Engine currently offers only the English part of the corpus. It is available free of charge, with no registration required.

Part-of-speech tagset and lemmatization

The corpus is part-of-speech tagged with the English Penn Treebank tagset (with Sketch Engine modifications), which indicates the part of speech and grammatical category of each token. The texts are also lemmatized: each word form in the corpus is assigned to its base form (lemma).

Terms of Service corpus sizes

Unit         Frequency
Tokens       190+ thousand
Words        160+ thousand
Sentences    4+ thousand
Web pages    39
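The gap between the token and word counts above arises because tokens typically include punctuation marks while words do not. The following is a minimal sketch of that distinction, not Sketch Engine's actual tokenizer: the regular expression, the function names `tokenize` and `count_units`, and the sample sentence are all assumptions made for illustration, and the sentence is invented rather than taken from the corpus.

```python
import re


def tokenize(text):
    """Toy tokenizer (an assumption, not Sketch Engine's rules).

    A token is either a run of word characters, optionally joined by
    internal apostrophes or hyphens, or a single punctuation mark.
    """
    return re.findall(r"\w+(?:['’-]\w+)*|[^\w\s]", text)


def count_units(text):
    """Count tokens (everything) and words (tokens starting with a letter or digit)."""
    tokens = tokenize(text)
    words = [t for t in tokens if t[0].isalnum()]
    return {"tokens": len(tokens), "words": len(words)}


# Invented sample sentence in the style of a terms-of-service clause.
sample = "The Provider may suspend the Service, without notice, at any time."
print(count_units(sample))
```

In this toy example the two commas and the full stop count as tokens but not as words, which is the same kind of difference that separates the two figures in the table.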

Tools to work with this English corpus

A complete set of Sketch Engine tools is available to work with this English corpus to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
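
To make two of the outputs listed above more concrete, here is a rough sketch of what an n-gram frequency list and a concordance (keyword in context) compute. It is a plain-Python illustration under stated assumptions, not Sketch Engine's implementation or API: the helper names `ngrams` and `kwic`, the context width, and the two sample sentences are hypothetical, and the sentences are invented rather than drawn from the corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    Matching is case-insensitive; `width` is the number of tokens
    shown on each side of the keyword.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented, pre-tokenized sample text (whitespace-separated tokens).
text = ("the customer shall pay the fees . "
        "the customer shall not resell the services .")
tokens = text.split()

# Frequency list of two-word units (bigrams).
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))

# Concordance lines for the modal verb "shall".
for left, kw, right in kwic(tokens, "shall"):
    print(f"{left:>20} [{kw}] {right}")
```

On a real corpus the same idea surfaces recurring legal phraseology, for example which verbs habitually follow "shall" in service agreements.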

version tos_eng

  • available in Sketch Engine – June 2024

Giampieri, P. (2023). The use of comparable corpora on (general) terms and conditions as a pedagogical tool in translation training between English and Italian (Doctoral dissertation). https://www.um.edu.mt/library/oar/handle/123456789/119427

Other English corpora

Explore our largest English Trends corpus with 85+ billion words.
