Terms of Service (English) corpus
The Terms of Service (English) corpus is a legal corpus comprising terms and conditions from web hosting service providers in British English and Italian. It is designed to support legal translation research and training by addressing the scarcity of corpora focusing on private legal documents such as contracts and agreements. The corpus serves as a resource for legal translators, translation students, and scholars interested in comparative legal language and system-specific terminology. Developed with Sketch Engine, the corpus enables users to explore legal phraseology, detect system-based discrepancies, and extract native-like equivalents across languages. The corpus is particularly relevant in the post-Covid-19 digital economy, where legal clarity in online service agreements is increasingly vital. Empirical validation through classroom experiments confirms the corpus’s reliability in enhancing legal translation accuracy, raising awareness of legal system variation, and fostering advanced legal language proficiency.
Note: Sketch Engine currently offers only the English part of the corpus. It is available free of charge, and no registration is needed.
Part-of-speech tagset and lemmatization
The corpus is part-of-speech tagged with the English Penn Treebank tagset (with Sketch Engine modifications), which indicates the part of speech and grammatical category of each token. The corpus texts are also lemmatized: each word form in the corpus is assigned to its base form (lemma).
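To make the idea of tagging and lemmatization concrete, here is a minimal Python sketch. The annotated sample is hand-written for illustration and is not output from the corpus; it uses standard Penn Treebank tags (DT, NN, NNS, VBZ, VBD, PRP), and the Sketch Engine modifications may differ in detail. The names `TAGGED`, `forms_by_lemma` and `words_with_tag_prefix` are hypothetical helpers, not part of any Sketch Engine API.

```python
from collections import defaultdict

# Hypothetical hand-annotated sample: (word form, Penn Treebank tag, lemma).
# This is NOT real corpus output; Sketch Engine's modified tagset may differ.
TAGGED = [
    ("The", "DT", "the"),
    ("provider", "NN", "provider"),
    ("terminates", "VBZ", "terminate"),
    ("accounts", "NNS", "account"),
    (".", ".", "."),
    ("It", "PRP", "it"),
    ("terminated", "VBD", "terminate"),
    ("the", "DT", "the"),
    ("account", "NN", "account"),
    (".", ".", "."),
]


def forms_by_lemma(tagged):
    """Group the distinct lower-cased word forms under their lemma."""
    groups = defaultdict(set)
    for word, _tag, lemma in tagged:
        groups[lemma].add(word.lower())
    return {lemma: sorted(forms) for lemma, forms in groups.items()}


def words_with_tag_prefix(tagged, prefix):
    """Return word forms whose tag starts with `prefix` (e.g. "NN" for nouns)."""
    return [word for word, tag, _lemma in tagged if tag.startswith(prefix)]


if __name__ == "__main__":
    print(forms_by_lemma(TAGGED)["terminate"])   # inflected forms of one lemma
    print(words_with_tag_prefix(TAGGED, "NN"))   # singular and plural nouns
```

Searching by lemma rather than by word form is what lets a single query retrieve "terminates" and "terminated" together, while the tag prefix narrows results to one part of speech.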
Terms of Service corpus sizes
Unit | Frequency |
Tokens | 190+ thousand |
Words | 160+ thousand |
Sentences | 4+ thousand |
Web pages | 39 |
Search the English Terms of Service corpus
Sketch Engine offers a range of tools to work with this English corpus.
Tools to work with this English corpus
A complete set of Sketch Engine tools is available for this English corpus. They generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
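As a rough illustration of what two of the tools listed above compute, the following Python sketch builds an n-gram frequency list and a keyword-in-context (KWIC) concordance over a tiny sample. It is not how Sketch Engine implements these tools: the sample sentences are invented, the tokenizer is a deliberately naive assumption, and `tokenize`, `ngram_counts` and `kwic` are hypothetical helper names.

```python
import re
from collections import Counter

# Invented sample text in the style of terms and conditions (not from the corpus).
SAMPLE = (
    "We may suspend the service at any time. "
    "You may cancel the service at any time by giving notice."
)


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only (a simplification).

    Sentence boundaries are ignored, so n-grams may span two sentences.
    """
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    # Most frequent three-word units in the sample.
    for gram, freq in ngram_counts(tokens, 3).most_common(3):
        print(freq, " ".join(gram))
    # Each occurrence of "service" with two words of context on either side.
    for left, key, right in kwic(tokens, "service", window=2):
        print(f"{left:>15} [{key}] {right}")
```

Even on two sentences, the recurring unit "at any time" surfaces at the top of the trigram list, which is the kind of legal phraseology the n-gram and concordance tools are meant to expose at corpus scale.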
Changelog
version tos_eng
- available in Sketch Engine – June 2024
Bibliography
Giampieri, P. (2023). The use of comparable corpora on (general) terms and conditions as a pedagogical tool in translation training between English and Italian (Doctoral dissertation). https://www.um.edu.mt/library/oar/handle/123456789/119427