LatinISE: corpus of historical Latin

The LatinISE historical corpus is a text corpus collected from the LacusCurtius, IntraText and Musisque Deoque websites. The texts cover topics such as literature, history, philosophy and poetry. The corpus also includes rich metadata such as genre, title, century or specific date.

This Latin corpus was built by Barbara McGillivray.

Lemmatization and part-of-speech tagset

The texts were lemmatized with Dag Haug’s Latin morphological analyser and Quick Latin, and POS-tagged with TreeTagger trained on the Index Thomisticus Treebank, the Latin Dependency Treebank and the Latin treebank of the PROIEL project.

The part-of-speech tagset is available here.

Available tools

A complete set of tools is available for working with the LatinISE corpus. The tools generate:

  • word sketch – Latin collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Latin nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
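To make the last three outputs concrete, here is a minimal Python sketch of a word list, an n-gram frequency list and a keyword-in-context concordance. It is only an illustration under stated assumptions: the sample sentence is the opening of Caesar's De Bello Gallico rather than LatinISE data, the function names are hypothetical, tokenization is plain whitespace splitting, and none of this reflects how Sketch Engine itself computes these results.

```python
from collections import Counter

# Toy sample text (not taken from LatinISE); punctuation removed for simplicity.
TEXT = "Gallia est omnis divisa in partes tres quarum unam incolunt Belgae aliam Aquitani"


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


tokens = tokenize(TEXT)
word_freq = Counter(tokens)               # word list organized by frequency
bigram_freq = Counter(ngrams(tokens, 2))  # frequency list of two-word units
kwic = concordance(tokens, "in")          # examples of "in" in context

for left, kw, right in kwic:
    print(f"{left:>20} [{kw}] {right}")
```

On a real corpus the counts would normally be taken over lemmas and part-of-speech tags rather than raw lowercase tokens, which is what the lemmatization and tagging described above make possible.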

version 2 (October 2014)

  • part-of-speech tagging has been partially corrected (by Barbara McGillivray)
  • text cleaning
  • 10.9 million words

version 1 (2011)

  • initial size: 11.3 million words

Barbara McGillivray and Adam Kilgarriff (2012). Tools for historical corpus research, and a corpus of Latin. In New Methods in Historical Corpus Linguistics 3, Germany, 2013, pp. 247–255

Bill Thayer (LacusCurtius), Nicola Mastidoro (IntraText), Linda Spinazzè (Musisque Deoque), Dag Haug (Latin morphological analyser and Latin treebank of the PROIEL project), Marco Passarotti (Index Thomisticus Treebank) and Perseus Project (Latin Dependency Treebank).

Search the Latin corpus

Sketch Engine offers a range of tools to work with this Latin corpus.
