LatinISE: corpus of historical Latin
The LatinISE historical corpus is a text corpus collected from the LacusCurtius, Intratext and Musisque Deoque websites. The texts span a range of genres and subjects, e.g. literature, history, philosophy and poetry. The corpus also contains rich metadata, such as genre, title, and century or specific date.
This Latin corpus was built by Barbara McGillivray.
Lemmatization and part-of-speech tagset
The texts were lemmatized with Dag Haug’s Latin morphological analyser and Quick Latin. They were POS-tagged with TreeTagger, which was trained on the Index Thomisticus Treebank, the Latin Dependency Treebank and the Latin treebank of the PROIEL Project.
The part-of-speech tagset is available here.
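As a rough illustration of working with lemmatized and POS-tagged text of this kind, the sketch below parses tab-separated lines with one token per line in the order word form, tag, lemma, which is the layout TreeTagger prints when run with its token and lemma options. It is an assumption that an export of the corpus follows this layout. The tags in the sample (N, V, ADJ) are placeholders, not the actual LatinISE tagset, and the names Token and parse_tagged_lines are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    form: str   # word form as it appears in the text
    pos: str    # part-of-speech tag (placeholder values in the sample below)
    lemma: str  # dictionary form assigned by the lemmatizer


def parse_tagged_lines(lines: Iterable[str]) -> List[Token]:
    """Parse 'form<TAB>tag<TAB>lemma' lines, skipping anything else."""
    tokens = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Lines without exactly three fields (blank lines, structural
        # markup such as <s>) are ignored in this sketch.
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens


# Hypothetical sample; tags are illustrative, not the LatinISE tagset.
sample = [
    "<s>",
    "Gallia\tN\tGallia",
    "est\tV\tsum",
    "omnis\tADJ\tomnis",
    "divisa\tV\tdivido",
    "</s>",
]

tokens = parse_tagged_lines(sample)
lemma_counts = Counter(tok.lemma for tok in tokens)
pos_counts = Counter(tok.pos for tok in tokens)
```

Counting by lemma rather than by word form groups inflected variants together (here, est is counted under sum), which is the main practical benefit of the lemmatization described above.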