The LatinISE corpus is a Latin text corpus compiled from the following online text collections: LacusCurtius, IntraText and Musisque Deoque. The texts cover areas such as literature, history, philosophy and poetry. The corpus also includes rich metadata, such as genre, title, and century or specific date.
This Latin corpus was built by Barbara McGillivray. Please cite the paper in the Bibliography section (below) when using this corpus.
Lemmatization and part-of-speech tagset
The texts were lemmatized using Dag Haug’s Latin morphological analyser and Quick Latin. They were then POS-tagged with TreeTagger, trained on the Index Thomisticus Treebank, the Latin Dependency Treebank and the Latin treebank of the PROIEL project.
The part-of-speech tagset for the LatinISE corpus is available here.
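As a rough illustration of how token-level annotation of this kind can be consumed, the sketch below reads a tab-separated, one-token-per-line listing and counts lemmas. It assumes the column order word form, POS tag, lemma (the default TreeTagger output order) and that structural markup appears on lines starting with `<`. The actual layout of a LatinISE export may differ. The sample sentence, the `Token` type, the `parse_vertical` helper and the tag labels are hypothetical placeholders and are not taken from the LatinISE tagset.

```python
from collections import Counter
from typing import List, NamedTuple


class Token(NamedTuple):
    """One annotated token (hypothetical helper type for this sketch)."""
    form: str
    pos: str
    lemma: str


def parse_vertical(text: str) -> List[Token]:
    """Parse tab-separated lines of the assumed form: word, POS tag, lemma."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and structural markup such as <s> or </s>.
        if not line or line.startswith("<"):
            continue
        parts = line.split("\t")
        # Ignore lines that do not have exactly three columns.
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens


# Made-up sample; the tags are placeholders, not the LatinISE tagset.
SAMPLE = "\n".join([
    "<s>",
    "Gallia\tNOUN\tGallia",
    "est\tVERB\tsum",
    "omnis\tADJ\tomnis",
    "divisa\tVERB\tdivido",
    "</s>",
])

tokens = parse_vertical(SAMPLE)
lemma_counts = Counter(t.lemma for t in tokens)
```

Keeping the form, tag and lemma together in one record makes it straightforward to group by lemma or filter by tag once the real tagset is substituted for the placeholder labels.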