Corpora are a good starting point for collecting historical texts. You can upload your texts in various formats (TXT, PDF, DOC, etc.) to create a corpus from all your files, or use our tool for building corpora from the web, e.g. by downloading specific websites containing historical texts or books. A corpus can be divided into smaller parts called subcorpora, which allow you to work with only specific parts of the whole corpus, e.g. texts from a specific time period, or texts by a single author or in a single genre.
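The idea of a subcorpus can be illustrated with a small sketch in Python. This is not Sketch Engine's API or its internal implementation; the `Text` record, the `subcorpus` function and the sample entries are all hypothetical and exist only to show how selecting texts by metadata (time period, author, genre) narrows a corpus down to the part you want to study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical metadata record for one document in a corpus."""
    title: str
    author: str
    year: int
    genre: str


def subcorpus(corpus, *, author=None, genre=None, year_from=None, year_to=None):
    """Return the texts that satisfy every criterion that was supplied.

    Criteria left as None are ignored, so calling the function with no
    criteria returns a copy of the whole corpus.
    """
    selected = []
    for text in corpus:
        if author is not None and text.author != author:
            continue
        if genre is not None and text.genre != genre:
            continue
        if year_from is not None and text.year < year_from:
            continue
        if year_to is not None and text.year > year_to:
            continue
        selected.append(text)
    return selected


# Invented placeholder entries, not real corpus data.
corpus = [
    Text("Charter A", "Scribe One", 1150, "charter"),
    Text("Charter B", "Scribe Two", 1230, "charter"),
    Text("Chronicle C", "Scribe One", 1180, "chronicle"),
]

# Texts from a specific time period (the twelfth century).
twelfth_century = subcorpus(corpus, year_from=1100, year_to=1199)

# Texts by one author, restricted to one genre.
scribe_one_charters = subcorpus(corpus, author="Scribe One", genre="charter")
```

Because each criterion is optional, the same function covers the cases mentioned above: a time period only, one author only, one genre only, or any combination of them.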

Historical corpora:

Sketch Engine is also being used in the ChartEx project, which applies text mining methods to medieval Latin charters. The project will make its corpora publicly available through Sketch Engine as it proceeds.


Adam Kilgarriff, Miloš Husák and Robyn Woodrow (2012). The Sketch Engine as infrastructure for historical corpora. In Jeremy Jancsary (ed.). Empirical Methods in Natural Language Processing; Proceedings of the Conference on Natural Language Processing 2012, pp. 351–356.

Barbara McGillivray and Adam Kilgarriff (2012). Tools for historical corpus research, and a corpus of Latin (presentation). In New Methods in Historical Corpus Linguistics 3, Germany, 2013, pp. 247–255.