Corpora are a good starting point for collecting historical texts. You can upload your texts in various formats (TXT, PDF, DOC, etc.) to create a corpus from all your files, or use our tool for building corpora from the web, e.g. by downloading specific websites that contain historical texts or books. A corpus can be divided into smaller parts called subcorpora, which lets you work with only selected parts of the whole corpus, e.g. texts from a specific time period or texts by a single author, in a single genre and the like.
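
To illustrate the idea of a subcorpus, here is a minimal Python sketch that selects parts of a small collection by metadata. It does not use or reproduce the Sketch Engine interface: the `Text` class, the `subcorpus` helper and the sample titles, authors and years are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Text:
    # Hypothetical metadata fields; a real corpus may carry different ones.
    title: str
    author: str
    year: int
    genre: str


def subcorpus(corpus, predicate):
    """Return the texts in `corpus` for which `predicate` is true."""
    return [text for text in corpus if predicate(text)]


# Placeholder data, not real historical documents.
corpus = [
    Text("Text A", "Author 1", 1350, "charter"),
    Text("Text B", "Author 2", 1420, "chronicle"),
    Text("Text C", "Author 1", 1480, "charter"),
]

# A subcorpus limited to one time period.
fourteenth_century = subcorpus(corpus, lambda t: 1301 <= t.year <= 1400)

# A subcorpus limited to one author.
by_author_1 = subcorpus(corpus, lambda t: t.author == "Author 1")
```

Each subcorpus here is simply a filtered view of the same texts, so the whole corpus stays intact while a query runs over only the selected part.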

Historical corpora:

Sketch Engine is also being used in the ChartEx project, which applies text-mining methods to medieval Latin charters. The corpora will be made publicly available through Sketch Engine as the project proceeds.


