Corpora, as collections of historical texts, are a good starting point. With WebBootCat, you can gather all your data into a single corpus.
Historical corpora:
- Corpus of English Dialogues 1560–1760 (English)
- Early English Books Online 1473–1820 (English)
- GerManC. A Historical Corpus of German Newspapers 1650–1800 (German)
- Penn Historical Corpora (English)
- Nineteenth Century (English)
- Latin corpus (Latin)
Sketch Engine is also being used in the ChartEx project, which applies text mining methods to medieval Latin charters. As the project proceeds, it will make the corpora it prepares publicly available through Sketch Engine.