The Caja corpus is a balanced corpus of academic journal articles created by Iztok Kosem in 2010. For more information, see his PhD thesis (below). The corpus contains 79 million words and consists of 13,116 articles from 28 different disciplines.


Access to the corpus is restricted to academic purposes. Contact Iztok Kosem <iztok.kosem(a)> to gain access to this corpus.


Kosem, Iztok. Designing a model for a corpus-driven dictionary of Academic English. PhD Thesis. Aston University, 2010.

Explore other domain corpora in Sketch Engine

See a list of corpora prepared from specific domains in Sketch Engine.