The Corpus of Academic Journal Articles (CAJA) is a balanced corpus of English academic journal articles created by Iztok Kosem in 2010. The corpus contains 79 million words and consists of 13,116 articles from 28 different disciplines. For more information, see his PhD thesis listed in the reference section.


Access to the corpus is restricted and limited to academic purposes. To gain access, please contact Dr. Iztok Kosem and then forward his reply to our support email <> so that we can grant you access to this corpus.


KOSEM, Iztok. Designing a model for a corpus-driven dictionary of academic English. PhD thesis. Aston University, 2010.
