CAEC: Cambridge Academic English Corpus

The Cambridge Academic English Corpus (CAEC) is a corpus of academic English made up of a sample of written and spoken texts produced at undergraduate and postgraduate level at a range of US and UK institutions. The texts include lectures, seminars, student presentations, journals, essays and textbooks.

Part-of-speech tagset

This academic English corpus was tagged with TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.
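In Penn Treebank-style tagsets, tags for the same word class share a first letter (NN, NNS, NNP for nouns; JJ, JJR, JJS for adjectives; verb tags begin with V). That makes it possible to build a word list for one part of speech by filtering tagged tokens on the tag prefix. The Python sketch below assumes the tokens are already available as (word, tag) pairs; the prefix mapping and the function name `word_list` are illustrative assumptions, not Sketch Engine's implementation, and individual tags in the Sketch Engine-modified tagset may differ from the hand-assigned ones used here.

```python
from collections import Counter

# Assumed mapping from word class to the first letter of its tags
# (Penn Treebank style; the exact modified tagset may differ).
WORD_CLASS_PREFIXES = {"noun": "N", "verb": "V", "adjective": "J"}


def word_list(tagged_tokens, word_class):
    """Return (lowercased word, count) pairs for one word class, most frequent first."""
    prefix = WORD_CLASS_PREFIXES[word_class]
    counts = Counter(
        word.lower() for word, tag in tagged_tokens if tag.startswith(prefix)
    )
    return counts.most_common()


# Usage with hand-tagged tokens (tags assigned by hand for illustration only):
tokens = [("The", "DT"), ("lecture", "NN"), ("covers", "VBZ"),
          ("recent", "JJ"), ("lectures", "NNS"), ("lecture", "NN")]
print(word_list(tokens, "noun"))  # [('lecture', 2), ('lectures', 1)]
```

Filtering on the first letter rather than on full tags keeps the sketch working whether verbs are tagged VB* or with modified verb tags that also start with V.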

Tools to work with the Cambridge Academic English Corpus

A complete set of tools is available for working with this corpus. The tools generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • keywords – terminology extraction of one-word units
  • text type analysis – statistics of metadata in the corpus
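Two of the tools listed above, n-gram frequency lists and concordances, rest on simple ideas that can be illustrated on a plain list of tokens. The Python sketch below is an assumption-based illustration, not how Sketch Engine computes them (it works over an indexed corpus with far more options): `ngram_frequencies` counts contiguous sequences of n tokens, and `concordance` returns keyword-in-context (KWIC) lines with a fixed number of context tokens on each side. Both function names and the `width` parameter are hypothetical.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count contiguous n-token sequences, most frequent first."""
    # zip(tokens[0:], tokens[1:], ..., tokens[n-1:]) yields every n-gram as a tuple.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common()


def concordance(tokens, node, width=3):
    """Return KWIC lines: (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Usage on a made-up sentence (not corpus data):
tokens = "the results of the study suggest that the results of the survey".split()
print(ngram_frequencies(tokens, 2)[:3])
# [(('the', 'results'), 2), (('results', 'of'), 2), (('of', 'the'), 2)]
for left, node, right in concordance(tokens, "results", width=2):
    print(f"{left:>10} [{node}] {right}")
```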

Search the CAEC corpus

Sketch Engine offers a range of tools to work with this Cambridge Academic English Corpus.


Other text corpora

Sketch Engine offers 700+ language corpora.

Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn it in minutes.