CAEC: Cambridge Academic English Corpus

The Cambridge Academic English Corpus (CAEC) is an Academic English corpus made up of a sample of written and spoken academic language at undergraduate and postgraduate level, collected from a range of US and UK institutions. The texts in this corpus comprise lectures, seminars, student presentations, journals, essays and textbooks.

Part-of-speech tagset

This Academic English corpus was tagged by TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.

Tools to work with the Cambridge Academic English Corpus

A complete set of tools is available to work with this Academic English corpus to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • keywords – terminology extraction of one-word units
  • text type analysis – statistics of metadata in the corpus
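
To make three of the outputs listed above more concrete (word lists, n-grams and concordance), here is a minimal Python sketch. It is not how Sketch Engine computes these results and it does not use the CAEC data: the tokenizer is deliberately naive, the sample sentences are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, illustrative only)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented sample text, standing in for corpus data.
sample = ("The results suggest that the model is robust. "
          "The results also suggest that further work is needed.")
tokens = tokenize(sample)

word_list = Counter(tokens).most_common(3)          # word list by frequency
bigrams = ngram_frequencies(tokens, 2).most_common(2)  # most frequent 2-grams
kwic = concordance(tokens, "suggest", window=2)     # keyword in context

for left, kw, right in kwic:
    print(f"{left:>20} | {kw} | {right}")
```

A real corpus tool works on tagged and lemmatized text, so a word list can be restricted to nouns or verbs and a concordance can be searched by lemma; the sketch above works on raw word forms only.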

Search the CAEC corpus

Sketch Engine offers a range of tools to work with the Cambridge Academic English Corpus.
