Arabic corpus of the Quran

The Quran annotated corpus is an Arabic corpus built from the Quran, the central religious text of Islam. This version of the corpus was prepared by Zainab Alqassem (Alqassem 2013). The data was taken from the Quranic Arabic Corpus (Dukes 2009) and the QurAna anaphoric coreference database (Sharaf and Atwell 2012). The corpus texts are lemmatized and POS-tagged.

Part-of-speech tagset

The morphological annotation in the corpus uses a POS tagset created specifically for the Quranic Arabic Corpus.

Tools to work with the Quranic Arabic corpus

A complete set of tools is available for working with this Arabic corpus. The tools generate:

  • word sketch – Arabic collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • keywords – terminology extraction of one-word and multi-word units
  • text type analysis – statistics of metadata in the corpus
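
To make a few of the outputs listed above concrete, here is a minimal Python sketch of a frequency word list, an n-gram count, and a simple concordance. It is not Sketch Engine's implementation, and it does not use the Sketch Engine API. The function names `ngram_counts` and `concordance` are hypothetical, and the tokens are placeholders standing in for corpus tokens, not real corpus data.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    # Zip n shifted copies of the token list to form the n-grams.
    return Counter(zip(*(tokens[i:] for i in range(n))))


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits


# Placeholder tokens (an assumption for illustration only).
tokens = ["w1", "w2", "w1", "w2", "w3"]

word_list = Counter(tokens).most_common()  # word list ordered by frequency
bigrams = ngram_counts(tokens, 2)          # frequency list of 2-grams
lines = concordance(tokens, "w3", width=2) # keyword shown with its context
```

A real word list would normally be computed over lemmas or filtered by POS tag, which this sketch leaves out.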

version 1 (7th May 2013)

  • initial version

The Quran annotated corpus

Alqassem, Z. (2013). Unifying Quranic analyses into a single database. BSc Final Year Project Dissertation, School of Computing, University of Leeds.

The Quranic Arabic Corpus

Dukes, K. (2009). The Quranic Arabic Corpus. Leaman, Oliver.

QurAna: Corpus of the Quran annotated with Pronominal Anaphora

Sharaf, A. B. M., & Atwell, E. (2012). QurAna: Corpus of the Quran annotated with Pronominal Anaphora. In LREC (pp. 130-137).

Search the Quran corpus with annotation

Sketch Engine offers a range of tools to work with this Quranic Arabic Corpus.


