Arabic corpus of the Quran
The Quran annotated corpus is an Arabic corpus built from the Quran, the central religious text of Islam. This version of the corpus was prepared by Zainab Alqassem (Alqassem 2013). The data was taken from the Quranic Arabic Corpus (Dukes 2009) and the QurAna anaphoric coreference database (Sharaf and Atwell 2012). The corpus texts were lemmatized and POS-tagged.
Part-of-speech tagset
The morphological annotation in the corpus uses a POS tagset created specifically for the Quranic Arabic Corpus.
Tools to work with the Quranic Arabic corpus
A complete set of tools is available for working with this Arabic corpus. The tools generate:
- word sketch – Arabic collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- keywords – terminology extraction of one-word and multi-word units
- text type analysis – statistics of metadata in the corpus
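To make three of the outputs above more concrete (word lists, n-grams and concordance), here is a minimal Python sketch. It is not how Sketch Engine computes these results; the function names `word_list`, `ngrams` and `concordance` are hypothetical, and the token list is an English placeholder rather than data from the Quran corpus.

```python
from collections import Counter


def word_list(tokens):
    """Return (token, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Placeholder tokens (an assumption for illustration, not corpus data).
tokens = "the cat sat on the mat and the cat slept".split()

frequencies = word_list(tokens)
bigram_counts = Counter(ngrams(tokens, 2))
hits = concordance(tokens, "cat", window=2)

for left, kw, right in hits:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>10} [{kw}] {right}")
```

In a lemmatized and POS-tagged corpus, the same counting could be done over lemmas or lemma and tag pairs instead of raw word forms, which is what makes lists such as "nouns by frequency" possible.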
Changelog
version 1 (7th May 2013)
- initial version
Bibliography
The Quran annotated corpus
Alqassem, Z. (2013). Unifying Quranic analyses into a single database. BSc Final Year Project Dissertation, School of Computing, University of Leeds.
The Quranic Arabic Corpus
Dukes, K. (2009). The Quranic Arabic Corpus. Leaman, Oliver.
QurAna: Corpus of the Quran annotated with Pronominal Anaphora
Sharaf, A. B. M., & Atwell, E. (2012). QurAna: Corpus of the Quran annotated with Pronominal Anaphora. In LREC (pp. 130-137).
Search the Quran corpus with annotation
Sketch Engine offers a range of tools to work with this Quranic Arabic Corpus.