Corpus of Classical Arabic (KSUCCA)

KSUCCA: King Saud University Corpus of Classical Arabic

The King Saud University Corpus of Classical Arabic (KSUCCA) is a corpus of Classical Arabic texts dating from the 7th to the early 11th century. It consists of 46 million words and was created as part of the Ph.D. work of Maha Alrabiah. The texts cover a wide range of genres, including Religion, Linguistics, Literature, Science, Sociology, and Biography, and each genre is further divided into subgenres.

Part-of-speech tagset

Texts were lemmatized and POS-tagged by Yonatan Belinkov using the MADA tools from Columbia University. See the POS tagset description.

Tools to work with the Arabic KSUCCA corpus

A complete set of Sketch Engine tools is available to work with this corpus of Classical Arabic to generate:

word sketch – Arabic collocations categorized by grammatical relations
thesaurus – synonyms and similar words for every word
word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
n-grams – frequency list of multi-word units
concordance – examples in context
keywords – terminology extraction of one-word units
text type analysis – statistics of metadata in the corpus
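
To make the list above more concrete, here is a minimal Python sketch of what a frequency-ordered word list, an n-gram frequency list, and a concordance compute. It is an illustration only, not Sketch Engine's implementation or API. The helper names `ngram_frequencies` and `concordance` are hypothetical, and the transliterated tokens are an invented sample, not text drawn from KSUCCA.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every contiguous run of n tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each hit (hypothetical helper)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((" ".join(left), tok, " ".join(right)))
    return lines


# Invented, transliterated sample tokens; NOT taken from the KSUCCA corpus.
tokens = (
    "qala al-malik inna al-ilm nur wa "
    "qala al-malik inna al-jahl zalam"
).split()

word_list = Counter(tokens).most_common()   # word list ordered by frequency
bigrams = ngram_frequencies(tokens, 2)      # n-gram frequency list with n = 2
kwic = concordance(tokens, "inna")          # keyword-in-context lines

for left, key, right in kwic:
    print(f"{left:>20} [{key}] {right}")
```

A real corpus tool works on lemmatized, POS-tagged text and indexes it for speed; the sketch only shows the counting and windowing idea behind these views.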

Bibliography

Alrabiah, M., Al-Salman, A., & Atwell, E. S. (2013). The design and construction of the 50 million words KSUCCA. In Proceedings of WACL’2 Second Workshop on Arabic Corpus Linguistics (pp. 5-8). The University of Leeds.

Search the corpus of Classical Arabic

Sketch Engine offers a range of tools to work with the KSUCCA corpus.


Other text corpora in Sketch Engine

Sketch Engine offers 800+ language corpora.


Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.

