KSUCCA: King Saud University Corpus of Classical Arabic
The King Saud University Corpus of Classical Arabic (KSUCCA) is a language corpus made up of Classical Arabic texts dating from the 7th to the early 11th century. The corpus consists of 46 million words and was created as part of the Ph.D. work of Maha Alrabiah; find out more here. The corpus contains texts from a wide range of genres, such as Religion, Linguistics, Literature, Science, Sociology, and Biography, with a further division into subgenres.
The texts were lemmatised and POS-tagged by Yonatan Belinkov using the MADA tools from Columbia University. See the POS tagset description.