ALC: Arabic Learner Corpus

The Arabic Learner Corpus (ALC) is a language corpus made up of written and spoken texts produced by learners of Arabic in Saudi Arabia. All texts were collected in 2012–2013 and comprise 282,732 words from 942 students of 67 nationalities.

See more on the project site:

Part-of-speech tagset

Texts were POS tagged using the Stanford parser with the following POS tagset description.


This Arabic corpus is accessible to all users with a Sketch Engine standard subscription. The corpus texts are licensed under the CC BY-NC 4.0 licence. The corpus is provided in Sketch Engine with the permission of its author, Abdullah Alfaifi.

Tools to work with the Arabic ALC corpus

A complete set of Sketch Engine tools is available to work with this Arabic Learner Corpus to generate:

  • word sketch – Arabic collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
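
To make two of the outputs listed above more concrete, the following is a minimal Python sketch of an n-gram frequency list and a keyword-in-context concordance. It is an illustration only and not Sketch Engine's implementation. The function names ngram_frequencies and concordance are hypothetical, and the token list is a made-up English toy example rather than text from the corpus.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # zip over n shifted views of the token list yields the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits


# Toy token list invented for illustration; not taken from the ALC.
tokens = "the student reads the book and the student writes".split()

bigrams = ngram_frequencies(tokens, n=2)
top_bigram = bigrams.most_common(1)[0]
kwic = concordance(tokens, "student", window=1)
```

A real corpus tool would also handle tokenization, lemmatization and part-of-speech filtering, all of which this sketch leaves out.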

Search the Arabic Learner Corpus

Sketch Engine offers a range of tools to work with the Arabic Learner Corpus.

Other text corpora in Sketch Engine

Sketch Engine offers 800+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn how in minutes.