Afrikaans corpus from Wikipedia

The Afrikaans Wikipedia Corpus (afwiki) is an Afrikaans corpus made up of texts collected from the Afrikaans edition of the internet encyclopedia Wikipedia in March 2018. The corpus consists of 15 million words.

Tools to work with the Afrikaans corpus

A complete set of tools is available for working with this Afrikaans Wikipedia corpus. The tools can generate:

  • word lists – lists of Afrikaans words organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples of a word or phrase in context
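
To make the three outputs above concrete, here is a minimal Python sketch of what a word list, an n-gram list and a concordance look like when computed on a tiny text. It is an illustration only and does not show how Sketch Engine computes them: the function names (`tokenize`, `word_list`, `ngrams`, `concordance`) and the sample sentences are invented for this example and are not taken from the afwiki corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n=2):
    """Return (n-gram, frequency) pairs, most frequent first.

    Sentence boundaries are ignored here, so an n-gram may span two sentences.
    """
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common()


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each hit of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text, not drawn from the corpus.
sample = "Die kat sit op die mat. Die hond sit op die stoep."
tokens = tokenize(sample)

print(word_list(tokens)[:3])
print(ngrams(tokens, n=2)[:3])
for left, keyword, right in concordance(tokens, "sit"):
    print(f"{left:>12} [{keyword}] {right}")
```

On a real corpus of 15 million words, the same ideas apply, but tokenization, lemmatization and indexing need far more care than this sketch takes.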

Search the Afrikaans corpus

Sketch Engine offers a range of tools to work with this Afrikaans corpus from Wikipedia.


Your own Wikipedia corpora

We can build a Wikipedia corpus in any language for you. Please contact us.

Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to get started in minutes.