Afrikaans corpus from Wikipedia
The Afrikaans Wikipedia Corpus (afwiki) is made up of texts collected from the Afrikaans edition of the internet encyclopedia Wikipedia in early October 2022. The corpus consists of 22 million words.
The Afrikaans corpus from Wikipedia has been tagged with the NCHLT tagger (derived from HunPos) using the following tagset.
Tools to work with the Afrikaans corpus
A complete set of tools is available to work with this Wikipedia Afrikaans corpus to generate:
- word sketch – Afrikaans collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of Afrikaans nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
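As a rough illustration of what two of the outputs listed above contain, the sketch below builds an n-gram frequency list and a simple concordance (keyword in context) in plain Python. It is not how Sketch Engine computes them: the function names `ngram_frequencies` and `concordance`, the `width` parameter and the sample sentence are all hypothetical and chosen only for this example.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def concordance(tokens, keyword, width=3):
    """Return (left context, hit, right context) for each match of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Toy sentence made up for this example; it is not taken from the corpus.
sample = "die kat sit op die mat en die kat slaap".split()

bigrams = ngram_frequencies(sample, n=2)
print(bigrams.most_common(3))

for left, hit, right in concordance(sample, "kat", width=2):
    print(f"{left:>10} | {hit} | {right}")
```

A real corpus would first need tokenization and, for word lists by part of speech, the tags assigned by the tagger; the sketch skips both and splits on whitespace.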
Search the Afrikaans corpus
Sketch Engine offers a range of tools to work with this Afrikaans corpus from Wikipedia.
Use Sketch Engine in minutes
Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.