Cundeelee Wangka Stories parallel corpus

The Cundeelee Wangka Stories corpus is a Cundeelee Wangka – English parallel corpus made up of texts provided by the Goldfields Aboriginal Language Centre in Kalgoorlie, Australia. The corpus was created for the Lexicom 2018 workshop. Cundeelee Wangka is an Australian Aboriginal language.

Part-of-speech tagset

The Cundeelee Wangka part of the corpus was tagged using the Cundeelee Wangka POS tagset.

The English part of the corpus was tagged by TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.

Access policy

To get access, please contact Sue Hanson from the Goldfields Aboriginal Language Centre in Kalgoorlie, Australia, and provide a brief description of the purpose of your work. If your request is accepted, please contact us and include the confirmation from Sue Hanson and your username so that we can grant you access to this corpus.

Tools to work with the Cundeelee Wangka Stories corpus

A complete set of tools is available to work with this Cundeelee Wangka – English parallel corpus to generate:

  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
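
To make two of the outputs listed above more concrete, here is a minimal Python sketch of an n-gram frequency list and a keyword-in-context concordance. It is an illustration only and not how Sketch Engine implements these tools; the function names `ngrams` and `concordance` are hypothetical, and the sample tokens are placeholder English words, not text from the corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens (as tuples)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Placeholder tokens for illustration; not taken from the corpus.
tokens = "the dog saw the cat and the dog ran".split()

bigrams = ngrams(tokens, 2)
hits = concordance(tokens, "dog", width=2)

# Frequency list, most frequent first.
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)

# Concordance lines with the keyword marked.
for left, kw, right in hits:
    print(f"{left} [{kw}] {right}")
```

A real corpus query would also draw on the part-of-speech tags and the sentence alignment between the two languages, both of which this sketch leaves out.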

Initial version (April 2018)

  • created corpus
  • part-of-speech tagged by researchers from Goldfields Aboriginal Language Centre

Search the Cundeelee Wangka Stories corpus

Sketch Engine offers a range of tools to work with the Cundeelee Wangka Stories corpus.
