Assamese corpus from Wikipedia

The Assamese Wikipedia Corpus (aswiki) is an Assamese corpus made up of texts collected from the Assamese edition of the online encyclopedia Wikipedia at the beginning of May 2023. The corpus consists of 2.5 million words. Assamese, also known as Asamiya, is an Indo-Aryan language spoken mainly in the north-eastern Indian state of Assam.

Tools to work with the Assamese corpus

A complete set of tools is available to work with this Assamese Wikipedia corpus.

Search the Assamese corpus

Sketch Engine offers a range of tools to work with this Asamiya corpus from Wikipedia.


Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.
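
To give a rough idea of what a frequency list and n-grams are, here is a minimal Python sketch. It is an illustration only and not how Sketch Engine computes its results. The helper names (`tokenize`, `frequency_list`, `ngrams`), the naive whitespace tokenization and the short sample text are all assumptions made for this example; the sample is not taken from the corpus.

```python
from collections import Counter

# Punctuation to strip from token edges, including the danda (।).
# This naive rule is an assumption; real tokenizers are more careful.
PUNCT = ".,;:!?()[]\"'।"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    stripped = (raw.strip(PUNCT) for raw in text.split())
    return [token for token in stripped if token]


def frequency_list(tokens):
    """Return (token, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample text for illustration only.
sample = "অসম ভাৰতৰ এখন ৰাজ্য। অসম এখন ৰাজ্য।"

tokens = tokenize(sample)
word_frequencies = frequency_list(tokens)
bigram_frequencies = Counter(ngrams(tokens, 2)).most_common()

for word, count in word_frequencies:
    print(word, count)
```

On a corpus of 2.5 million words the same counting idea applies, although a dedicated corpus tool also handles tokenization, indexing and statistics that this sketch leaves out.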