Dutch SoNaR corpus search

SoNaR: Dutch reference corpus

The SoNaR corpus (Stevin Nederlandstalig Referentie corpus) is a Dutch reference corpus consisting of 500 million tokens. The corpus is balanced for research on contemporary (1954–2011) written Dutch. It is also balanced by the number of speakers in the Dutch-speaking regions: one-third of the texts come from Flanders and two-thirds from the Netherlands. The texts include newspapers, reports, etc. as well as chat, SMS, internet fora and email.

More information about the SoNaR corpus can be found at https://www.lt3.ugent.be/projects/sonar/

Part-of-speech tagset

This corpus was POS tagged with the TreeTagger tool using the following Dutch tagset legend.

Tools to work with the SoNaR corpus

A complete set of Sketch Engine tools is available to work with the Dutch SoNaR corpus and generate:

word sketch – Dutch collocations categorized by grammatical relations
thesaurus – synonyms and similar words for every word
word lists – lists of Dutch nouns, verbs, adjectives etc. organized by frequency
n-grams – frequency list of multi-word units
concordance – examples in context
keywords – terminology extraction of one-word and multi-word units
text type analysis – statistics of metadata in the corpus
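
To make two of the outputs above concrete, here is a minimal Python sketch of what an n-gram frequency list and a concordance (keyword in context) contain. It is an illustration only and not how Sketch Engine computes them; the function names `ngram_frequencies` and `concordance`, the whitespace tokenization, and the toy Dutch sentence are all assumptions made for this example and are not taken from SoNaR.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Toy sentence split on whitespace (illustrative data, not from the corpus).
tokens = "de kat zit op de mat en de kat slaapt".split()

bigrams = ngram_frequencies(tokens, n=2)
hits = concordance(tokens, "kat")

# Most frequent bigrams first, then each keyword hit with its context.
for gram, count in bigrams.most_common(3):
    print(count, gram)
for left, kw, right in hits:
    print(f"{left:>10} [{kw}] {right}")
```

A real corpus query would also draw on lemmas and part-of-speech tags rather than raw word forms, which this sketch leaves out.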

Access policy

To get access to this corpus, please contact the service desk of the Dutch Language Institute at servicedesk@ivdnt.org.

Bibliography

Nelleke Oostdijk, Martin Reynaert, Véronique Hoste, Ineke Schuurman. The construction of a 500-million-word reference corpus of contemporary written Dutch. In Essential Speech and Language Technology for Dutch, pp. 219–247, 2013.

Search the Dutch SoNaR corpus

Sketch Engine offers a range of tools to work with the STEVIN Nederlandstalig Referentie corpus.

Other text corpora in Sketch Engine

Sketch Engine offers 800+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in contexts, n-grams or extracting terms is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.

Quick Start Guide

