fyWaC: Corpus of the Frisian Web

The Frisian web corpus (fyWaC) is a West Frisian corpus made up of texts collected from the Internet. The corpus was prepared according to the standards described in the paper A Corpus Factory for Many Languages (Kilgarriff et al., LREC 2010).

The texts were crawled by SpiderLing in August 2013 and comprise 3 million words.

Tools to work with the West Frisian corpus

A complete set of tools is available to work with this Frisian corpus to generate:

  • word lists – lists of Frisian nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • keywords – terminology extraction of one-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
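
To make the first, second and fourth outputs above more concrete, here is a minimal Python sketch of a frequency-ordered word list, an n-gram frequency list and a simple concordance. It is an illustration only and not how Sketch Engine computes these results. The function names (tokenize, word_list, ngrams, concordance) and the regex tokenizer are assumptions made for this example, and the sample sentence is a well-known Frisian saying rather than text taken from fyWaC. Keyword extraction is left out because it needs a second, reference corpus to compare against.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a token is a run of word characters, optionally with
    internal ASCII apostrophes (as in "wa't"). Real corpus tools use
    more careful, language-specific tokenization.
    """
    return re.findall(r"\w+(?:'\w+)*", text.lower())


def word_list(tokens):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Return (n-gram, count) pairs for every run of n adjacent tokens."""
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams).most_common()


def concordance(tokens, node, width=3):
    """Return (left context, node, right context) for each hit of `node`."""
    hits = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Toy input, not fyWaC data.
sample = ("Bûter, brea en griene tsiis; "
          "wa't dat net sizze kin, is gjin oprjochte Fries.")
tokens = tokenize(sample)

print(word_list(tokens)[:5])
print(ngrams(tokens, 2)[:5])
for left, node, right in concordance(tokens, "tsiis"):
    print(f"{left:>25} [{node}] {right}")
```

On a corpus of millions of words the same counts are usually served from a prebuilt index rather than recomputed on each query, which this sketch does not attempt.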

Initial version (August 2013)

  • crawled 3 million words
  • no part-of-speech tagging

WaC corpora

BARONI, Marco, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language resources and evaluation, 2009, 43.3: 209–226.

BARONI, Marco; KILGARRIFF, Adam. Large linguistically-processed web corpora for multiple languages. In: Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations. Association for Computational Linguistics, 2006, pp. 87–90.

Search the Frisian corpus

Sketch Engine offers a range of tools to work with this Frisian corpus from the web.

Other text corpora

Sketch Engine offers 800+ language corpora.

Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.