euWaC: Corpus of the Basque Web

The Basque Web Corpus (euWaC) is a Basque language corpus made up of texts collected from the Internet. The corpus was prepared by Dr. Igor Leturia <i.leturia(at)elhuyar.eus> in 2012. It contains almost 100 million words.

Part-of-speech tagset

The Basque Web corpus is lemmatized and part-of-speech tagged.

Tools to work with the Basque Web corpus

A complete set of Sketch Engine tools is available for this Basque Web corpus. They generate:

  • word sketch – Basque collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word units
  • word lists – lists of Basque nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
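
To make three of the outputs above concrete (word lists, n-grams and concordance), here is a minimal Python sketch run on a seven-word toy sentence. It is not Sketch Engine code and does not use the Sketch Engine API; the function names `ngram_counts` and `concordance` and the sample sentence are assumptions made purely for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous sequences of n tokens (hypothetical helper)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every hit.

    A simplified keyword-in-context view; `width` is the number of
    tokens shown on each side (hypothetical helper).
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Toy input, not taken from euWaC: "a big house and a small house"
tokens = "etxe handi bat eta etxe txiki bat".split()

word_list = Counter(tokens).most_common()  # word list ordered by frequency
bigrams = ngram_counts(tokens, 2)          # frequency list of 2-word units
kwic = concordance(tokens, "etxe")         # examples of "etxe" in context

for left, kw, right in kwic:
    print(f"{left:>10} [{kw}] {right}")
```

A real corpus query would also rely on lemmas and part-of-speech tags, which this sketch ignores; it only counts surface tokens.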

version 3 (March 2017)

  • corpus tagged by the RFTagger tool with the NKJP tagset
  • created lempos

version 2 (1 July 2013)

  • corpus tagged by the WCRFT tagger

version 1 (23 July 2012)

  • initial version – 7.7 billion words, untagged

a sample for Cesar (25 October 2012)

TenTen corpora

Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., & Suchomel, V. (2013, July). The TenTen corpus family. In 7th International Corpus Linguistics Conference CL (pp. 125-127).

Suchomel, V., & Pomikálek, J. (2012). Efficient web crawling for large text corpora. In Proceedings of the seventh Web as Corpus Workshop (WAC7) (pp. 39-43).

Word Sketches

Radziszewski, A., Kilgarriff, A., & Lew, R. (2011). Polish word sketches.

Search the Basque corpus euWaC

Sketch Engine offers a range of tools to work with this Basque corpus from the web.

Other text corpora in Sketch Engine

Sketch Engine offers 700+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.