CzechParl: Corpus of Stenographic Protocols from Czech Parliament

The Corpus of Stenographic Protocols from Czech Parliament (CzechParl) is a language corpus built from stenographic protocols recorded during plenary meetings of the Czech Parliament in its modern era, from 1993 to 2012.

The corpus captures the language of politicians speaking at regular meetings of the Parliament of the Czech Republic. Users can restrict a search to a specific member of parliament, a year or date, or a particular speaker role.

Part-of-speech tagset

The CzechParl corpus was annotated with the morphological analyser MAJKA using its POS tagset legend. The output was then disambiguated with the DESAMB disambiguator.

The texts carry the following metadata attributes:

  • role – the role of the speaker (e.g. Poslanec)
  • name – the name of the speaker
  • date – the date of the meeting
  • year – the year of the meeting
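
The four attributes above can be pictured as fields attached to each speech. The following Python sketch is only an illustration of filtering by those attributes. It is not how Sketch Engine stores or queries the corpus. The `Speech` class, the `select` function, the placeholder speaker names and the "Senátor" role value are all assumptions made for the example; only the attribute names and the "Poslanec" role come from the description above.

```python
import datetime
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Speech:
    """Hypothetical record carrying the role, name and date attributes."""
    role: str              # role of the speaker, e.g. "Poslanec"
    name: str              # name of the speaker
    date: datetime.date    # date of the meeting
    text: str              # placeholder for the transcribed speech

    @property
    def year(self) -> int:
        # The year attribute is derived from the meeting date here.
        return self.date.year


def select(
    speeches: Iterable[Speech],
    role: Optional[str] = None,
    name: Optional[str] = None,
    year: Optional[int] = None,
    on: Optional[datetime.date] = None,
) -> List[Speech]:
    """Keep only the speeches that match every criterion that was given."""
    return [
        s for s in speeches
        if (role is None or s.role == role)
        and (name is None or s.name == name)
        and (year is None or s.year == year)
        and (on is None or s.date == on)
    ]


# Invented sample records; names and texts are placeholders, not corpus data.
SAMPLE = [
    Speech("Poslanec", "Speaker A", datetime.date(1993, 1, 20), "..."),
    Speech("Poslanec", "Speaker B", datetime.date(2012, 6, 5), "..."),
    Speech("Senátor", "Speaker C", datetime.date(2012, 6, 5), "..."),
]

if __name__ == "__main__":
    for speech in select(SAMPLE, role="Poslanec", year=2012):
        print(speech.name, speech.date.isoformat())
```

Criteria left as `None` are ignored, so the same function covers a search by one attribute or by several combined.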

Tools to work with the CzechParl corpus

The complete set of Sketch Engine tools is available for working with the CzechParl corpus. The tools can generate:

  • word sketch – Czech collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Czech nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • keywords – terminology extraction of single-word units
  • text type analysis – statistics of metadata in the corpus
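
To make the n-grams entry above concrete, here is a minimal Python sketch of a frequency list of multi-word units. It only illustrates the general idea and makes no claim about how Sketch Engine computes its n-grams. The function name `ngram_frequencies` and the short Czech token sequence are invented for the example and are not taken from the corpus.

```python
from collections import Counter
from typing import Counter as CounterType, Sequence, Tuple


def ngram_frequencies(tokens: Sequence[str], n: int) -> CounterType[Tuple[str, ...]]:
    """Count every run of n consecutive tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # zip over n shifted views of the token list yields each n-gram as a tuple.
    shifted = (tokens[i:] for i in range(n))
    return Counter(zip(*shifted))


# Invented sample text, whitespace-tokenised; not real corpus data.
TOKENS = (
    "vážené paní poslankyně vážení páni poslanci "
    "vážené paní poslankyně"
).split()

BIGRAMS = ngram_frequencies(TOKENS, 2)

if __name__ == "__main__":
    for gram, count in BIGRAMS.most_common():
        print(" ".join(gram), count)
```

A real frequency list would usually be built over lemmas or tagged tokens rather than raw whitespace-separated words; the sketch skips that step to stay short.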

2013 29th of November

  • added years 2011, 2012
  • fixed dates of Senate documents

2012 25th of December

  • removed stenographic documents of the Slovak National Council
  • retagged

2010 autumn

  • first version

Jakubíček, Miloš, and Vojtěch Kovář. “CzechParl: Corpus of Stenographic Protocols from Czech Parliament.” RASLAN 2010: Recent Advances in Slavonic Natural Language Processing (2010): 41.

