CzechParl: Corpus of Stenographic Protocols from Czech Parliament
The Corpus of Stenographic Protocols from Czech Parliament (CzechParl) is a language corpus built from the stenographic protocols of plenary meetings of the Czech Parliament in its modern era, from 1993 to 2012.
The corpus contains the language of politicians at regular meetings of the Parliament of the Czech Republic. Users can restrict searches to a specific member of parliament, a year or date, or a particular speaker role.
The CzechParl corpus was annotated with the morphological analyser MAJKA, using the POS tagset described in the tagset legend, and the annotation was then disambiguated with the DESAMB disambiguator.
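The two-step annotation (an analyser proposing every possible reading of a word form, then a disambiguator choosing one reading in context) can be illustrated with a toy Python sketch. The lexicon, the coarse POS labels, the `analyse`/`disambiguate` helpers, and the single context rule are all hypothetical simplifications; they are not MAJKA's tagset or data, nor DESAMB's actual method. The example relies on the Czech form "ženu" being ambiguous between the accusative of "žena" (woman) and the first-person singular of "hnát" (to drive).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon standing in for a morphological analyser (not MAJKA's data or API):
# each word form maps to every (lemma, coarse POS) reading it could have.
TOY_LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "pro": [("pro", "preposition")],
    "ženu": [("žena", "noun"), ("hnát", "verb")],
    "já": [("já", "pronoun")],
}


def analyse(token: str) -> List[Tuple[str, str]]:
    """Step 1: return all candidate readings; unknown words get a single 'unknown' reading."""
    return TOY_LEXICON.get(token.lower(), [(token, "unknown")])


def disambiguate(tokens: List[str]) -> List[Tuple[str, str]]:
    """Step 2: pick one reading per token using a single toy context rule."""
    chosen: List[Tuple[str, str]] = []
    prev_pos: Optional[str] = None
    for token in tokens:
        readings = analyse(token)
        if len(readings) > 1 and prev_pos == "preposition":
            # After a preposition, prefer a noun reading if one exists.
            nouns = [r for r in readings if r[1] == "noun"]
            reading = nouns[0] if nouns else readings[0]
        elif len(readings) > 1:
            # Otherwise prefer a verb reading (an arbitrary toy default).
            verbs = [r for r in readings if r[1] == "verb"]
            reading = verbs[0] if verbs else readings[0]
        else:
            reading = readings[0]
        chosen.append(reading)
        prev_pos = reading[1]
    return chosen


print(disambiguate(["pro", "ženu"]))  # [('pro', 'preposition'), ('žena', 'noun')]
print(disambiguate(["já", "ženu"]))   # [('já', 'pronoun'), ('hnát', 'verb')]
```

Keeping the analyser and the disambiguator separate mirrors the division of labour described above: the first step only lists possibilities, and all contextual decisions happen in the second.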
List of corpus structures
– role – the role of the speaker (e.g. Poslanec, i.e. a deputy)
– name – the name of the speaker
– date – the date of the meeting
– year – the year of the meeting
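Restricting a search by these structures amounts to filtering text segments on their metadata. The following minimal Python sketch assumes each speech is a record carrying the four attributes above; the `Speech` class, the `select` helper, the ISO date format, and the sample records (placeholder speaker names, roles `Poslanec` and `Ministr`) are hypothetical illustrations and not part of Sketch Engine or the corpus itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Speech:
    """Hypothetical record for one speech, mirroring the corpus structures above."""
    role: str   # e.g. "Poslanec"
    name: str   # speaker's name
    date: str   # assumed ISO format, e.g. "2011-03-01"
    year: int
    text: str


def select(speeches: List[Speech], role: Optional[str] = None,
           name: Optional[str] = None, date: Optional[str] = None,
           year: Optional[int] = None) -> List[Speech]:
    """Return the speeches matching every attribute filter that is not None."""
    filters = {"role": role, "name": name, "date": date, "year": year}
    active = {key: value for key, value in filters.items() if value is not None}
    return [s for s in speeches
            if all(getattr(s, key) == value for key, value in active.items())]


# Placeholder data; names and texts are invented for illustration only.
speeches = [
    Speech("Poslanec", "Speaker A", "2010-05-12", 2010, "..."),
    Speech("Poslanec", "Speaker B", "2011-03-01", 2011, "..."),
    Speech("Ministr", "Speaker C", "2011-03-01", 2011, "..."),
]

print([s.name for s in select(speeches, role="Poslanec", year=2011)])  # ['Speaker B']
print([s.name for s in select(speeches, date="2011-03-01")])  # ['Speaker B', 'Speaker C']
```

Calling `select` with no filters returns every speech, so each attribute acts as an optional narrowing condition, much like combining several metadata restrictions in a search form.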
Tools to work with the CzechParl corpus
The complete set of Sketch Engine tools is available for the CzechParl corpus and can be used to generate:
- word sketch – Czech collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of Czech nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
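Word lists, n-grams and concordances all rest on simple operations over a tokenised text. The following minimal Python sketch shows those ideas on a toy token list; it is not Sketch Engine's implementation, it ignores lemmas and POS tags, and the `ngram_frequencies`/`concordance` helpers and the sample phrase are hypothetical illustrations.

```python
from collections import Counter
from typing import List, Tuple

Gram = Tuple[str, ...]


def ngram_frequencies(tokens: List[str], n: int) -> List[Tuple[Gram, int]]:
    """Count contiguous n-token sequences, most frequent first (ties keep first-seen order)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common()


def concordance(tokens: List[str], node: str, width: int = 3) -> List[str]:
    """Build KWIC lines: up to `width` tokens of left context, [node], right context."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented sample phrase, already lowercased and split into tokens.
tokens = "vážený pane předsedo , vážený pane ministře".split()

print(ngram_frequencies(tokens, 1)[:2])  # word list: [(('vážený',), 2), (('pane',), 2)]
print(ngram_frequencies(tokens, 2)[0])   # top bigram: (('vážený', 'pane'), 2)
for line in concordance(tokens, "pane", width=2):
    print(line)
# vážený [pane] předsedo ,
# , vážený [pane] ministře
```

Setting `n=1` turns the n-gram counter into a plain frequency word list; the word lists described above additionally group words by part of speech, which this sketch does not attempt.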
Version history
29 November 2013
- added years 2011 and 2012
- fixed dates of Senate documents
25 December 2012
- removed stenographic documents of the Slovak National Council
- first version
Jakubíček, Miloš, and Vojtěch Kovář. “CzechParl: Corpus of Stenographic Protocols from Czech Parliament.” RASLAN 2010: Recent Advances in Slavonic Natural Language Processing (2010): 41.