Corpus of Protocols from Czech Parliament

CzechParl: Corpus of Stenographic Protocols from Czech Parliament

The Corpus of Stenographic Protocols from Czech Parliament (CzechParl) is a language corpus built from stenographic protocols recorded during plenary meetings of the Czech parliament in its modern era from 1993 to 2012.

The corpus captures the language used by politicians at regular meetings of the Parliament of the Czech Republic. Users can restrict a search to a specific member of parliament, a year, a date, or a particular speaker role.

Part-of-speech tagset

The CzechParl corpus was annotated by the morphological analyser MAJKA using the following POS tagset legend. The DESAMB disambiguator was then applied.

A list of corpus structures

– role – the role of the speaker (e.g. Poslanec, i.e. deputy)

– name – the name of the speaker

– date – the date of the meeting

– year – the year of the meeting
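
The four attributes above can be thought of as metadata attached to each speech. The following is a minimal Python sketch of filtering by them, assuming a hypothetical in-memory record layout. The `Speech` class, the `filter_speeches` helper and the sample data are illustrative only; they are not part of the corpus or of any Sketch Engine API.

```python
import datetime
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Speech:
    """Hypothetical record: one speech with its metadata attributes."""
    role: str             # role of the speaker, e.g. "Poslanec"
    name: str             # name of the speaker
    date: datetime.date   # date of the meeting
    text: str             # transcript text (placeholder content below)

    @property
    def year(self) -> int:
        # In this sketch the year is simply derived from the meeting date.
        return self.date.year


def filter_speeches(
    speeches: Iterable[Speech],
    role: Optional[str] = None,
    name: Optional[str] = None,
    year: Optional[int] = None,
    date: Optional[datetime.date] = None,
) -> Iterator[Speech]:
    """Yield the speeches that match every criterion that is not None."""
    for speech in speeches:
        if role is not None and speech.role != role:
            continue
        if name is not None and speech.name != name:
            continue
        if year is not None and speech.year != year:
            continue
        if date is not None and speech.date != date:
            continue
        yield speech


# Made-up sample data: names, the role "Senator" and the texts are placeholders.
SAMPLE = [
    Speech("Poslanec", "Speaker A", datetime.date(1996, 7, 2), "..."),
    Speech("Poslanec", "Speaker B", datetime.date(2004, 3, 16), "..."),
    Speech("Senator", "Speaker C", datetime.date(1996, 12, 18), "..."),
]

deputies_1996 = list(filter_speeches(SAMPLE, role="Poslanec", year=1996))
print([s.name for s in deputies_1996])  # prints ['Speaker A']
```

Criteria left as `None` are ignored, so any combination of the four attributes can be used to narrow a search.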

Tools to work with the CzechParl corpus

A complete set of Sketch Engine tools is available to work with the CzechParl corpus and generate:

word sketch – Czech collocations categorized by grammatical relations
thesaurus – synonyms and similar words for every word
word lists – lists of Czech nouns, verbs, adjectives etc. organized by frequency
n-grams – frequency list of multi-word units
concordance – examples in context
keywords – terminology extraction of one-word units
text type analysis – statistics of metadata in the corpus
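
To make the idea of an n-gram frequency list concrete, here is a minimal Python sketch that counts contiguous token sequences. It is only an illustration of the general technique, not the way Sketch Engine computes n-grams; the `ngram_frequencies` function and the token sequence are made up for this example.

```python
from collections import Counter
from typing import Sequence


def ngram_frequencies(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous run of n tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Zipping n progressively shifted copies of the token list yields the n-grams.
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Made-up token sequence for illustration; not taken from the corpus.
tokens = ["vážený", "pane", "předsedo", "vážený", "pane", "ministře"]

bigrams = ngram_frequencies(tokens, 2)
print(bigrams.most_common(1))  # prints [(('vážený', 'pane'), 2)]
```

A word list organized by frequency is the special case `n = 1` of the same counting.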

Changelog

2013 29th of November

added years 2011, 2012
fixed dates of Senate documents

2012 25th of December

removed stenographic documents of the Slovak National Council
retagged

2010 autumn

first version

Bibliography

Jakubíček, Miloš, and Vojtěch Kovář. “CzechParl: Corpus of Stenographic Protocols from Czech Parliament.” RASLAN 2010: Recent Advances in Slavonic Natural Language Processing (2010): 41.
