The Polish Web Corpus contains 103 million words and is encoded in UTF-8. It was compiled from web pages retrieved with Google queries built from the most frequent Polish words. The corpus was morphologically analyzed with Morfeusz and disambiguated with the TaKIPI tagger.
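The query-based collection step described above can be illustrated with a minimal sketch. The exact procedure used for this corpus is not documented here, so everything below is an assumption: the function name `build_seed_queries`, the sample word list, the number of words per query, and the random sampling strategy are all hypothetical. The sketch only builds query strings; it does not contact any search engine.

```python
import random
from typing import List


def build_seed_queries(
    frequent_words: List[str],
    words_per_query: int = 3,
    num_queries: int = 5,
    seed: int = 0,
) -> List[str]:
    """Combine randomly sampled frequent words into search-query strings.

    Hypothetical helper: the corpus description does not say how the
    queries were actually assembled.
    """
    if words_per_query > len(frequent_words):
        raise ValueError("words_per_query exceeds the size of the word list")
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    queries = []
    for _ in range(num_queries):
        # Sample distinct words for each query and join them with spaces.
        words = rng.sample(frequent_words, words_per_query)
        queries.append(" ".join(words))
    return queries


# Illustrative sample of common Polish words (an assumption, not the
# list used to build the corpus).
FREQUENT_WORDS = [
    "się", "nie", "jest", "że", "jak",
    "ale", "tylko", "przez", "może", "bardzo",
]

queries = build_seed_queries(FREQUENT_WORDS)
for query in queries:
    print(query)
```

Each resulting string would then be submitted to a search engine and the returned pages downloaded, cleaned, and tagged. Combining several frequent words per query is one common way to bias results toward pages written in the target language.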

Changelog

v2.0 (25 May 2011)

  • Fixed document metadata. Previously, every document in the corpus displayed the same metadata.