The Polish Web Corpus contains 103 million words and is encoded in UTF-8. It was compiled from web pages retrieved through Google queries built from the most frequent Polish words. The corpus was tagged using the Morfeusz morphological analyser and the TaKIPI tagger.
Changelog
v2.0 (25 May 2011)
- Fixed document metadata. Previously, every document in the corpus displayed the same metadata.




