HebrewGC: Hebrew General Corpus
The Hebrew General Corpus (HebrewGC) is a Hebrew corpus made up of newspaper texts collected from the Internet. The corpus was donated by Prof. Ari Rappoport and Daphna Shezaf from the Computer Science and Engineering Department at the Hebrew University of Jerusalem. It consists of more than 150 million words; note that the texts were not deduplicated.
The HebrewGC corpus was part-of-speech tagged; the following Hebrew POS tagset summary lists the tags used.