Corpus of Chinese Wikipedia
The Chinese Wikipedia corpus is a Chinese corpus created from the Chinese edition of the internet encyclopedia Wikipedia in 2012. The corpus was built from a Wikipedia dump from April 2014. The texts were segmented with the Stanford Word Segmenter and then tagged with the Stanford POS Tagger, using a model trained on a combination of Chinese Treebank texts from Chinese and Hong Kong sources.
The POS tags are based on the Chinese Penn Treebank tagset.
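To illustrate what segmented, POS-tagged Chinese text looks like and how Chinese Penn Treebank tags such as NN (common noun) or NR (proper noun) can be used to filter and count tokens, here is a minimal Python sketch. Several things in it are assumptions rather than details of this corpus: the sample sentence and its tags are hand-written for the example, the `_` word/tag separator is an assumed output format (check the real format of your tagger or export), and `CTB_TAGS` lists only a small subset of the tagset.

```python
from collections import Counter

# A small subset of the Chinese Penn Treebank POS tags (NOT the full tagset).
CTB_TAGS = {
    "NN": "common noun",
    "NR": "proper noun",
    "NT": "temporal noun",
    "PN": "pronoun",
    "VV": "other verb",
    "VC": "copula 是",
    "VE": "有 as main verb",
    "VA": "predicative adjective",
    "AD": "adverb",
    "P": "preposition",
    "CD": "cardinal number",
    "M": "measure word",
    "DEC": "的 in a relative clause",
    "DEG": "associative 的",
    "PU": "punctuation",
}


def parse_tagged(line: str, sep: str = "_") -> list[tuple[str, str]]:
    """Split space-separated 'word<sep>TAG' tokens into (word, tag) pairs.

    The default separator '_' is an assumption about the tagged format.
    Tokens without the separator are skipped.
    """
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST separator, so words containing
        # the separator character themselves are still handled.
        word, found, tag = token.rpartition(sep)
        if found and word:
            pairs.append((word, tag))
    return pairs


def tag_frequencies(pairs: list[tuple[str, str]]) -> Counter:
    """Count how often each POS tag occurs."""
    return Counter(tag for _, tag in pairs)


# Hand-tagged example sentence: "Wikipedia is an online encyclopedia."
sentence = "维基百科_NR 是_VC 一_CD 个_M 网络_NN 百科全书_NN 。_PU"
pairs = parse_tagged(sentence)
freq = tag_frequencies(pairs)

# Keep only nouns (common and proper).
nouns = [word for word, tag in pairs if tag in ("NN", "NR")]
print(nouns)  # ['维基百科', '网络', '百科全书']

for tag, count in freq.most_common():
    print(tag, CTB_TAGS.get(tag, "unknown tag"), count)
```

Segmentation matters here because Chinese is written without spaces: the tagger can only assign a tag such as NN once the segmenter has decided where each word begins and ends.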
A complete set of tools is available to work with this Chinese corpus.
Search the Chinese corpus from Wikipedia
Sketch Engine offers a range of tools to work with this Chinese corpus.
Use Sketch Engine in minutes
Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn it in minutes.