A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The Chinese NEUCSP part-of-speech tagset is used in Chinese corpora annotated with the NEUCSP tagging tool, developed by the Natural Language Processing Group at Northeastern University, China.

Example of a tag used in the CQL concordance search box: [tag="d"] searches for adverbs (副词).
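When many searches are scripted, CQL queries like the one above can be generated rather than typed. The following is a minimal sketch in plain Python; the helper name tag_query is hypothetical and not part of any Sketch Engine API. It writes straight ASCII double quotes, which CQL expects as string delimiters (typographic quotes such as ” are not a substitute). It also relies on CQL attribute values being regular expressions, so "n.*" matches every tag beginning with n.

```python
def tag_query(*tags: str) -> str:
    """Build a CQL query matching consecutive tokens with the given NEUCSP tags.

    Hypothetical helper: each tag (or tag regex) becomes one [tag="..."] token.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    parts = []
    for tag in tags:
        # A double quote inside the value would terminate the CQL string early.
        if '"' in tag:
            raise ValueError(f"tag must not contain double quotes: {tag!r}")
        parts.append(f'[tag="{tag}"]')
    return "".join(parts)


print(tag_query("d"))       # [tag="d"]
print(tag_query("d", "a"))  # [tag="d"][tag="a"]  (adverb followed by adjective)
print(tag_query("n.*"))     # [tag="n.*"]  (any tag starting with n)
```

Chaining several tags produces a sequence query, which is a convenient way to look for patterns such as an adverb directly followed by an adjective.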

Tag Description Chinese name
n common noun 普通名词
nt temporal noun 时间名词
nd noun of locality (e.g. 下) 方位名词
nl location noun (e.g. 内地) 处所名词
nh personal name 人名
ns place name 地名
ni organization name 团体、机构、组织的专名
nz other proper noun 其它专名
v verb 动词
a adjective 形容词
b distinguishing word (e.g. 主要) 区别词
d adverb 副词
m numeral 数词
q measure word 量词
r pronoun 代词
p preposition 介词
c conjunction 连词
e interjection 叹词
o onomatopoeic word 拟声词
u particle (e.g. 的, 了) 助词
h prefix 前接成分
k suffix 后接成分
i idiomatic expression 习用语
j abbreviation 简称
g alpha-numeric symbol 语素字
x non-language symbol 非语素字
wp punctuation 标点
ws string of symbols 字符串
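For scripts that post-process tagged output, the table above can be kept as a lookup dictionary. This is a minimal sketch; the names NEUCSP_TAGS, describe and noun_tags are hypothetical, and grouping the noun subtypes by their leading letter is an assumption based on every noun tag in the table starting with n. The sample sentence is tagged by hand for illustration and is not output of the NEUCSP tool.

```python
# English descriptions based on the NEUCSP tag table above (hypothetical names).
NEUCSP_TAGS = {
    "n": "common noun", "nt": "temporal noun", "nd": "noun of locality",
    "nl": "location noun", "nh": "personal name", "ns": "place name",
    "ni": "organization name", "nz": "other proper noun",
    "v": "verb", "a": "adjective", "b": "distinguishing word", "d": "adverb",
    "m": "numeral", "q": "measure word", "r": "pronoun", "p": "preposition",
    "c": "conjunction", "e": "interjection", "o": "onomatopoeic word",
    "u": "particle", "h": "prefix", "k": "suffix", "i": "idiomatic expression",
    "j": "abbreviation", "g": "alpha-numeric symbol", "x": "non-language symbol",
    "wp": "punctuation", "ws": "string of symbols",
}


def describe(tag: str) -> str:
    """Return the English description of a tag; raises KeyError if unknown."""
    return NEUCSP_TAGS[tag]


def noun_tags() -> list[str]:
    """Return all noun-family tags (those starting with 'n'), sorted."""
    return sorted(t for t in NEUCSP_TAGS if t.startswith("n"))


# Hand-tagged illustration as (word, tag) pairs, not real tagger output.
sentence = [("他", "r"), ("很", "d"), ("高兴", "a")]
for word, tag in sentence:
    print(word, tag, describe(tag))
```

Keeping the tags in one dictionary makes it easy to validate tags found in exported concordances: any key lookup that fails points to a tag outside this tagset.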

Source: http://www.niutrans.com/niutrans/NiuTrans.html

Chinese corpora

Sketch Engine offers dozens of Chinese corpora.