Guangwai-Lancaster Chinese Learner Corpus

A brief description

The Guangwai-Lancaster Chinese Learner Corpus (CLC) is a 1.2-million-word corpus of learner Mandarin Chinese and a new addition to the corpora of L2 Chinese. It is the result of a collaboration between Guangdong University of Foreign Studies and Lancaster University. The corpus has a spoken part (621,900 tokens, 48%) and a written part (672,328 tokens, 52%) and covers a variety of task types and topics. It is fully error-tagged and can be used to explore theoretical and practical issues in the acquisition of Chinese as a foreign language.

Development team

Prof. Hai Xu (Guangdong University of Foreign Studies)

Dr. Richard Xiao (Lancaster University)

Dr. Vaclav Brezina (Lancaster University)


Funding for the corpus was obtained by Dr. Richard Xiao, to whom the corpus is also dedicated.


The building of the corpus was supported by the British Academy IPM Scheme, Grant No. PM120462.
