Internet-ZH is a Chinese web corpus collected by Serge Sharoff. It is also available from his site at the University of Leeds, UK. It was tokenised and part-of-speech tagged using tools from Northeastern University, China.