Environment Corpus – domain-specific corpus
The English Environment Corpus is a domain-specific English corpus made up of texts on environmental topics collected from the Internet. It focuses on the English used in the environmental domain. The corpus consists of 61 million words and was created with SpiderLing on 2011-11-29 using a topic word list supplied by MacMillan.
The Environment corpus is annotated with named entities recognized using the GATE tool. The named entities fall into 5 classes: Date, Location, Money, Person and Organization.
Access to the Environment corpus is restricted. For more information, please contact us at email@example.com
The Environment corpus is tagged with TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.
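To give a rough idea of what part-of-speech tagged text looks like, here is a minimal sketch that reads one-token-per-line output with tab-separated word, tag and lemma columns (the usual TreeTagger output layout) and counts the tags. The column order, the sample tokens and the helper names parse_vertical and tag_counts are assumptions made for illustration; the actual attribute layout of the Environment corpus is not documented here and may differ.

```python
from collections import Counter

# Invented sample, NOT taken from the Environment corpus.
# Assumed layout: word <TAB> tag <TAB> lemma, one token per line,
# with structural markup such as <s> on lines of its own.
SAMPLE_ROWS = [
    ("The", "DT", "the"),
    ("forests", "NNS", "forest"),
    ("of", "IN", "of"),
    ("the", "DT", "the"),
    ("region", "NN", "region"),
]
SAMPLE = "<s>\n" + "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n</s>\n"


def parse_vertical(text):
    """Return a list of (word, tag, lemma) tuples from tab-separated text."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and structure lines such as <s> or </s>.
        if not line or (line.startswith("<") and "\t" not in line):
            continue
        parts = line.split("\t")
        # Ignore lines that do not have exactly three columns.
        if len(parts) != 3:
            continue
        tokens.append(tuple(parts))
    return tokens


def tag_counts(tokens):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for _, tag, _ in tokens)


if __name__ == "__main__":
    for tag, count in tag_counts(parse_vertical(SAMPLE)).most_common():
        print(tag, count)
```

The same approach would extend to a further column if the named entity class were exported as an additional token attribute, but that is likewise an assumption about the export format.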