Environment Corpus – domain-specific corpus
The English Environment Corpus is an English corpus made up of texts on environmental topics collected from the Internet. This domain-specific corpus focuses on the language of English texts about the environment. The corpus contains 61 million words and was created with the SpiderLing web crawler on 2011-11-29 using a topic word list supplied by Macmillan.
The Environment corpus is annotated with named entities identified by the GATE tool. The named entities fall into five classes: Date, Location, Money, Person and Organization.
The Environment corpus is available to all users with a regular subscription.
The Environment corpus is part-of-speech tagged with TreeTagger, using the Penn Treebank tagset with Sketch Engine modifications.