EcoLexicon: English corpus of the Environment domain
The EcoLexicon English Corpus (EEC) is a corpus of contemporary environmental texts. It contains 23.1 million words and was compiled by the LexiCon Research Group at the University of Granada to support the development of EcoLexicon, a terminological knowledge base on the environment. The corpus is available with POS tags, lemmas and lempos.
A word sketch is a tool that lets users explore a word's grammatical and collocational behaviour. The EEC employs word sketches based on the Sketch Grammar for English developed by León-Araúz, San Martín and Faber (2016).
Text types contain information about corpus texts, e.g. domain (zoology, ecology) or language variety (British English). Thanks to these parameters, users can narrow their search:
A list of text types in the corpus
- Domain: the EEC encompasses all the domains and subdomains of environmental studies (e.g. biology, meteorology, ecology, environmental engineering, environmental law, etc.).
- User: the EEC includes texts for three different types of user (depending on their level of expertise): expert, semi-expert, general public.
- Geographical variant: the EEC comprises American, British, and Euro English.
- Genre: the EEC covers a wide variety of text genres: journal articles, books, websites, lexicographical material, etc.
- Editor: the EEC distinguishes texts edited by scholars/researchers, businesses, government bodies, etc.
- Year: the EEC includes texts from 1973 to 2016.
- Country: the EEC texts are tagged according to the country of publication.
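As a rough illustration of how these text types narrow a search, the sketch below filters a handful of made-up metadata records by one or more parameters. The `TextMeta` class, the `narrow` function, the field names and the sample values are all hypothetical and chosen only to mirror the list above; they are not the attribute names used in the corpus or in Sketch Engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextMeta:
    """Hypothetical metadata record mirroring the text types listed above."""
    domain: str
    user: str
    variant: str
    genre: str
    editor: str
    year: int
    country: str


def narrow(texts, **criteria):
    """Return the texts whose metadata match every given criterion."""
    return [
        t for t in texts
        if all(getattr(t, field) == value for field, value in criteria.items())
    ]


# Invented sample records, for illustration only.
SAMPLE = [
    TextMeta("ecology", "expert", "British", "journal article",
             "scholars/researchers", 2010, "United Kingdom"),
    TextMeta("meteorology", "general public", "American", "website",
             "government bodies", 2015, "United States"),
    TextMeta("ecology", "semi-expert", "American", "book",
             "businesses", 1998, "United States"),
]

ecology_texts = narrow(SAMPLE, domain="ecology")
expert_ecology = narrow(SAMPLE, domain="ecology", user="expert")
```

Combining criteria is a logical AND, so each added text type can only shrink the result set, which matches the idea of narrowing a search.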
The EcoLexicon English Corpus was tagged by TreeTagger using the Penn TreeBank tagset with Sketch Engine modifications.
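To make the relation between POS tags, lemmas and lempos concrete, here is a minimal sketch of how a lempos can be formed by joining a lemma with a one-letter part-of-speech suffix. The prefix-to-suffix mapping and the fallback suffix `x` are assumptions made for illustration; the exact suffixes used in this corpus should be checked against the tagset documentation.

```python
# Assumed mapping from tag prefixes to one-letter lempos suffixes.
# This is an illustrative guess, not the corpus's documented scheme.
SUFFIX_BY_PREFIX = (
    ("NN", "n"),  # common nouns
    ("NP", "n"),  # proper nouns (TreeTagger-style tag)
    ("V", "v"),   # verbs
    ("JJ", "j"),  # adjectives
    ("RB", "a"),  # adverbs
)


def lempos(lemma: str, tag: str) -> str:
    """Join a lemma and a POS suffix, e.g. ('erosion', 'NN') -> 'erosion-n'."""
    for prefix, suffix in SUFFIX_BY_PREFIX:
        if tag.startswith(prefix):
            return f"{lemma}-{suffix}"
    return f"{lemma}-x"  # assumed fallback for all other tags


examples = [lempos("erosion", "NN"), lempos("erode", "VVD"), lempos("the", "DT")]
```

Searching by lempos rather than by lemma separates homographs with different parts of speech, such as a noun and a verb that share the same lemma.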
Faber, P., León-Araúz, P. & Reimerink, A. (2014) Representing environmental knowledge in EcoLexicon. In Languages for Specific Purposes in the Digital Era. Educational Linguistics, 19:267-301. Springer.
Faber, P., León-Araúz, P. & Reimerink, A. (2016) EcoLexicon: new features and challenges. In GLOBALEX 2016: Lexicographic Resources for Human Language Technology in conjunction with the 10th edition of the Language Resources and Evaluation Conference, edited by Kernerman, I., Kosem Trojina, I., Krek, S. & Trap-Jensen, L., pages 73-80. Portorož.
León-Araúz, P., San Martín, A. & Faber, P. (2016) Pattern-based Word Sketches for the Extraction of Semantic Relations. In Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016), pages 73-82. Osaka, Japan: COLING 2016.
Search the EcoLexicon English corpus
Sketch Engine offers a range of tools to work with the EcoLexicon corpus.