EcoLexicon: English corpus of the Environment domain
The EcoLexicon English Corpus (EEC) is a corpus of contemporary environmental texts. It contains 23.1 million words and was compiled by the LexiCon Research Group at the University of Granada to support the development of EcoLexicon, a terminological knowledge base on the environment. The corpus is annotated with POS tags, lemmas and lempos.
A word sketch is a tool that lets users explore a word’s grammatical and collocational behaviour. The EEC employs word sketches based on the Sketch Grammar for English developed by León-Araúz, San Martín and Faber (2016).
Text types contain information about corpus texts, e.g. domain (zoology, ecology) or language variety (British English). Thanks to these parameters, users can narrow their search:
A list of text types in the corpus
- Domain: the EEC encompasses all the domains and subdomains of environmental studies (e.g. biology, meteorology, ecology, environmental engineering, environmental law, etc.).
- User: the EEC includes texts for three different types of user (depending on their level of expertise): expert, semi-expert, general public.
- Geographical variant: the EEC comprises American, British, and Euro English.
- Genre: the EEC covers a wide variety of text genres: journal articles, books, websites, lexicographical material, etc.
- Editor: the EEC distinguishes texts edited by scholars/researchers, businesses, government bodies, etc.
- Year: the EEC includes texts from 1973 to 2016.
- Country: the EEC texts are tagged according to the country of publication.
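As a rough illustration of how text types narrow a search, the following sketch filters a handful of invented text records by their metadata. The `TextMeta` class, the `narrow` function and the sample records are hypothetical and exist only for this example; they are not part of Sketch Engine or of the EEC itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextMeta:
    """Hypothetical metadata record mirroring some EEC text types."""
    domain: str
    user: str
    variant: str
    genre: str
    year: int


def narrow(texts, **criteria):
    """Keep only the texts whose metadata matches every given criterion."""
    return [
        t for t in texts
        if all(getattr(t, key) == value for key, value in criteria.items())
    ]


# Invented sample records, not actual corpus data.
SAMPLE = [
    TextMeta("ecology", "expert", "British", "journal article", 2010),
    TextMeta("meteorology", "general public", "American", "website", 2015),
    TextMeta("ecology", "semi-expert", "American", "book", 1998),
]

# Restrict the search to American English texts on ecology.
subset = narrow(SAMPLE, domain="ecology", variant="American")
for text in subset:
    print(text)
```

Combining several criteria corresponds to selecting several text types at once: each added criterion can only shrink the subset that is searched.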
The EcoLexicon English Corpus was tagged with TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.
Tools to work with the EcoLexicon corpus
A complete set of Sketch Engine tools is available to work with this English corpus of environmental texts to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
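To give a feel for what two of these tools compute, here is a minimal sketch of an n-gram frequency list and a keyword-in-context (KWIC) concordance over a toy sentence. It assumes naive whitespace tokenization and an invented example sentence; the function names `ngrams` and `kwic` are hypothetical, and this is not how Sketch Engine implements these tools.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Invented example sentence, not taken from the corpus.
TEXT = "coastal erosion threatens wetlands and coastal erosion reshapes beaches"
tokens = TEXT.split()

# Frequency list of two-word units (bigrams), most frequent first.
bigram_freq = Counter(ngrams(tokens, 2))
for gram, count in bigram_freq.most_common(3):
    print(" ".join(gram), count)

# Concordance lines for one keyword with two words of context per side.
for left, key, right in kwic(tokens, "erosion", width=2):
    print(f"{left:>15} | {key} | {right}")
```

A real corpus query would operate on the tagged and lemmatized text rather than on raw strings, so that, for example, inflected forms of the same lemma can be counted together.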
Faber, P., León-Araúz, P. & Reimerink, A. (2014) Representing environmental knowledge in EcoLexicon. In Languages for Specific Purposes in the Digital Era. Educational Linguistics, 19:267-301. Springer.
Faber, P., León-Araúz, P. & Reimerink, A. (2016) EcoLexicon: new features and challenges. In GLOBALEX 2016: Lexicographic Resources for Human Language Technology in conjunction with the 10th edition of the Language Resources and Evaluation Conference, edited by Kernerman, I., Kosem Trojina, I., Krek, S. & Trap-Jensen, L., pages 73-80. Portorož.
León-Araúz, P., San Martín, A. & Reimerink, A. (2018) The EcoLexicon English Corpus as an open corpus in Sketch Engine. In Proceedings of the 18th EURALEX International Congress, edited by Čibej, J., Gorjanc, V., Kosem, I. & Krek, S., pages 893-901. Ljubljana: Euralex.
León-Araúz, P. & San Martín, A. (2018) The EcoLexicon Semantic Sketch Grammar: from Knowledge Patterns to Word Sketches. In Proceedings of the LREC 2018 Workshop “Globalex 2018 – Lexicography & WordNets”, edited by Kernerman, I. & Krek, S., pages 94-99. Miyazaki: Globalex.
León-Araúz, P., San Martín, A. & Faber, P. (2016) Pattern-based Word Sketches for the Extraction of Semantic Relations. In Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016), pages 73-82. Osaka, Japan: COLING 2016.
Search the EcoLexicon corpus
The EcoLexicon English Corpus is openly available to anybody. No login is required.