EcoLexicon: English corpus of the Environment domain

The EcoLexicon English Corpus (EEC) is a corpus of contemporary environmental texts. It consists of 23.1 million words and was compiled by the LexiCon Research Group at the University of Granada to support the development of EcoLexicon, a terminological knowledge base on the environment. The corpus is annotated with part-of-speech (POS) tags, lemmas and lempos (the lemma combined with a part-of-speech suffix).
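As a brief illustration of the lempos attribute: in Sketch Engine a lempos value is the lemma joined by a hyphen to a short part-of-speech code, so a noun lemma ends in -n. The sketch below assumes that convention. The helper name `split_lempos` and the sample values are hypothetical and are not taken from the corpus.

```python
from typing import Tuple


def split_lempos(lempos: str) -> Tuple[str, str]:
    """Split a lempos string such as 'erosion-n' into (lemma, pos_code).

    Assumes the lemma and the POS code are joined by the final hyphen,
    so hyphenated lemmas like 'sea-level-n' keep their internal hyphens.
    """
    lemma, sep, pos = lempos.rpartition("-")
    if not sep or not lemma or not pos:
        raise ValueError(f"not a lempos value: {lempos!r}")
    return lemma, pos


# Invented sample values, for demonstration only.
print(split_lempos("erosion-n"))
print(split_lempos("sea-level-n"))
```

Splitting on the last hyphen rather than the first is a deliberate choice here, since lemmas themselves may contain hyphens.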

Word sketches

A word sketch is a tool that lets users explore a word’s grammatical and collocational behaviour. The EEC employs word sketches based on the Sketch Grammar for English developed by León-Araúz, San Martín and Faber (2016).

Text types

Text types contain information about corpus texts, e.g. domain (zoology, ecology) or language variety (British English). Thanks to these parameters, users can narrow their search:

  • Domain: the EEC encompasses all the domains and subdomains of environmental studies (e.g. biology, meteorology, ecology, environmental engineering, environmental law).
  • User: the EEC includes texts for three different types of user (depending on their level of expertise): expert, semi-expert, general public.
  • Geographical variant: the EEC comprises American, British, and Euro English.
  • Genre: the EEC covers a wide variety of text genres: journal articles, books, websites, lexicographical material, etc.
  • Editor: the EEC distinguishes texts edited by scholars/researchers, businesses, government bodies, etc.
  • Year: the EEC includes texts from 1973 to 2016.
  • Country: the EEC texts are tagged according to the country of publication.
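The way text types narrow a search can be pictured as filtering text records by their metadata. The following is a minimal sketch of that idea only: the field names, the `narrow` helper and the sample records are all invented for illustration and do not reflect the corpus's actual attribute names or contents.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    # Field names are illustrative, not the corpus's real attribute names.
    title: str
    domain: str
    user: str
    variant: str
    year: int


def narrow(records, **criteria):
    """Keep only the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Invented sample records, for demonstration only.
SAMPLE = [
    TextRecord("Text A", "ecology", "expert", "British", 2010),
    TextRecord("Text B", "meteorology", "general public", "American", 2004),
    TextRecord("Text C", "ecology", "semi-expert", "American", 1998),
]

selected = narrow(SAMPLE, domain="ecology", variant="American")
print([r.title for r in selected])
```

Each additional criterion shrinks the result set, which mirrors how combining text types restricts a query to a subcorpus.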

Part-of-speech tagset

The EcoLexicon English Corpus was tagged with TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.
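Penn Treebank-style tags encode the word class in their first letters (NN for nouns, JJ for adjectives, RB for adverbs, V for verbs). The sketch below groups tags into coarse classes on that basis. The function name `coarse_pos` is hypothetical, the tokens are hand-tagged toy data rather than corpus output, and matching verbs on the single letter V is an assumption intended to tolerate modified verb tags.

```python
from collections import Counter


def coarse_pos(tag: str) -> str:
    """Map a Penn Treebank-style tag to a coarse word class by its prefix."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("V"):
        return "verb"
    if tag.startswith("JJ"):
        return "adjective"
    if tag.startswith("RB"):
        return "adverb"
    return "other"


# Hand-tagged toy tokens (word, tag); not actual corpus output.
tagged = [
    ("Coastal", "JJ"),
    ("erosion", "NN"),
    ("threatens", "VBZ"),
    ("wetlands", "NNS"),
    ("severely", "RB"),
    (".", "."),
]

# Count how many tokens fall into each coarse class.
counts = Counter(coarse_pos(tag) for _, tag in tagged)
print(counts)
```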

Tools to work with the EcoLexicon corpus

A complete set of Sketch Engine tools is available for this English corpus of environmental texts. They can be used to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
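To make three of these outputs concrete, the sketch below computes a word list, an n-gram frequency list and a simple concordance over one invented sentence. It is a simplified illustration of the underlying ideas, not Sketch Engine's implementation. The function names and the sample text are assumptions, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, width=2):
    """Return each hit of keyword with up to `width` context tokens per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text; whitespace splitting stands in for real tokenization.
text = "sea level rise affects coastal erosion and sea level rise affects wetlands"
tokens = text.split()

word_list = Counter(tokens)            # word list: frequency of each word form
bigrams = Counter(ngrams(tokens, 2))   # n-grams: frequency of two-word units

print(word_list.most_common(3))
print(bigrams.most_common(2))
print(concordance(tokens, "erosion"))
```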

Bibliography

Corpus

Faber, P., León-Araúz, P. & Reimerink, A. (2014) Representing environmental knowledge in EcoLexicon. In Languages for Specific Purposes in the Digital Era. Educational Linguistics, 19:267-301. Springer.

Faber, P., León-Araúz, P. & Reimerink, A. (2016) EcoLexicon: new features and challenges. In GLOBALEX 2016: Lexicographic Resources for Human Language Technology in conjunction with the 10th edition of the Language Resources and Evaluation Conference, edited by Kernerman, I., Kosem Trojina, I., Krek, S. & Trap-Jensen, L., pages 73-80. Portorož.

Word sketches

León-Araúz, P., San Martín, A. & Faber, P. (2016) Pattern-based Word Sketches for the Extraction of Semantic Relations. In Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016), pages 73-82. Osaka, Japan: COLING 2016.

Search the EcoLexicon corpus

The EcoLexicon English corpus is available to anybody. No login required.

Other text corpora in Sketch Engine

Sketch Engine offers 700+ language corpora.

Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms with Sketch Engine. Use our Quick Start Guide to learn it in minutes.