Corpus of Elsevier Open Access Journals
The Elsevier OA CC-BY Corpus is an English corpus of 40,000 scientific research papers that form a representative sample across scientific disciplines. It consists of open access articles published under the CC-BY 4.0 (Creative Commons Attribution) license in journals of Elsevier, a Dutch publishing company specializing in scientific, technical, and medical content. The articles were published between 2014 and 2020.
The original data of the Elsevier OA CC-BY corpus have been prepared by Daniel Kershaw and Rob Koeling. More information about the corpus can be found in the Digital Commons (Elsevier) deposit.
The Elsevier Open Access Journals corpus is part-of-speech tagged with TreeTagger, using the TreeTagger part-of-speech tagset.
Elsevier OA CC-BY Corpus – year distribution
The English corpus of Elsevier Open Access Journals contains 40,000 scientific articles from 2014 to 2020.
Search the Elsevier OA CC-BY Corpus
Sketch Engine offers a range of tools for working with this English corpus of Elsevier journals.
Tools to work with the Elsevier OA CC-BY Corpus
A complete set of Sketch Engine tools is available to work with this English corpus of scientific papers to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
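The list above says what each tool produces, not how. As a rough illustration only, the Python sketch below shows the counting ideas behind a frequency word list, an n-gram list, a concordance (keyword in context) and a keyword score, all computed on plain lists of tokens. It is not Sketch Engine's implementation or API. The function names, the toy sentences and the `smoothing` default of 1.0 are assumptions made for this example. The keyword score is modelled on the "simple maths" ratio described in Sketch Engine's documentation, (fpm_focus + N) / (fpm_ref + N). Here fpm is the frequency per million tokens and N is a smoothing constant. Check the exact defaults against that documentation.

```python
from collections import Counter


def word_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-grams (multi-word units) of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kwic(tokens: list[str], node: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Keyword-in-context lines: (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def keyness(word: str, focus: list[str], reference: list[str],
            smoothing: float = 1.0) -> float:
    """Simple-maths style keyword score: (fpm_focus + N) / (fpm_ref + N).

    fpm = frequency per million tokens; N (`smoothing`) is an assumed default.
    Scores above 1 mean the word is relatively more frequent in the focus corpus.
    """
    fpm_focus = Counter(focus)[word] * 1_000_000 / len(focus)
    fpm_ref = Counter(reference)[word] * 1_000_000 / len(reference)
    return (fpm_focus + smoothing) / (fpm_ref + smoothing)


if __name__ == "__main__":
    # Hypothetical toy token lists standing in for a focus and a reference corpus.
    focus = "the corpus contains papers and the corpus is tagged".split()
    reference = "the cat sat on the mat and the dog sat too".split()

    print(word_list(focus)[:3])
    print(ngrams(focus, 2).most_common(1))
    print(kwic(focus, "corpus", width=2))
    print(round(keyness("corpus", focus, reference), 1))
```

In practice the input would be the corpus's own tokens or lemmas rather than `str.split()` applied to a toy sentence. Per-million normalization only becomes meaningful on corpora far larger than these examples.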
Citation & Reference
Kershaw, Daniel; Koeling, Rob (2020), “Elsevier OA CC-BY Corpus”, Mendeley Data, V1, doi: 10.17632/zm33cdndxs.1