DOAJ corpora – Directory of Open Access Journals
The Directory of Open Access Journals (DOAJ) corpora are text corpora composed of journals covering all areas of science, technology, medicine, social sciences, and the humanities in dozens of languages.
The DOAJ corpora contain rich metadata about journals, such as title, country, and year of publication. It is also possible to search by the keywords of articles.
Detailed information about the open access journals can be found on the original website, the Directory of Open Access Journals.
A list of DOAJ corpora in Sketch Engine
- Directory of Open Access Journals (DOAJ) – English – 2.6 billion words
DOAJ corpora are part-of-speech (POS) tagged according to the specifications of each language.
Tools to work with the Open Access Journals corpus
A complete set of tools is available to work with this OAJ corpus to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- keywords – terminology extraction of one-word units
- text type analysis – statistics of metadata in the corpus
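To make two of the outputs listed above concrete, here is a minimal sketch of how a word list and an n-gram frequency list can be computed from plain text. It is an illustration only and not how Sketch Engine implements these tools: the function names `tokenize` and `ngram_frequencies`, the simple tokenization rule, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Hypothetical helper: real corpus tokenization is language-specific
    and considerably more careful than this regular expression.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens (a multi-word unit)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample sentence, not taken from the corpus.
sample = "Open access journals publish open access articles."
tokens = tokenize(sample)

word_list = Counter(tokens)             # word list organized by frequency
bigrams = ngram_frequencies(tokens, 2)  # frequency list of 2-word units

print(word_list.most_common(2))
print(bigrams.most_common(1))
```

Sorting the counts with `most_common` gives the frequency-ordered view that a word list or n-gram list presents.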
DOAJ English corpus in detail
The chart shows the distribution of the parts of speech in the DOAJ English corpus.
Further information about texts in the corpus
* the figures above are rounded to the nearest million
Metadata (Structures and attributes)
|Attribute||Description||Example|
|Authors||author's name||Wei Wang|
|Country of journal||country of issue||US|
|Document id||document identification||9999884fafc844958864f26e06a22373|
|Identifier||print ISSN and electronic ISSN||pissn:2078-0958;eissn:2078-0966|
|Journal languages||language of the journal||EN, English|
|Journal number||number of the journal||1|
|Journal publisher||publisher of the journal||Copernicus Publications|
|Journal title||title of the journal||Mathematical Problems in Engineering|
|Journal volume||volume of the journal||7|
|Keywords||keywords of the journal||climate change|
|Last updated||last modification||2016-09-30T18:33:16Z|
|Month||month of publication||12|
|Subjects||subjects of the document||Health Sciences|
|Time stamp||time stamp of the document||2004-05-31T00:00:00Z|
|Title||name of the article||Sovereignty in Conflict|
|Year of publication||year of publication||2014|
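Some of the example values in the table above are structured strings rather than plain text. The sketch below shows one way to unpack them after exporting metadata from the corpus. It assumes that the Identifier value always follows the `pissn:...;eissn:...` pattern and that time stamps always use the `YYYY-MM-DDTHH:MM:SSZ` form seen in the examples. The helper names `parse_identifier` and `parse_timestamp` are hypothetical and are not part of Sketch Engine.

```python
from datetime import datetime, timezone


def parse_identifier(value):
    """Split 'pissn:...;eissn:...' into a dict keyed by ISSN type.

    Assumes ';' separates the entries and ':' separates the ISSN type
    from the number, as in the example value in the metadata table.
    """
    result = {}
    for part in value.split(";"):
        if ":" in part:
            key, _, issn = part.partition(":")
            result[key.strip()] = issn.strip()
    return result


def parse_timestamp(value):
    """Parse a UTC time stamp such as '2016-09-30T18:33:16Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the example column of the metadata table.
issns = parse_identifier("pissn:2078-0958;eissn:2078-0966")
updated = parse_timestamp("2016-09-30T18:33:16Z")

print(issns["pissn"], issns["eissn"])
print(updated.isoformat())
```

Keeping the print and electronic ISSN under separate keys makes it easy to group documents by journal even when only one of the two numbers is present.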
Texts in DOAJ are published under a Creative Commons (CC) license.
More information about the licensing can be found at https://doaj.org/publishers#licensing
Search the corpora of the Directory of Open Access Journals
Sketch Engine offers a range of tools to work with these DOAJ corpora.
Use Sketch Engine in minutes
Generate collocations, frequency lists, examples in context, and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.