DOAJ corpora – Open Access Journals corpora
The Open Access Journals (OAJ) corpora are text corpora composed of texts from open access journals covering all areas of science, technology, medicine, social science, and humanities in dozens of languages.
The OAJ corpora contain rich metadata about the journals, such as title, country, and year of publication. Articles can also be searched by their keywords.
Detailed information about Open Access Journals can be found on the original website, the Directory of Open Access Journals (DOAJ).
A list of OAJ corpora in Sketch Engine
- Open Access Journals (English) – 2.6 billion words
More languages will be available soon.
Part-of-speech tagset
Each OAJ corpus is POS-tagged with a tagset appropriate to its language.
Tools to work with the Open Access Journals corpus
A complete set of tools is available to work with this OAJ corpus to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
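To show what two of these outputs contain, here is a minimal plain-Python sketch of an n-gram frequency list and a KWIC (keyword in context) concordance. It is not Sketch Engine's implementation or API. It assumes the text is already split into a list of tokens, and the helper names `ngram_frequencies` and `concordance` are hypothetical.

```python
from collections import Counter

def ngram_frequencies(tokens, n):
    """Count every contiguous sequence of n tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields the n-grams in order
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

def concordance(tokens, node, width=3):
    """Return (left, node, right) KWIC tuples with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

text = "climate change affects health and climate change affects crops"
tokens = text.split()  # naive whitespace tokenization, for illustration only

counts = ngram_frequencies(tokens, 2)
for gram, freq in counts.most_common(3):
    print(f"{freq}\t{gram}")

hits = concordance(tokens, "change")
for left, kw, right in hits:
    print(f"{left:>20} [{kw}] {right}")
```

A real corpus tool works on lemmatized, POS-tagged tokens and indexed data rather than raw lists, but the shape of the output (ranked multi-word units, aligned context lines) is the same.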
DOAJ English corpus in detail
The chart shows the distribution of the parts of speech in the DOAJ English corpus.
Further information about texts in the corpus
Basic information
Item | Frequency* |
Tokens | 3 350 |
Words | 2 663 |
Sentences | 123 |
Documents | 0.66 |
* the figures above are given in millions (rounded)
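Because all figures share the same unit, rough per-unit averages can be derived directly from the table. The sketch below simply divides the rounded values, so the results are approximate: about 4,000 words per document, about 27 tokens per sentence, and words making up roughly 79% of all tokens. The dictionary and function names are illustrative only.

```python
# Figures from the "Basic information" table, in millions (rounded).
FIGURES_MILLIONS = {
    "tokens": 3350,
    "words": 2663,
    "sentences": 123,
    "documents": 0.66,
}

def averages(f):
    """Derive approximate per-unit ratios; the millions cancel out in each division."""
    return {
        "words_per_document": f["words"] / f["documents"],
        "tokens_per_sentence": f["tokens"] / f["sentences"],
        "words_per_sentence": f["words"] / f["sentences"],
        "word_share_of_tokens": f["words"] / f["tokens"],
    }

result = averages(FIGURES_MILLIONS)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

Since the document count is rounded to two decimals, the words-per-document figure is only accurate to within a few dozen words.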
Metadata (Structures and attributes)
Metadata | Description | Example |
Authors | author's name | Wei Wang |
Country of journal | country of issue | US |
Document id | document identification | 9999884fafc844958864f26e06a22373 |
Identifier | print ISSN and electronic ISSN | pissn:2078-0958;eissn:2078-0966 |
Journal languages | language of the journal | EN, English |
Journal number | number of the journal | 1 |
Journal publisher | publisher of the journal | Copernicus Publications |
Journal title | title of the journal | Mathematical Problems in Engineering |
Journal volume | volume of the journal | 7 |
Keywords | keywords of the article | climate change |
Last updated | date and time of the last modification | 2016-09-30T18:33:16Z |
Month | month of publication | 12 |
Subjects | subjects of the document | Health Sciences |
Time stamp | date and time stamp of the document | 2004-05-31T00:00:00Z |
Title | title of the article | Sovereignty in Conflict |
Url | web address | http://www.ijpsonline.com/article.asp?issn=0250-474X |
Wordcount | number of words in the document | 1081 |
Year of publication | year of publication | 2014 |
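Some metadata values follow machine-readable formats: the Identifier field packs a print and an electronic ISSN into one string, and Last updated and Time stamp use ISO 8601 UTC timestamps. The sketch below assumes the exact `pissn:<ISSN>;eissn:<ISSN>` layout shown in the example column; the helper names are hypothetical and not part of Sketch Engine. ISSN validation uses the standard ISSN mod-11 check digit.

```python
from datetime import datetime, timezone

def issn_is_valid(issn):
    """Check an ISSN such as '2078-0958' against its mod-11 check digit."""
    digits = issn.replace("-", "").upper()
    if len(digits) != 8 or not digits[:7].isdigit():
        return False
    # Weights 8..2 apply to the first seven digits
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected

def parse_identifier(value):
    """Split an assumed 'pissn:...;eissn:...' string into a dict keyed by ISSN type."""
    result = {}
    for part in value.split(";"):
        if ":" in part:
            key, _, issn = part.partition(":")
            result[key.strip()] = issn.strip()
    return result

def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp like '2016-09-30T18:33:16Z' into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

ids = parse_identifier("pissn:2078-0958;eissn:2078-0966")
for kind, issn in ids.items():
    print(kind, issn, "valid" if issn_is_valid(issn) else "invalid")

ts = parse_timestamp("2016-09-30T18:33:16Z")
print(ts.isoformat())
```

Attaching an explicit UTC time zone keeps timestamps from different records comparable, for example when filtering documents by their last modification date.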
Copyright
Texts in DOAJ are published under Creative Commons (CC) licenses.
More information about the licensing can be found at https://doaj.org/publishers#licensing
Search the Open Access Journals corpus
Sketch Engine offers a range of tools to work with this English corpus.
Use Sketch Engine in minutes
Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn it in minutes.