Corpus of Hebrew translation texts
The Hebrew translation corpus, also known as the Hebrew Comparable Corpus, is a language corpus made up of translated and non-translated Hebrew texts. Each of its two components contains about fifteen books (fiction and non-fiction). The components are matched for topic and genre: for example, there is one biography in each. The corpus is best suited to studying differences between translated and non-translated language, but it can also be used to study language use more generally.
The corpus was compiled as part of a project funded by the Israel Science Foundation and carried out in the Department of Translation and Interpreting Studies at Bar Ilan University.
Detailed information about the corpus
Part-of-speech tagset
The Hebrew translation corpus is POS-tagged using the following part-of-speech tags.
Tools to work with the Hebrew translation corpus
A complete set of Sketch Engine tools is available to work with the Hebrew translation corpus. The tools generate:
- word sketch – Hebrew collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of Hebrew nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
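As a rough illustration of what an n-gram frequency list is, the Python sketch below counts contiguous token sequences in a toy token list. It is not how Sketch Engine computes its n-grams; the function name `ngram_frequencies` and the placeholder tokens are assumptions made for this example only.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every contiguous sequence of n tokens (hypothetical helper)."""
    # zip over n shifted copies of the list yields each n-token window once
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(windows)


# Placeholder tokens, not taken from the corpus
tokens = ["a", "b", "a", "b", "c"]

bigrams = ngram_frequencies(tokens, 2)

# List the bigrams from most to least frequent
for gram, count in bigrams.most_common():
    print(" ".join(gram), count)
```

Sorting the counts with `most_common()` gives the same kind of ranking that a frequency list presents: the most frequent multi-word units come first.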
A corpus attribute overview
A list of positional attributes used in the corpus
TAGGER OUTPUT | ABRV | VALUES PER TAG |
token | token | |
transliteration (token) | trans | |
lemma | lemma | |
transliteration (lemma) | transl | |
pos | tag | adjective adverb conjunction copula existential foreign interjection interrogative modal negation noun numberExpression numeral participle preposition pronoun properName punctuation quantifier title url verb wPrefix |
pos-type | postype | amount and arithmetic-operation bracket-end bracket-start colon comma coordinating demonstrative determiner dot exclamation-mark gematria hyphen impersonal literal-number numeral-cardinal numeral-fractional numeral-ordinal or other partitive personal proadverb prodet pronoun question-mark quote reflexive relativizing semicolon slash subordinating yesno |
prefix string | prestring | ב בכ ו וב ובכ וכ וכש וכשל ול ומ ומכ ומש וש ושב ושל ושמ כ ככ כש כשב כשל כשמ ל לכ לכש מ מכ מש משב משכ משל משמ ש שב שכ שכש שכשמ של שמ שמש |
base string | basestring | |
suffix string | sufstring | גם ה הם הן ו י ך כם כן ם ן נו |
gender | gender | feminine masculine masculine-and-feminine |
number | number | dual dual-and-plural plural singular singular-and-plural |
status | status | absolute construct |
polarity | polarity | negative positive |
person | person | 1 2 3 any |
tense | tense | beinoni future imperative infinitive past |
binyan | binyan | Hifil Hitpael Hufal Nifal Paal Piel Pual |
prefix conjunction | prefconj | conjunction |
prefix definite article | prefdefinite | definiteArticle |
prefix interrogative | prefinterrog | |
prefix preposition | prefprep | preposition |
prefix subordination conjunction / relativizer | relativizer | relativizer/subordinatingConjunction |
prefix temporal subordinating conjunction | preftemp | temporalSubConj |
prefix adverb | prefadv | adverb |
suffix function | suffunction | accusative-or-nominative possessive pronomial |
suffix number | sufnum | plural singular |
suffix gender | sufgender | feminine masculine masculine-and-feminine |
suffix person | sufper | 1 2 3 |
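
The attribute abbreviations in the table can be combined in Sketch Engine's Corpus Query Language (CQL), where a token is written as bracketed attribute-value conditions such as [tag="noun"]. The Python sketch below only assembles such query strings. The helper names `cql_token` and `cql_sequence` are hypothetical, and the sketch assumes the corpus exposes its attributes under exactly the abbreviations listed above.

```python
def cql_token(**constraints):
    """Build one CQL token expression from attribute=value pairs (hypothetical helper)."""
    if not constraints:
        return "[]"  # in CQL, empty brackets match any token
    parts = [f'{attr}="{value}"' for attr, value in constraints.items()]
    return "[" + " & ".join(parts) + "]"


def cql_sequence(*token_specs):
    """Join several token expressions into a query for adjacent tokens."""
    return " ".join(cql_token(**spec) for spec in token_specs)


# A feminine noun immediately followed by an adjective,
# using attribute names and values from the table above
query = cql_sequence(
    {"tag": "noun", "gender": "feminine"},
    {"tag": "adjective"},
)
print(query)
```

CQL treats attribute values as regular expressions, so a value containing regex metacharacters would need escaping before it is passed to these helpers.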
Search the Hebrew translation corpus
Sketch Engine offers a range of tools to work with the Hebrew translation corpus.
Use Sketch Engine in minutes
Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.