lemma

Lemma is a positional attribute. It is the basic form of a word, typically the form found in dictionaries. A lemmatized corpus makes it possible to search for the basic form and retrieve all forms of the word in the result, e.g. searching for the lemma go will find go, goes, went, going, gone.

lemma	word forms
do	do, did, done, doing, does
long	long, longer, longest
be	am, is, are, was, were, being, been
knife	knife, knives
cup	cup, cups

Lemmas in Sketch Engine are case sensitive, so City and city are two different lemmas (City = the City of London; city = a common noun). The lemma of the first word of a sentence is always lowercased. Therefore, a search for the lemma city will also find City, but only if City appears at the beginning of a sentence.
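The case behaviour described above can be illustrated with a small Python sketch. It is a simplified model and not Sketch Engine's actual lemmatizer: the names assign_lemmas, find_lemma, LEXICON and SENTENCES are hypothetical, and the only rule modelled is that the lemma of the sentence-initial word is lowercased.

```python
# Hypothetical word form -> lemma map; unknown forms are used as their own lemma.
LEXICON = {"cities": "city"}

# Hypothetical toy sentences, already split into word forms.
SENTENCES = [
    ["City", "traders", "left"],          # City is sentence-initial
    ["She", "works", "in", "the", "City"],  # City is mid-sentence
    ["The", "city", "is", "large"],
]


def assign_lemmas(sentence, lexicon):
    """Pair each word form with a lemma; lowercase the lemma of the first word."""
    pairs = []
    for position, word in enumerate(sentence):
        lemma = lexicon.get(word, word)
        if position == 0:
            lemma = lemma.lower()  # assumption taken from the text above
        pairs.append((word, lemma))
    return pairs


def find_lemma(sentences, lemma, lexicon):
    """Return the word forms whose lemma matches `lemma` exactly (case sensitive)."""
    hits = []
    for sentence in sentences:
        for word, word_lemma in assign_lemmas(sentence, lexicon):
            if word_lemma == lemma:
                hits.append(word)
    return hits


print(find_lemma(SENTENCES, "city", LEXICON))
print(find_lemma(SENTENCES, "City", LEXICON))
```

In this toy data, the search for city returns the sentence-initial City and the lowercase city, while the search for City returns only the mid-sentence occurrence.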

A wordlist of lemmas is a frequency list where all of go, went, gone, goes, going are counted together and listed as go.
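As a minimal sketch of how such a frequency list can be built, the Python below counts tokens by their lemma rather than by their word form. The TAGGED data and the lemma_wordlist function are hypothetical illustrations, not part of Sketch Engine.

```python
from collections import Counter

# Hypothetical (word form, lemma) pairs, as a lemmatizer might produce them.
TAGGED = [
    ("go", "go"),
    ("went", "go"),
    ("gone", "go"),
    ("goes", "go"),
    ("going", "go"),
    ("cups", "cup"),
    ("cup", "cup"),
]


def lemma_wordlist(tagged):
    """Count tokens per lemma and return (lemma, frequency) pairs, most frequent first."""
    counts = Counter(lemma for _word, lemma in tagged)
    return counts.most_common()


for lemma, frequency in lemma_wordlist(TAGGED):
    print(lemma, frequency)
```

All five forms of go contribute to a single entry, so the list contains one line for go with a frequency of 5 and one line for cup with a frequency of 2.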

A lemma search of go will find all of go, went, gone, goes, going.
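A lemma search can be pictured as filtering tokens on their lemma attribute instead of on the word form. The following Python sketch assumes a hand-annotated toy corpus; Token, CORPUS and search_by_lemma are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str   # word form as it appears in the text
    lemma: str  # basic (dictionary) form of the word


# Hypothetical toy corpus; a real corpus is lemmatized automatically.
CORPUS = [
    Token("She", "she"),
    Token("went", "go"),
    Token("home", "home"),
    Token("and", "and"),
    Token("he", "he"),
    Token("goes", "go"),
    Token("tomorrow", "tomorrow"),
]


def search_by_lemma(corpus, lemma):
    """Return every word form whose lemma equals `lemma` exactly."""
    return [token.word for token in corpus if token.lemma == lemma]


print(search_by_lemma(CORPUS, "go"))
```

The search matches went and goes even though neither word form is spelled go, because the comparison is made on the lemma.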

The concept of the lemma is not always clearly defined and may differ between languages (or even between two corpora in the same language). For example, in Sketch Engine, many, more, most are three different lemmas in English. In Czech, on the other hand, the forms of the same adjective, which is also irregular (mnoho, více, nejvíce), share the same lemma, hodně.

The situation is even more complex in agglutinating languages such as Turkish, Hungarian or Japanese, where it may not be easy to decide how many affixes should be removed to produce a lemma. Sometimes the stem is used instead of the lemma in these languages, but a stem is only the very core part of the word, and the same stem may be shared by several lemmas. The lemma is therefore preferred, although tools that produce correct lemmas in these languages are not always available or work with only limited success.

In Sketch Engine, all corpora in the same language are processed using the same tools and therefore have the same lemmatization. Rare exceptions exist if the corpus was acquired from external sources and not developed by Sketch Engine.
