A lemma is the basic form of a word, typically the form found in dictionaries. A lemmatized corpus allows searching for a lemma so that the results include all forms of the word, e.g. searching for the lemma go will find go, gone and Go (the last one if found at the beginning of a sentence). Lemmas in Sketch Engine are case sensitive: City and city are two different lemmas (City = the City of London; city = a common noun).
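The behaviour described above can be sketched with a toy lemmatized corpus in which every token stores its word form and its lemma. This is a minimal illustration, not how Sketch Engine is implemented: the corpus data, the `Token` structure and the functions `find_by_lemma` and `find_by_lemma_lc` are all hypothetical. The second function only approximates a case-insensitive lemma search by lowercasing the lemmas before comparing them.

```python
from collections import namedtuple

# A token in a lemmatized corpus: the word form as it appears in the text plus its lemma.
Token = namedtuple("Token", ["word", "lemma"])

# Hypothetical toy corpus; real corpora are lemmatized automatically by a tagger.
CORPUS = [
    Token("Go", "go"),       # sentence-initial capital, lemma is still "go"
    Token("home", "home"),
    Token("she", "she"),
    Token("goes", "go"),
    Token("they", "they"),
    Token("went", "go"),
    Token("gone", "go"),
    Token("City", "City"),   # the City of London
    Token("city", "city"),   # common noun
]

def find_by_lemma(corpus, lemma):
    """Return every word form whose lemma equals `lemma` exactly (case-sensitive)."""
    return [t.word for t in corpus if t.lemma == lemma]

def find_by_lemma_lc(corpus, lemma):
    """Hypothetical case-insensitive variant: compare lowercased lemmas."""
    target = lemma.lower()
    return [t.word for t in corpus if t.lemma.lower() == target]

print(find_by_lemma(CORPUS, "go"))       # ['Go', 'goes', 'went', 'gone']
print(find_by_lemma(CORPUS, "city"))     # ['city']
print(find_by_lemma_lc(CORPUS, "city"))  # ['City', 'city']
```

The case-sensitive search keeps City and city apart, while the lowercased comparison merges them.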
The concept of a lemma is not always clearly defined and may differ between languages; often there is no single agreed definition even within one language. For example, in Sketch Engine, many, more and most are three different lemmas in English, whereas in Czech the equivalent irregular forms hodně, více, nejvíce share the same lemma hodně.
The situation is even more complex with agglutinative languages such as Turkish, Hungarian or Japanese, where it may not be easy to decide how many affixes should be removed to produce a lemma. The term stem is sometimes used instead of lemma, but a stem usually refers to the core part of a word, and several lemmas may share the same stem.
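The difference between lemmas and stems can be illustrated with a small sketch. The word-to-lemma mapping below is hypothetical (hand-written, as a lemmatizer might produce it), and `naive_stem` is a deliberately simplistic, hypothetical suffix stripper, not a real stemming algorithm such as Porter or Snowball. It shows three distinct lemmas (nation, national, nationality) collapsing onto one stem.

```python
# Hypothetical mapping from word forms to lemmas, as a lemmatizer might produce it.
LEMMAS = {
    "nation": "nation",
    "nations": "nation",
    "national": "national",
    "nationality": "nationality",
    "nationalities": "nationality",
}

# Hypothetical suffix list for illustration only; real stemmers use
# far more elaborate, language-specific rules.
SUFFIXES = ["ities", "ity", "al", "s"]

def naive_stem(word):
    """Repeatedly strip the first matching suffix until no suffix applies."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            # Keep at least three characters so short words are not destroyed.
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                changed = True
                break
    return word

# Group the lemmas by the stem of each word form.
lemmas_by_stem = {}
for form, lemma in LEMMAS.items():
    lemmas_by_stem.setdefault(naive_stem(form), set()).add(lemma)

print(sorted(set(LEMMAS.values())))  # ['nation', 'national', 'nationality']
print({stem: sorted(ls) for stem, ls in lemmas_by_stem.items()})
# {'nation': ['nation', 'national', 'nationality']}
```

Searching by lemma keeps the three words apart, whereas grouping by this stem would return all of them together.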
See also lemma-lc or compare with word form.