lemma

A lemma is the basic form of a word, typically the form found in dictionaries. A lemmatized corpus makes it possible to search for a lemma so that the results include all forms of the word. For example, searching for the lemma go will find go, goes, went, going, gone and Go (at the beginning of a sentence). Lemmas in Sketch Engine are case-sensitive, so City and city are two different lemmas (City = the City of London; city = a common noun).
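To illustrate how a lemma search gathers all forms of a word, here is a minimal Python sketch over a hypothetical toy corpus in which each token is stored as a (word form, lemma) pair. The corpus, its format and the function search_lemma are assumptions for illustration only and do not reflect how Sketch Engine stores or queries data. Lemmas are compared exactly, so the search is case-sensitive in the way described above.

```python
# Hypothetical toy corpus: each token is a (word form, lemma) pair,
# as a lemmatizer might annotate it. This is NOT Sketch Engine's data format.
corpus = [
    ("Go", "go"), ("home", "home"), (".", "."),
    ("She", "she"), ("goes", "go"), ("there", "there"),
    ("and", "and"), ("went", "go"), ("back", "back"), (".", "."),
    ("The", "the"), ("City", "City"), ("and", "and"),
    ("the", "the"), ("city", "city"), ("differ", "differ"), (".", "."),
]


def search_lemma(tokens, lemma):
    """Return every word form whose lemma matches exactly (case-sensitive)."""
    return [form for form, lem in tokens if lem == lemma]


print(search_lemma(corpus, "go"))    # ['Go', 'goes', 'went']
print(search_lemma(corpus, "city"))  # ['city']
print(search_lemma(corpus, "City"))  # ['City']
```

Searching for the lemma go returns the sentence-initial Go together with goes and went, while city and City remain separate lemmas.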

The concept of a lemma is not always clearly defined and may differ between languages; often there is no single agreed definition even within one language. For example, in Sketch Engine, many, more and most are three different lemmas in English, whereas in Czech the equivalent irregular forms hodně, více and nejvíce share the single lemma hodně.

The situation is even more complex with agglutinative languages such as Turkish, Hungarian or Japanese, where it may not be easy to decide how many affixes should be removed to produce a lemma. The term stem is sometimes used in place of lemma, but a stem usually refers to the core part of a word, and several lemmas may share the same stem.
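The difference between a stem and a lemma can be sketched in Python by grouping lemmas under the stem they share. The lemma-to-stem mapping below is a small hand-made illustration (Turkish ev "house", evli "married", evsiz "homeless"; English connect, connection, connective) and is an assumption for demonstration; real stems depend on the stemming algorithm or linguistic analysis used, and the function lemmas_by_stem is hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-made annotations: lemma -> stem.
# Real stems depend on the stemmer or linguistic analysis applied.
lemma_to_stem = {
    "ev": "ev",            # Turkish: house
    "evli": "ev",          # Turkish: married
    "evsiz": "ev",         # Turkish: homeless
    "connect": "connect",
    "connection": "connect",
    "connective": "connect",
}


def lemmas_by_stem(mapping):
    """Group lemmas under the stem they share, keeping insertion order."""
    groups = defaultdict(list)
    for lemma, stem in mapping.items():
        groups[stem].append(lemma)
    return dict(groups)


print(lemmas_by_stem(lemma_to_stem))
# {'ev': ['ev', 'evli', 'evsiz'], 'connect': ['connect', 'connection', 'connective']}
```

Each group contains several distinct lemmas, so a search by stem would be broader than a search by any one of these lemmas.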

See also lemma-lc or compare with word form.