The lemma is the form of a word found in dictionaries, sometimes called the base form. Introducing lemmas makes it possible to treat the different word forms of a word as the same word. This is especially useful with morphologically rich languages, i.e. languages where a lemma can have many different word forms (Spanish, French, Polish, Japanese, Turkish, Russian etc.).
The existence of the lemma makes it possible to type go and find go, goes, going, gone and went automatically. A wordlist generated on the lemma attribute will count the frequencies of go, goes, going, gone and went together and display them as one item: go. To find their individual frequencies, the lc attribute should be used.
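The difference between counting on the lemma attribute and on the lc attribute can be illustrated with a minimal Python sketch. It is not the tool's own implementation: the token list, the `wordlist` function and the assumption that lc is the lowercased word form are hypothetical and used for illustration only.

```python
from collections import Counter

# Hypothetical annotated tokens as (word form, lemma) pairs,
# similar to what a tagged corpus might contain.
tokens = [
    ("go", "go"), ("goes", "go"), ("going", "go"),
    ("gone", "go"), ("went", "go"), ("Went", "go"),
    ("islands", "island"),
]

def wordlist(tokens, attribute):
    """Count frequencies on 'lemma' or on 'lc' (assumed: lowercased word form)."""
    if attribute == "lemma":
        keys = (lemma for _, lemma in tokens)
    elif attribute == "lc":
        keys = (word.lower() for word, _ in tokens)
    else:
        raise ValueError(f"unsupported attribute: {attribute}")
    return Counter(keys)

lemma_counts = wordlist(tokens, "lemma")  # all forms of go merged into one item
lc_counts = wordlist(tokens, "lc")        # each word form counted separately
```

Here `lemma_counts` holds a single entry for go covering all six tokens, while `lc_counts` keeps go, goes, going, gone and went as separate items.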
The lemma preserves the original capitalization of the word form but, typically, the lemma of the first word of a sentence will be lowercased.
In most languages (German is one of the exceptions), when a capitalized word is found in the middle of a sentence, the lemmatizer identifies it as unusual usage, possibly a brand name or proper noun, and will assign a lemma which is identical to the word form. Compare islands ⇢ island but Islands ⇢ Islands.
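The capitalization behaviour described above can be sketched as a toy rule in Python. This is only an illustration under stated assumptions: the `assign_lemma` function and the `lookup` dictionary are hypothetical, real lemmatizers are far more elaborate, and the exception for languages such as German is ignored.

```python
# Hypothetical lookup from lowercased word form to lemma.
lookup = {"islands": "island", "island": "island"}

def assign_lemma(word, sentence_initial, lookup):
    """Toy rule: keep mid-sentence capitalized words unchanged as their own lemma."""
    if word[:1].isupper() and not sentence_initial:
        # Treated as a possible proper noun or brand name: lemma equals the word form.
        return word
    # Otherwise lowercase the word and look it up; fall back to the lowercased form.
    lowered = word.lower()
    return lookup.get(lowered, lowered)

mid_lower = assign_lemma("islands", False, lookup)   # ordinary mid-sentence word
mid_upper = assign_lemma("Islands", False, lookup)   # capitalized mid-sentence word
initial = assign_lemma("Islands", True, lookup)      # first word of a sentence
```

With this rule, islands maps to island, a mid-sentence Islands keeps the lemma Islands, and a sentence-initial Islands is lowercased and lemmatized as island.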
see also lemma