• lc [ attribute ]

    (also referred to as word_lc, word lowercase or word form lowercase) One of the positional attributes of each token in the corpus. The lc attribute is a lowercased version of the word attribute: John becomes john, Apple becomes apple, BE becomes be. It makes the uppercase and lowercase versions of each token identical, so searching the lc attribute gives a case-insensitive search. See also word form, lemma (lowercase).
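
    As a rough illustration of the idea, the sketch below derives an lc value from word and uses it for a case-insensitive lookup. The token records, the function names and the use of Python's str.lower() are assumptions made for this example; they are not Sketch Engine's own implementation.

```python
# Hypothetical token records; only the attribute names come from the glossary.
tokens = [{"word": "John"}, {"word": "Apple"}, {"word": "BE"}, {"word": "apple"}]


def add_lc(token):
    """Return a copy of the token with an lc attribute derived from word."""
    # Assumption: plain str.lower() stands in for the real lowercasing rules.
    return {**token, "lc": token["word"].lower()}


def search_lc(annotated_tokens, query):
    """Case-insensitive search: compare the lowercased query with lc."""
    q = query.lower()
    return [t["word"] for t in annotated_tokens if t["lc"] == q]


annotated = [add_lc(t) for t in tokens]
matches = search_lc(annotated, "APPLE")  # matches both Apple and apple
```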
  • learner corpus [ corpus-types ]

    A collection of texts produced by learners of a language, used to study the errors and mistakes they make. Learner corpora in Sketch Engine can use both error and correction annotation. A special search interface is available to search by error, by correction, or by both. See also Setting up a learner corpus.
  • lemma [ attribute ]

    Lemma is the basic form of a word, typically the form found in dictionaries. A lemmatized corpus allows searching for a lemma so that the results include all forms of the word, e.g. searching for lemma go will find go, goes, went, going, gone, Go (if found at the beginning of sentences). Lemma in Sketch Engine is case sensitive, so City and city are two different lemmas (City = the City of London; city = a common noun). The concept of lemma is not always clearly defined and may differ between languages. Often there is no single definition even within one language. For example, in Sketch Engine, many, more, most are three different lemmas in English. In Czech, on the other hand, the equivalent forms hodně, více, nejvíce, which are also irregular, share the single lemma hodně. The situation is even more complex with agglutinating languages such as Turkish, Hungarian or Japanese, where it may not be easy to decide how many affixes should be removed to produce a lemma. The term stem is often used in place of lemma, but a stem usually refers to the very core part of the word, and several lemmas may share the same stem. See also lemma_lc or compare with word form.
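
    A toy sketch of how a lemma search collects all word forms might look as follows. The (word, lemma) pairs and the function name find_by_lemma are invented for illustration; a real corpus is queried through the Sketch Engine interface, not like this.

```python
# Hypothetical (word, lemma) pairs, as a lemmatizer might have produced them.
corpus = [
    ("Go", "go"),
    ("home", "home"),
    ("went", "go"),
    ("going", "go"),
    ("City", "City"),  # the City of London
    ("city", "city"),  # common noun
]


def find_by_lemma(pairs, lemma):
    """Return every word form whose lemma matches exactly (case sensitive)."""
    return [word for word, lem in pairs if lem == lemma]


go_forms = find_by_lemma(corpus, "go")      # all forms of the lemma go
city_forms = find_by_lemma(corpus, "city")  # does not include City
```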
  • lemma_lc [ attribute ]

    lemma_lc is a case-insensitive lemma. All uppercase characters are converted to lowercase, so apple and Apple are the same lemma. See lemma.
  • Lemmatization

    Lemmatization is the process of assigning a lemma to each word form in a corpus using an automatic tool called a lemmatizer. Lemmatization brings the benefit of searching for the base form of a word and getting all the derived forms in the result, e.g. searching for go will also find goes, went, gone, going. See also PoS tagger, stemming.
  • lempos [ attribute ]

    lempos is a combination of lemma and part of speech (pos) consisting of the lemma, a hyphen and a one-letter abbreviation of the part of speech, e.g. go-v, house-n. The part-of-speech abbreviations differ between corpora. Lempos is case sensitive: house-n is different from House-n. See also lempos_lc.
  • lempos_lc [ attribute ]

    lempos_lc is a case insensitive counterpart of lempos. All uppercase letters are converted to lowercase, thus House-n becomes identical with house-n.
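
    The relationship between lempos and lempos_lc can be sketched as below. The helper names are hypothetical, and the one-letter codes v and n are taken from the examples above; actual abbreviations differ between corpora.

```python
def make_lempos(lemma, pos_letter):
    """Join lemma, hyphen and one-letter part-of-speech code (case preserved)."""
    return f"{lemma}-{pos_letter}"


def make_lempos_lc(lemma, pos_letter):
    """Case-insensitive counterpart: the whole lempos is lowercased."""
    return make_lempos(lemma, pos_letter).lower()


upper = make_lempos("House", "n")        # "House-n"
lower = make_lempos("house", "n")        # "house-n"
same_lc = make_lempos_lc("House", "n") == make_lempos_lc("house", "n")
```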
  • likelihood [ statistics ]

    A function of the parameters of a statistical model. It plays a key role in statistical inference and is the basis for the log-likelihood function. See Statistics in Sketch Engine.
  • log-likelihood [ statistics ]

    One of the functions used to compute statistics in Sketch Engine. It is an association measure based on the likelihood function and is used in tests for significance (see the log-likelihood calculator and more details).
  • logDice [ statistics ]

    A statistical measure for identifying collocations. It expresses the typicality of the co-occurrence of the node and the collocate. It is used in the word sketch feature and also when computing collocations from a concordance. It is based only on the frequency of the node, the frequency of the collocate and the frequency of the whole collocation. logDice is not affected by the size of the corpus and, therefore, can be used to compare scores between different corpora. logDice is the preferred option when working with large corpora. See also logDice in Statistics used in Sketch Engine, A Lexicographer-Friendly Association Score (paper), T-score, MI score.
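
    For orientation, the score is commonly written as 14 + log2(2 * f_xy / (f_x + f_y)), where f_x and f_y are the frequencies of the node and the collocate and f_xy is the frequency of the collocation. The sketch below computes it from three raw counts; the function name and the example numbers are made up, and in word sketches the counts are taken within a grammatical relation, which this simplified version ignores.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice from collocation, node and collocate frequencies."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    # Corpus size does not appear anywhere in the formula.
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: the pair co-occurs 50 times,
# the node occurs 1000 times and the collocate 200 times.
score = log_dice(50, 1000, 200)
```

    The theoretical maximum is 14, reached when the node and the collocate only ever occur together.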
  • Longest-commonest match

    The longest-commonest match is a concept coined by Adam Kilgarriff to name the most common realisation of a collocation, i.e. the chunk of language in which the collocation appears most frequently. The longest-commonest match is part of the word sketch result screen to facilitate the understanding of how the collocation typically behaves.
  • longtag [ attribute ]

    Longtag is a detailed part-of-speech tag which usually contains more information than tag. Some corpora have a tag attribute containing only basic part-of-speech information and, in addition, a longtag attribute with detailed grammatical information such as case, number, gender, etc. Longtags are available in the Estonian corpus etTenTen or the Turkish corpus trTenTen.