Each corpus in Sketch Engine has a Corpus info page, which contains detailed statistics and an overview of the tags and structures of the corpus.

Accessing the screen

When you open a corpus, the Corpus info page is available via the Corpus info link in the left menu or by clicking the name of the corpus next to the search box in the heading of the corpus search screen.

Descriptions of info tabs:

  • Counts – statistical information about the corpus (number of words, tokens, sentences etc.)
  • General info – information about the language, encoding and last modification of the corpus, and links to a tagset overview and the corpus info page on our website
  • Lexicon sizes – number of different words, tags, lemmas and values of other attributes in the corpus
  • Tags legend – an overview of basic tags; click the tagset link to see the full tagset description
  • Lempos suffixes – a list of letters representing parts of speech; a lempos is created by joining such a letter to the lemma with a hyphen (e.g. “red-j” for the adjective “red”)
  • Structures and attributes – a list of structures and their attributes (with a more detailed distribution), representing documents, sentences and other units in which tokens are positioned in the corpus. They can be searched with CQL (for example, a query can search all values of the attribute rend of the structure poem)
  • Grammar relations – names and counts of relations used in word sketches
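
The lempos and structure-attribute conventions described in the list above can be illustrated with a minimal Python sketch. The function names below are hypothetical and are not part of Sketch Engine. The query shape produced by the last function is an assumption about CQL structure syntax (an opening structure tag with an attribute pattern), so check it against the CQL documentation before relying on it.

```python
from typing import Tuple


def make_lempos(lemma: str, pos_suffix: str) -> str:
    """Join a lemma and a part-of-speech letter with a hyphen, e.g. red + j."""
    return f"{lemma}-{pos_suffix}"


def split_lempos(lempos: str) -> Tuple[str, str]:
    """Split a lempos at its LAST hyphen so hyphenated lemmas stay intact."""
    lemma, _, pos_suffix = lempos.rpartition("-")
    return lemma, pos_suffix


def structure_attribute_query(structure: str, attribute: str,
                              pattern: str = ".*") -> str:
    """Build a query string for a structure attribute.

    Assumption: structures are addressed in CQL as <structure attr="regex">.
    """
    return f'<{structure} {attribute}="{pattern}">'


print(make_lempos("red", "j"))                    # red-j
print(split_lempos("well-known-j"))               # ('well-known', 'j')
print(structure_attribute_query("poem", "rend"))  # <poem rend=".*">
```

Splitting at the last hyphen rather than the first is a deliberate choice here, because a lemma may itself contain hyphens while the part-of-speech suffix always comes last.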

Corpus statistics and details