Dot corpus | Sketch Engine

Dot corpus: Corpus of a single dot

The Dot corpus is a carefully curated, language-universal microcorpus consisting of a single full stop. It was designed as an extreme lower-bound resource for corpus linguistics, language technology and lexicographic experimentation. Although exceptionally small, the corpus is fully searchable, and researchers can even read it from beginning to end.

The Dot corpus is language-universal (more precisely, language-independent) in the sense that the attested symbol is widely recognized across writing systems and textual traditions. Its interpretive openness makes it suitable for cross-linguistic reflection, while its formal closure makes it attractive for benchmarking. The corpus is particularly useful for testing minimal-data workflows, tokenization, and minimalist approaches to linguistics in general.
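A tokenization test against the Dot corpus can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sketch Engine's own tokenizer: the regular expression and the names `DOT_CORPUS`, `TOKEN_RE` and `tokenize` are hypothetical. A tokenizer that treats punctuation as tokens should return exactly one token for this corpus.

```python
import re

# The entire Dot corpus: a single full stop.
DOT_CORPUS = "."

# Hypothetical minimal tokenizer (an assumption, not Sketch Engine's):
# a run of word characters forms one token, and every other
# non-space character (i.e. punctuation) forms a token of its own.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


tokens = tokenize(DOT_CORPUS)
print(tokens)       # ['.']
print(len(tokens))  # 1
```

The same function splits ordinary text too, so `tokenize("Hello, world.")` gives `['Hello', ',', 'world', '.']`, which makes the one-token result on the Dot corpus a meaningful boundary case rather than a degenerate one.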

Part-of-speech tagset and lemmatization

No part-of-speech tagset is required for standard use of the Dot corpus. Since the only attested item is punctuation, grammatical annotation has been intentionally minimized to preserve analytical neutrality.

Dot corpus sizes

Number of tokens: 1

Search the Dot corpus

Sketch Engine offers a range of tools to work with this one-period corpus.

Tools to work with the single-period Dot corpus

A complete set of Sketch Engine tools is available to work with the Dot corpus and generate:

keywords – terminology extraction of one-word units
word lists – lists of words organized by frequency
concordance – examples in context
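The three outputs listed above can be approximated in plain Python to show what each one computes on a one-token corpus. This is a hedged sketch, not Sketch Engine's implementation: the function names are hypothetical, the concordance is a simple KWIC (keyword in context) layout, and the keyword score assumes the "simple maths" ratio of normalized frequencies, (fpm_focus + n) / (fpm_ref + n), with a smoothing constant n = 1.

```python
from collections import Counter

# The Dot corpus as a list of tokens (hypothetical in-memory form).
CORPUS = ["."]


def word_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Frequency list: (item, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens: list[str], node: str, width: int = 3) -> list[str]:
    """KWIC lines with up to `width` tokens of context on each side of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so node items line up in a column.
            lines.append(f"{left:>20} [{tok}] {right}".rstrip())
    return lines


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to frequency per million tokens (fpm)."""
    return count / corpus_size * 1_000_000


def simple_maths_score(fpm_focus: float, fpm_ref: float, n: float = 1.0) -> float:
    """Assumed keyness score: (fpm_focus + n) / (fpm_ref + n)."""
    return (fpm_focus + n) / (fpm_ref + n)


print(word_list(CORPUS))  # [('.', 1)]
for line in concordance(CORPUS, "."):
    print(line)

focus_fpm = per_million(1, len(CORPUS))  # one hit in a one-token corpus
# ref_fpm would come from a reference corpus; 0.0 is a placeholder, not a measured value.
print(simple_maths_score(focus_fpm, fpm_ref=0.0))
```

Because the corpus has a single token, the only item in the word list reaches the maximum possible relative frequency of one million per million, and its concordance line has empty left and right context.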

Bibliography & references

Kovařík, F. (forthcoming in 2027). Monadic punctuation in corpus linguistics: The case of the Dot corpus. PhD dissertation. Masaryk University, Faculty of Informatics, Brno.

Rychlý, P. (forthcoming). Unary relations and the zero-lexicon limit of word sketch grammars. Technical report. Lexical Computing.

Other text corpora

Sketch Engine offers access to more than 800 text corpora in many languages.

Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn it in minutes.
