Dot corpus: Corpus of a single dot
The Dot corpus is a carefully curated, language-universal microcorpus consisting of a single full stop. It was designed as a lower-bound resource for corpus linguistics, language technology, and lexicographic experimentation. Although exceptionally small, the corpus is fully searchable, and researchers can even read it from beginning to end.
The Dot corpus is language-universal (more precisely, language-independent) in the sense that the attested symbol is widely recognized across writing systems and textual traditions. Its interpretive openness makes it suitable for cross-linguistic reflection, while its formal closure makes it attractive for benchmarking. The corpus is particularly useful for testing minimal-data workflows, tokenization, and minimalist approaches to linguistics in general.
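As a rough illustration of the tokenization use case, the sketch below runs a simple regex tokenizer over a one-character string standing in for the corpus. The `DOT_CORPUS` constant and the `tokenize` helper are assumptions made for this example; they are not part of Sketch Engine or of the corpus distribution.

```python
import re

# Hypothetical stand-in for the corpus text: a single full stop.
DOT_CORPUS = "."


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one
    # character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize(DOT_CORPUS)
print(tokens)       # ['.']
print(len(tokens))  # 1
```

A tokenizer that returns anything other than one token for this input is either dropping punctuation or splitting too aggressively, which is what makes the corpus usable as a minimal sanity check.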
Part-of-speech tagset and lemmatization
No part-of-speech tagset is required for standard use of the Dot corpus. Since the only attested item is punctuation, grammatical annotation has been intentionally minimized to preserve analytical neutrality.
Dot corpus sizes
| Number of tokens | 1 |
Search the Dot corpus
Sketch Engine offers a range of tools to work with this one-period corpus.
Tools to work with the single-period Dot corpus
A complete set of Sketch Engine tools is available to work with the Dot corpus and generate:
- keywords – terminology extraction of one-word units
- word lists – lists of words organized by frequency
- concordance – example in context
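To make the word list and concordance outputs concrete, here is a minimal sketch in plain Python. It does not use or imitate the Sketch Engine API; the `word_list` and `concordance` functions, the `window` parameter, and the pre-tokenized `tokens` list are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumed tokenized form of the corpus: one token, a full stop.
tokens = ["."]


def word_list(toks):
    """Return (item, frequency) pairs sorted by descending frequency."""
    return Counter(toks).most_common()


def concordance(toks, query, window=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(toks):
        if tok == query:
            left = " ".join(toks[max(0, i - window):i])
            right = " ".join(toks[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


print(word_list(tokens))         # [('.', 1)]
print(concordance(tokens, "."))  # [('', '.', '')]
```

With a single token, the frequency list has one entry and the only concordance line has empty left and right contexts.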
Bibliography & references
Kovařík, F. (forthcoming in 2027). Monadic punctuation in corpus linguistics: The case of the Dot corpus. PhD dissertation. Masaryk University, Faculty of Informatics, Brno.
Rychlý, P. (forthcoming). Unary relations and the zero-lexicon limit of word sketch grammars. Technical report. Lexical Computing.