BFM 2022: Corpus of Old French and Middle French

The BFM 2022 (Base de français médiéval 2022) is a historical corpus of medieval French texts written in Old French and Middle French. The texts, totalling about 6 million words, were written between the 9th century and the end of the 15th century.

The texts include a set of metadata of different types: bibliographic data (title, author, scientific publisher, etc.), date of composition of the texts, form of the texts (verse/prose), dialects, genres, etc.

Part-of-speech tagset

The BFM corpus was annotated with the FreeLing tagger and its tagset, with Sketch Engine modifications. The corpus is also lemmatized: each word form in the corpus is assigned its base form (lemma).

The authors

The BFM corpus is developed within the IHRIM laboratory. The authors are Céline GUILLOT-BARBANCE, Serge HEIDEN, Alexei LAVRENTIEV, and others.

For more information, please refer to the official website: https://txm-bfm.huma-num.fr/txm/?command=documentation&path=/BFM2022

Content of the BFM corpus

This corpus of Old and Middle French comprises 211 medieval texts out of the 219 documents in the original database. The remaining 8 documents were excluded due to academic licensing constraints.

BFM corpus sizes

Tokens 6,863,246
Words 6,002,552
Sentences 207,042
Documents 211
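
The figures above can be related to each other with a little arithmetic. The following is a minimal sketch in Python, assuming that "tokens" count every unit including punctuation while "words" count only the tokens that are actual words; the variable names are illustrative and not part of any Sketch Engine API.

```python
# Corpus sizes copied from the table above.
tokens = 6_863_246
words = 6_002_552
sentences = 207_042
documents = 211

# Assumption: tokens that are not words are mostly punctuation marks.
non_word_tokens = tokens - words

# Average sizes derived from the published counts.
words_per_sentence = words / sentences
words_per_document = words / documents
non_word_share = non_word_tokens / tokens

print(f"Non-word tokens:     {non_word_tokens:,}")
print(f"Words per sentence:  {words_per_sentence:.2f}")
print(f"Words per document:  {words_per_document:,.0f}")
print(f"Share of non-words:  {non_word_share:.1%}")
```

These averages are only rough indicators, since text length varies widely between documents.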

Search the French BFM corpus

Sketch Engine offers a range of tools to work with this corpus of Old and Middle French.

Tools to work with the BFM corpus

A complete set of Sketch Engine tools is available to work with this Old French corpus to generate:

  • word sketch – French collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of French nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • trends – diachronic analysis automatically identifies neologisms and changes in use
  • text type analysis – statistics of metadata in the corpus
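
To make two of the tools above more concrete, here is a minimal sketch in Python of what an n-gram frequency list and a concordance (keyword in context) compute. It is not how Sketch Engine implements them: the functions `ngrams` and `kwic` are hypothetical helpers, and the toy input is the well-known opening of the Chanson de Roland, lowercased with punctuation removed, used purely for illustration.

```python
from collections import Counter

# Toy input for illustration only, not an export from Sketch Engine.
tokens = (
    "carles li reis nostre emperere magnes "
    "set anz tuz pleins ad estet en espaigne"
).split()


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Frequency list of two-word units.
bigram_counts = Counter(ngrams(tokens, 2))

# Concordance lines for one word form.
hits = kwic(tokens, "reis")
for left, key, right in hits:
    print(f"{left:>20} [{key}] {right}")
```

On a real corpus the same idea is applied to millions of tokens, and searches can also target lemmas or part-of-speech tags rather than exact word forms.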

BFM 2022 (December 2023)

  • initial version bfm2022 processed by FreeLing pipeline version 3
  • Guillot-Barbance, Céline, Heiden, Serge and Lavrentiev, Alexei (2017), “Base of medieval French: an open and free reference base of medieval sources at the service of the scientific community”, Diachroniques, no. 7, pp. 168-184. 〈halshs-01809581〉 https://halshs.archives-ouvertes.fr/halshs-01809581

