The document frequency is the number of documents in which the word or phrase appears.
For example, suppose a corpus has 100 documents and 2 of them contain the word city: document number 7 contains 17 instances of city and document number 31 contains 6 instances of city. The document frequency of city is 2, because 2 documents contain the word.
It does not matter how many documents the corpus contains or how many times the word appears in total.
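The counting rule can be sketched in Python. This is only an illustration under stated assumptions, not how Sketch Engine computes the value: the function name `document_frequency`, the representation of a corpus as a list of token lists, and the filler tokens are all hypothetical.

```python
from typing import Iterable, Sequence


def document_frequency(corpus: Iterable[Sequence[str]], word: str) -> int:
    """Count the documents that contain `word` at least once."""
    # Each document contributes at most 1, however often the word occurs in it.
    return sum(1 for document in corpus if word in document)


# Toy corpus of 100 documents mirroring the example above (hypothetical data):
# document 7 holds 17 instances of "city", document 31 holds 6, the rest none.
corpus = [["filler"] for _ in range(100)]
corpus[6] = ["city"] * 17   # document number 7 (1-based)
corpus[30] = ["city"] * 6   # document number 31 (1-based)

docf = document_frequency(corpus, "city")
total = sum(doc.count("city") for doc in corpus)

print(docf)   # 2: only two documents contain the word
print(total)  # 23: total instances, which the document frequency ignores
```

The total of 23 instances plays no part in the result; only the presence of the word in a document is counted.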
Document frequency can be better suited for comparison than plain frequency when a small number of documents in the corpus have an extremely high frequency of particular words, because each document counts only once however often the word occurs in it.
Relative document frequency (also relative DOCF) is the percentage of documents that contain the word or item. Similar to the relative frequency, it is used to compare document frequencies between corpora of different sizes.
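As a rough sketch of that comparison, the percentage can be computed as follows. The function name `relative_document_frequency` and the second corpus (10,000 documents, 150 of which contain the word) are hypothetical and chosen only for illustration.

```python
def relative_document_frequency(docf: int, corpus_size: int) -> float:
    """Return the percentage of documents that contain the item."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return 100.0 * docf / corpus_size


# Corpus A: 2 of 100 documents contain the word.
small = relative_document_frequency(2, 100)
# Corpus B (hypothetical): 150 of 10,000 documents contain the word.
large = relative_document_frequency(150, 10_000)

print(small)  # 2.0
print(large)  # 1.5
```

Although the word appears in far more documents of corpus B in absolute terms, its relative document frequency is lower there, which is the kind of difference the raw document frequency would hide.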
See also
frequency
frequency per million
ARF
Statistics used in Sketch Engine