document frequency

The document frequency of a word or phrase is the number of documents in which it appears, irrespective of how many times it appears in each document. For example, if a corpus has 100 documents and only 2 of them contain the word city (document 1 with 17 instances and document 2 with 6 instances), the document frequency of city is 2. Neither the size of the corpus nor the total number of occurrences of the word (23 in this example) affects the value.
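The counting rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: documents are plain strings, tokens are obtained by lowercasing and splitting on whitespace, and the function name document_frequency and the toy corpus are made up for this example.

```python
from collections import Counter


def document_frequency(documents):
    """Map each word to the number of documents containing it at least once."""
    df = Counter()
    for text in documents:
        # set() collapses repeated occurrences, so a document adds at most 1 per word
        df.update(set(text.lower().split()))
    return df


# Toy corpus mirroring the example: 100 documents, only two of them mention "city".
corpus = ["city " * 17, "city " * 6] + ["nothing relevant here"] * 98

df = document_frequency(corpus)
# Total number of occurrences across the corpus, for comparison
total = sum(text.lower().split().count("city") for text in corpus)

print(df["city"])  # 2
print(total)       # 23
```

A real tool would use a proper tokenizer and would usually count lemmas or phrases as well as word forms; the point here is only that each document contributes at most once to the count.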

The document frequency can be better suited for comparison than the total frequency when a small number of documents in the corpus contain particular words extremely often. Such documents inflate the total count of a word, but each of them adds only 1 to its document frequency.

