Available since Manatee 2.96

Searching for similar words with CQL

Put a tilde (~) in front of a quoted word to generate a thesaurus for that word and include its top N items in the query. For example, to find the verb chop followed, with up to three tokens in between, by the name of a vegetable, use:

 [lemma="chop"] []{0,3} ~"carrot-n" 

The query generates a thesaurus for the word carrot and then searches for chop in combination with the top N items from that thesaurus. You can use the thesaurus to preview the words that will be included.

Note: Some corpora require the word to be entered as a lempos (for example, carrot-n), others as a lemma or a word form. If the lempos does not work, try the lemma or the word form.

When no number is specified, N is determined automatically from the frequency of the word in the corpus: it is the base-10 logarithm of that frequency. For example, a frequency of 100 means 2 synonyms will be used, a frequency of 1,000 means 3 synonyms, and so on.
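The automatic choice of N can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Manatee's actual code: the function name default_thesaurus_size is hypothetical, and rounding down for frequencies that are not exact powers of ten is an assumption, since only the values for 100 and 1,000 are documented.

```python
def default_thesaurus_size(frequency: int) -> int:
    """Return floor(log10(frequency)) for a positive integer frequency.

    Assumption: the logarithm is rounded down. Only the cases
    100 -> 2 and 1,000 -> 3 are documented.
    """
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    # The number of decimal digits minus one equals floor(log10(frequency));
    # counting digits avoids floating-point rounding near powers of ten.
    return len(str(frequency)) - 1


for freq in (100, 1_000, 25_000):
    print(freq, default_thesaurus_size(freq))
```

Under the round-down assumption, a word that occurs 25,000 times would bring 4 thesaurus items into the query.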

To set the number of thesaurus items manually, use:

[lemma="chop"] []{0,3} ~15"carrot-n" 
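When such queries are generated by a script, the string can be assembled with a small helper. The sketch below is plain string formatting and does not call any Sketch Engine or Manatee API; the function name similar_words_query and its parameters are hypothetical.

```python
from typing import Optional


def similar_words_query(left: str, target: str,
                        max_gap: int = 3,
                        top_n: Optional[int] = None) -> str:
    """Build a CQL query: `left`, a gap of 0..max_gap tokens, then ~"target".

    If top_n is None, no number is written after the tilde, so the
    number of thesaurus items is chosen automatically.
    """
    if '"' in target:
        raise ValueError("target must not contain a double quote")
    if max_gap < 0:
        raise ValueError("max_gap must not be negative")
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be at least 1")
    count = "" if top_n is None else str(top_n)
    # Doubled braces in the f-string produce the literal { and } of the CQL gap.
    return f'{left} []{{0,{max_gap}}} ~{count}"{target}"'


print(similar_words_query('[lemma="chop"]', "carrot-n", top_n=15))
# prints: [lemma="chop"] []{0,3} ~15"carrot-n"
```

Leaving out top_n produces the first form of the query, with the number of items determined automatically.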

Using the thesaurus with small corpora or low-frequency words will not produce good-quality synonyms. A high-quality thesaurus requires a large corpus and a word of reasonably high frequency.