Clustering can be performed in Sketch Engine on similar words from the thesaurus and on collocates within a word sketch.
If the clustering option is selected, the similar words from the thesaurus are clustered according to their distributional similarity scores. The distributional similarity score is defined in section 3 of our documentation, Statistics used in the Sketch Engine. The algorithm is greedy and agglomerative. All pairs of words are listed in order of their distributional similarity, and the sorted list is processed in decreasing order. A word is merged into a cluster formed so far provided that the distributional similarity between it and any word in that cluster is greater than the specified threshold similarity, and that this value is higher than the equivalent value for the other clusters formed so far.
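The procedure can be illustrated with a minimal Python sketch. This is one possible reading of the description, not the Sketch Engine implementation. The function name `greedy_cluster`, the input format (a list of `(word_a, word_b, score)` tuples) and the rule that two unclustered words start a new cluster are assumptions made for illustration. Because pairs are visited in decreasing order of similarity, the first cluster an unclustered word meets is also the one it is most similar to.

```python
def greedy_cluster(pairs, threshold):
    """Greedy agglomerative clustering over scored word pairs.

    pairs: iterable of (word_a, word_b, score) tuples (hypothetical format).
    threshold: minimum similarity a pair must exceed to be considered.
    """
    clusters = []    # list of clusters, each a list of words
    cluster_of = {}  # word -> index of its cluster in `clusters`

    # Process pairs from the most similar to the least similar.
    for a, b, score in sorted(pairs, key=lambda p: p[2], reverse=True):
        if score <= threshold:
            break  # all remaining pairs are at or below the threshold

        a_done, b_done = a in cluster_of, b in cluster_of
        if a_done and b_done:
            continue  # both words already belong to a cluster

        if not a_done and not b_done:
            # Assumption: two unclustered words seed a new cluster.
            cluster_of[a] = cluster_of[b] = len(clusters)
            clusters.append([a, b])
        else:
            # Exactly one word is clustered: the other joins its cluster.
            member, newcomer = (a, b) if a_done else (b, a)
            index = cluster_of[member]
            clusters[index].append(newcomer)
            cluster_of[newcomer] = index

    return clusters


example_pairs = [
    ("cat", "dog", 0.8),
    ("dog", "puppy", 0.7),
    ("car", "truck", 0.6),
    ("cat", "car", 0.1),
]
example_clusters = greedy_cluster(example_pairs, threshold=0.3)
```

In this sketch, words that never exceed the threshold with any other word are simply left out of the result and can be treated as singletons.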
The collocates within a word sketch are clustered according to any such clusters from the distributional thesaurus that they appear in.
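As a rough illustration of that grouping step, the sketch below assigns each collocate to the thesaurus cluster it appears in. It is an assumption-laden example rather than the actual Sketch Engine code: the function name `group_collocates`, the list-of-lists cluster format and the choice to keep unmatched collocates as singleton groups are all hypothetical.

```python
def group_collocates(collocates, thesaurus_clusters):
    """Group collocates by the thesaurus cluster they appear in.

    collocates: list of collocate words from one word sketch.
    thesaurus_clusters: list of clusters, each a list of words
        (hypothetical format for clusters from the thesaurus).
    """
    # Map every clustered word to the index of its cluster.
    cluster_of = {
        word: index
        for index, cluster in enumerate(thesaurus_clusters)
        for word in cluster
    }

    groups = {}   # cluster index -> collocates found in that cluster
    singles = []  # collocates that appear in no thesaurus cluster

    for collocate in collocates:
        index = cluster_of.get(collocate)
        if index is None:
            # Assumption: unmatched collocates stay as singleton groups.
            singles.append([collocate])
        else:
            groups.setdefault(index, []).append(collocate)

    return list(groups.values()) + singles


example_groups = group_collocates(
    ["dog", "truck", "cat", "house"],
    [["cat", "dog", "puppy"], ["car", "truck"]],
)
```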
The third section in Statistics used in the Sketch Engine