# Meaning of the Discriminating Feature % in the Solution Output under Descriptive and Discriminating Features

I would really appreciate any additional interpretation of the discriminating and descriptive percentages!
Thank you very much in advance for any explanation, even a very short one!

What precisely do the percentages next to the descriptive and discriminating features in the gCLUTO solution output mean? Rasmussen and Karypis (2004) say that, for each of these features, the output displays "the percentage of the within cluster similarity and across cluster difference that these features account for, respectively". Could you please explain what these descriptive and discriminating percentages show in substantive terms?
Based on the output below, could I interpret the specific percentages as follows?
1. For the descriptive % of the word "loss": the word "loss" explains 13.1% of the similarity within the cluster; that is, in 13.1% of cases the documents in this cluster are similar to each other because the word "loss" appears in them?
2. For the discriminating % of the word "loss": the word "loss" makes this cluster different from any other cluster in 8.6% of all the cases in which this cluster could differ from the other clusters; that is, in 8.6% of all cases the average distance between this cluster and all other clusters is due to the presence of the word "loss" in this cluster?
3. Could we say that the word with the largest descriptive percentage in each cluster is the centroid vector of that cluster?

Descriptive & Discriminating Features
Cluster 0 Size: 95 ISim: 0.501 ESim: 0.066
Descriptive: loss 13.1% compens 7.1% suffer 6.7% latter 5.9%
Discriminating: loss 8.6% suffer 4.6% arm 3.1% emerg 3.0%

### RE: Not quite... The 13.1% means

Not quite...

The 13.1% means that if the average similarity between the documents in the cluster is X, then "loss" accounts for 13.1% of that similarity.

Similarly, if the difference between a cluster and the rest of the documents is Y, then 8.6% of that difference is due to "loss".
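
To make the "share of X" reading concrete, here is a minimal sketch of how an average similarity can be split into per-term shares. It is not taken from the CLUTO source. It assumes cosine similarity on unit-length document vectors and averages over all ordered pairs, self-pairs included, in which case X equals the dot product of the cluster centroid with itself. The function names and the toy term weights are hypothetical.

```python
import math


def normalize(vec):
    """Scale a term-weight vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec))
    return [w / norm for w in vec]


def descriptive_percentages(docs, terms):
    """Return the average pairwise cosine similarity and each term's share of it."""
    unit_docs = [normalize(d) for d in docs]
    n = len(unit_docs)
    # Centroid: per-term mean of the unit-length document vectors.
    centroid = [sum(d[j] for d in unit_docs) / n for j in range(len(terms))]
    # Assumption: averaging over all ordered pairs (self-pairs included),
    # the average similarity is centroid . centroid, a sum of per-term parts.
    per_term = [c * c for c in centroid]
    avg_similarity = sum(per_term)
    shares = {t: 100.0 * p / avg_similarity for t, p in zip(terms, per_term)}
    return avg_similarity, shares


# Hypothetical toy cluster: rows are documents, columns are term weights.
terms = ["loss", "compens", "suffer"]
docs = [
    [3.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 1.0],
]

avg_similarity, shares = descriptive_percentages(docs, terms)
print(f"average similarity X = {avg_similarity:.3f}")
for term, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {pct:.1f}% of X")
```

Under these assumptions the shares add up to 100% across all terms, so each percentage is a portion of the similarity value itself and not a count of documents or "cases".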

Hope that helps.

Thank you.