Clustering objects when column values are qualitative categories

I am trying to cluster objects based on several attributes, all of which are qualitative categories rather than quantitative measures. For example, a product has two attributes, say 'color' and 'family'. Color can take three values (R/G/B) and family can take two values (A/B).

Assigning a number to each color may not be the best thing to do: for example, R=1, G=2, B=3 seems to suggest that G is more similar to B than R is to B. How does Cluto handle such situations?
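To make the concern concrete, here is a minimal sketch of what the integer coding implies. The absolute difference is used purely as an illustrative distance; it is an assumption for this example, not a statement about the similarity measure Cluto applies.

```python
# Integer codes from the example above: R=1, G=2, B=3.
codes = {"R": 1, "G": 2, "B": 3}

def code_distance(a, b):
    """Absolute difference between the integer codes of two colors."""
    return abs(codes[a] - codes[b])

# Compare every pair of colors under this coding.
for a, b in [("R", "G"), ("G", "B"), ("R", "B")]:
    print(a, b, code_distance(a, b))
```

Under this coding G sits closer to B than R does, even though the colors have no natural order.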

I have been thinking about representing color as three columns:
Color_R (0 or 1), Color_G (0 or 1) and Color_B (0 or 1), so that R is represented as 1 0 0.
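A minimal sketch of this encoding in Python, extended to the 'family' attribute as well. The helper names (one_hot, encode, euclidean) are hypothetical, and Euclidean distance is used only to illustrate the effect; it is not a claim about how Cluto computes similarity.

```python
import math

COLORS = ["R", "G", "B"]   # values of the 'color' attribute
FAMILIES = ["A", "B"]      # values of the 'family' attribute

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

def encode(color, family):
    """Concatenate the indicator columns for both attributes."""
    return one_hot(color, COLORS) + one_hot(family, FAMILIES)

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Three products that differ only in color.
r = encode("R", "A")
g = encode("G", "A")
b = encode("B", "A")

print(r)                                   # columns: Color_R Color_G Color_B Family_A Family_B
print(euclidean(r, g), euclidean(g, b), euclidean(r, b))
```

With indicator columns, every pair of distinct colors is equally far apart, so no ordering among R, G and B is implied.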

This seems to work on some trial data sets. Is there a better way to do this?

Thanks
Ananth

Submitted by Ananth on Tue, 2007-07-31 06:37

RE: Ananth, The approach that

Ananth,

The approach that you have taken (i.e., using a separate column for each value) sounds reasonable and is probably the best way to handle such attributes.

Submitted by karypis on Tue, 2007-08-07 02:54.
