Clustering objects when column values are qualitative categories
I am trying to cluster objects based on several attributes, all of which are qualitative categories rather than quantitative measures. For example, a product has 2 attributes, say 'color' and 'family'. Color can take 3 values (R/G/B) and family can take 2 values (A/B).
Assigning a number to each color may not be the best thing to do: for example, R=1, G=2, B=3 seems to suggest that G is more similar to B than R is to B. How does Cluto handle such situations?
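To make the concern concrete, here is a minimal sketch in plain Python (not Cluto code) that uses the integer codes from the question. The helper name `coded_distance` is made up for illustration, and the absolute difference stands in for whatever distance a clustering tool would actually apply to a numeric column.

```python
# Integer coding from the question: R=1, G=2, B=3.
codes = {"R": 1, "G": 2, "B": 3}

def coded_distance(a, b):
    """Absolute difference between the integer codes of two colors (hypothetical helper)."""
    return abs(codes[a] - codes[b])

# The coding imposes an ordering that the categories do not really have.
print(coded_distance("G", "B"))  # 1
print(coded_distance("R", "B"))  # 2
```

G ends up closer to B than R does purely because of how the numbers were assigned.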
I have been wondering whether I should represent color as 3 columns:
Color_R (0 or 1), Color_G (0 or 1) and Color_B (0 or 1), so that R is represented as 1 0 0.
This seems to work on some trial data sets. Is there a better way to do this?
Thanks
Ananth
Submitted by Ananth on Tue, 2007-07-31 06:37
RE: Ananth, The approach that
Ananth,
The approach that you have taken (i.e., using a separate column for each value) sounds reasonable and is probably the best way to handle such attributes.
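The one-column-per-value encoding can be sketched in plain Python as follows. This is only an illustration under assumptions: the helper names (`one_hot`, `encode`, `euclidean`) and the three sample products are made up, and writing the resulting rows out in Cluto's input format is not shown.

```python
import math

# Category values taken from the question.
COLORS = ["R", "G", "B"]
FAMILIES = ["A", "B"]

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

def encode(color, family):
    """Concatenate the indicator columns for color and family."""
    return one_hot(color, COLORS) + one_hot(family, FAMILIES)

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical sample products as (color, family) pairs.
products = [("R", "A"), ("G", "A"), ("B", "B")]
for color, family in products:
    print(color, family, encode(color, family))

# With indicator columns, any two distinct colors are equally far apart.
print(euclidean(one_hot("R", COLORS), one_hot("B", COLORS)) ==
      euclidean(one_hot("G", COLORS), one_hot("B", COLORS)))  # True
```

Each product becomes a row of five 0/1 columns (three for color, two for family), and no color is artificially closer to another.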