Clustering objects when column values are qualitative categories

I am trying to cluster objects based on several attributes, all of which are qualitative categories rather than quantitative measures. For example, a product has 2 attributes, say 'color' and 'family'. Color can take on 3 values (R/G/B) and family can take on 2 values (A/B).

Assigning a number to each color may not be the best thing to do. For example, R=1, G=2, B=3 seems to suggest that G is more similar to B than R is to B, even though the colors have no natural order. How does Cluto handle such situations?

I have been wondering whether I should represent color as 3 columns:
Color_R (0 or 1), Color_G (0 or 1) and Color_B (0 or 1), so that R is represented as 1 0 0, G as 0 1 0 and B as 0 0 1.

This seems to work on some trial data. Is there a better way to do this?


RE: Ananth, The approach that


The approach that you have taken (i.e., use different columns for each value) sounds reasonable and is probably the best way to handle such attributes.
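
For illustration, here is a minimal Python sketch of that column-per-value encoding, using the color and family attributes from the question. The helper names (one_hot, encode, integer_code) are made up for this example and are not part of Cluto, and Euclidean distance is used only to make the comparison concrete. It shows that with integer codes G ends up closer to B than R does, while with one column per value every pair of distinct colors is equally far apart.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Category values taken from the question; the order only fixes column positions.
COLORS = ["R", "G", "B"]
FAMILIES = ["A", "B"]

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 in the column for `value`."""
    return [1 if value == c else 0 for c in categories]

def encode(color, family):
    """Hypothetical helper: one row of the matrix, color columns then family columns."""
    return one_hot(color, COLORS) + one_hot(family, FAMILIES)

def integer_code(color):
    """The R=1, G=2, B=3 coding from the question, as a one-element vector."""
    return [COLORS.index(color) + 1]

# A red product from family A becomes the row 1 0 0 1 0.
print(encode("R", "A"))

# Integer coding: R is twice as far from B as G is.
print(dist(integer_code("R"), integer_code("B")))
print(dist(integer_code("G"), integer_code("B")))

# One column per value: R-B and G-B distances are identical.
print(dist(one_hot("R", COLORS), one_hot("B", COLORS)))
print(dist(one_hot("G", COLORS), one_hot("B", COLORS)))
```

Each encoded row can then be written out as one line of the input matrix, with one line per product.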