Cluto Segmentation fault

Hi
I'm trying to run vcluto on a matrix of size 425, and I get a segmentation fault when CLUTO tries to print the output of the external quality measures. I do get the cluster solution file, but I cannot get the entropy and purity of each cluster. With a matrix smaller than 250 it works and I can extract the entropy and purity. I'm running version 2.1.2 on a 32-bit Linux machine (i686).

./vcluto -rclassfile=music.mat.rlabel -showtree -labeltree -nfeatures=10 music.mat 10

Is it possible to get the entropy and purity of each cluster from an output file? Or what could be causing the segmentation fault?
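Entropy and purity can in principle be recomputed outside CLUTO from the cluster solution file (which is still produced) plus the class file. The sketch below assumes the solution file has one cluster number per row (-1 for rows left unclustered; with default settings it should be named like music.mat.clustering.10) and the class file one label per row. It follows the definitions in the CLUTO manual as we understand them: per-cluster entropy normalized by the log of the number of classes, purity as the share of the largest class, overall values weighted by cluster size. The script name cluto_quality.py is hypothetical.

```python
import math
import sys
from collections import Counter, defaultdict


def read_labels(path):
    """Read one label per non-blank line (assumed layout of both CLUTO files)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def cluster_quality(cluster_ids, class_labels):
    """Per-cluster (size, entropy, purity) plus size-weighted overall values.

    Entropy is normalized by log(q), q = number of distinct classes.
    Rows with cluster id -1 (unclustered) are skipped.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("clustering and class files have different row counts")
    q = len(set(class_labels))
    counts = defaultdict(Counter)  # cluster id -> Counter of class labels
    for cid, label in zip(cluster_ids, class_labels):
        if cid >= 0:
            counts[cid][label] += 1
    n = sum(sum(c.values()) for c in counts.values())
    per_cluster = {}
    overall_entropy = overall_purity = 0.0
    for cid in sorted(counts):
        c = counts[cid]
        size = sum(c.values())
        h = -sum((m / size) * math.log(m / size) for m in c.values())
        entropy = h / math.log(q) if q > 1 else 0.0
        purity = max(c.values()) / size
        per_cluster[cid] = (size, entropy, purity)
        overall_entropy += size / n * entropy
        overall_purity += size / n * purity
    return per_cluster, overall_entropy, overall_purity


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python cluto_quality.py <clustering file> <class file>
    clusters = [int(x) for x in read_labels(sys.argv[1])]
    classes = read_labels(sys.argv[2])
    per_cluster, entropy, purity = cluster_quality(clusters, classes)
    print("cid  size  entropy  purity")
    for cid, (size, ent, pur) in per_cluster.items():
        print(f"{cid:3d} {size:5d} {ent:8.3f} {pur:7.3f}")
    print(f"overall: entropy {entropy:.3f}, purity {purity:.3f}")
```

Usage, with placeholder file names: `python cluto_quality.py music.mat.clustering.10 <class file>`. If the numbers disagree with what vcluster prints for small matrices, the normalization assumption above is the first thing to revisit.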

I'm attaching my input files as music.tar.gz. It contains the source file (music) and music.mat, which was generated with doc2mat.
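The command passes music.mat.rlabel as the -rclassfile. If that file holds one identifier per document (for example the ids 2141, 1701, ... from the input) rather than a category, every row forms its own class: entropy and purity then say little, and the class-distribution table printed with the external quality measures gets one column per row. Whether this is related to the crash is an assumption worth checking, not a confirmed cause. A minimal sketch for counting the distinct labels in the file (the script name label_summary.py is hypothetical):

```python
import sys
from collections import Counter


def label_summary(labels):
    """Return (rows, distinct labels, size of the largest class)."""
    counts = Counter(labels)
    largest = max(counts.values()) if counts else 0
    return len(labels), len(counts), largest


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python label_summary.py music.mat.rlabel
    with open(sys.argv[1]) as f:
        labels = [line.strip() for line in f if line.strip()]
    rows, distinct, largest = label_summary(labels)
    print(f"{rows} rows, {distinct} distinct labels, largest class has {largest} rows")
    if rows and distinct == rows:
        # one label per row: probably row identifiers, not class labels
        print("every row has its own label: this looks like a row-label file")
```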

I'd really appreciate your help.

Xavier

RE: Segmentation fault

Hi
I am trying to run vcluster on a Unix machine with a 6834 x 9 matrix (6834 instances and 9 features), and I get a segmentation fault when I run the following command:

-bash-3.2$ vcluster -rclassfile=Cafe_Conferences-and-Academic-Travel.rclass Cafe_Conferences-and-Academic-Travel.mat -showfeatures 10

I get the following error upon execution:

********************************************************************************
vcluster (CLUTO 2.1.1) Copyright 2001-03, Regents of the University of Minnesota

Matrix Information -----------------------------------------------------------
Name: Cafe_Conferences-and-Academic-Travel.mat, #Rows: 6834, #Columns: 9, #NonZeros: 61506

Options ----------------------------------------------------------------------
CLMethod=RB, CRfun=I2, SimFun=Cosine, #Clusters: 10
RowModel=None, ColModel=None, GrModel=SY-DIR, NNbrs=40
Colprune=1.00, EdgePrune=-1.00, VtxPrune=-1.00, MinComponent=5
CSType=Best, AggloFrom=0, AggloCRFun=I2, NTrials=10, NIter=10

Solution ---------------------------------------------------------------------

-------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Segmentation fault

I do get the cluster solution file, but because of this segmentation fault I cannot get the entropy and purity of each cluster. vcluster runs fine on a smaller matrix with 28 instances and 9 features and reports entropy and purity.

For example:
-bash-3.2$ vcluster -rclassfile=test1.rclass test1.mat 10
********************************************************************************
vcluster (CLUTO 2.1.1) Copyright 2001-03, Regents of the University of Minnesota

Matrix Information -----------------------------------------------------------
Name: test1.mat, #Rows: 28, #Columns: 9, #NonZeros: 252

Options ----------------------------------------------------------------------
CLMethod=RB, CRfun=I2, SimFun=Cosine, #Clusters: 10
RowModel=None, ColModel=None, GrModel=SY-DIR, NNbrs=40
Colprune=1.00, EdgePrune=-1.00, VtxPrune=-1.00, MinComponent=5
CSType=Best, AggloFrom=0, AggloCRFun=I2, NTrials=10, NIter=10

Solution ---------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10-way clustering: [I2=2.80e+01] [28 of 28], Entropy: 0.365, Purity: 0.357
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cid Size ISim ISdev ESim ESdev Entpy Purty | abuf anon Anon anon anon Arts beow bioe Bria Canu case Conj curl drdi dund Eagl Euro expa fara Fund gesu GoGo grad grup humm Indi
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 4 +1.000 +0.000 +0.848 +0.000 0.425 0.250 | 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1
1 2 +1.000 +0.000 +0.939 +0.001 0.213 0.500 | 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
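Assuming the definitions in the CLUTO manual (for a cluster S_r of size n_r with n_r^i members of class i, entropy is normalized by log q for q classes, and purity is the largest class share), and reading q = 26 off the class columns visible in the header, the two rows above can be reproduced by hand. Every member of each cluster falls in a different class, which is why purity equals 1/size:

```latex
\begin{aligned}
E(S_r) &= -\frac{1}{\log q}\sum_{i=1}^{q}\frac{n_r^i}{n_r}\log\frac{n_r^i}{n_r},
  & P(S_r) &= \frac{1}{n_r}\max_{i} n_r^i \\
\text{cid 0 } (n_r = 4):\quad E &= \frac{\log 4}{\log 26} \approx 0.425,
  & P &= \tfrac{1}{4} = 0.250 \\
\text{cid 1 } (n_r = 2):\quad E &= \frac{\log 2}{\log 26} \approx 0.213,
  & P &= \tfrac{1}{2} = 0.500
\end{aligned}
```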

I don't need the class information itself; all I need is the entropy and purity of my clusters. Is there another way to get this information? I would really appreciate any insight on this issue.
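Entropy and purity are external measures computed against the class labels, so the .rclass file is still needed even if the per-class columns are not. One cheap thing to rule out (an assumption, not a confirmed cause of the crash) is a mismatch between the number of rows in the .mat file and the number of labels in the .rclass file. The sketch reads the row count from the first line of the matrix file ("rows columns nonzeros" for the sparse format, "rows columns" for the dense one) and compares it with the number of non-blank label lines; the script name check_rclass.py is hypothetical.

```python
import sys


def matrix_rows(header_line: str) -> int:
    """Row count from a CLUTO .mat header: 'rows cols nnz' or 'rows cols'."""
    fields = header_line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected .mat header: {header_line!r}")
    return int(fields[0])


def count_labels(lines) -> int:
    """One class label per non-blank line."""
    return sum(1 for ln in lines if ln.strip())


def main(mat_path: str, rclass_path: str) -> bool:
    with open(mat_path) as f:
        rows = matrix_rows(f.readline())
    with open(rclass_path) as f:
        labels = count_labels(f)
    status = "OK" if rows == labels else "MISMATCH"
    print(f"{mat_path}: {rows} rows, {rclass_path}: {labels} labels -> {status}")
    return rows == labels


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_rclass.py <matrix file> <rclass file>
    sys.exit(0 if main(sys.argv[1], sys.argv[2]) else 1)
```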

Thanks,
Sangita

RE: Cluto Segmentation fault

Hi

I'm sorry, but I cannot attach my input data.

My data has the following format:

id data
2141 arts bands and artists b beatles fan pages
1701 arts composition composers contemporary p
796 arts composition composers
800 arts bands and artists g grateful dead chats and forums
1288 arts instruments squeezebox organizations
982 arts bands and artists 8
1629 arts bands and artists m morissette, alanis chats and forums
1601 arts vocal a cappella barbershop harmony groups male choruses a

I'm looking forward to your answer.

Xavier

RE: Xavier, Please open an issue

Xavier,

Please open an issue in Flyspray and upload your files there.