# Interpreting treefile output

Hi,

I'm having a bit of trouble interpreting the third column of the treefile output, which, according to the documentation, "is the change in the value of the criterion function achieved by combining the particular pair of clusters." I am using the I2 criterion function, which I understand is maximized at each step of the clustering process, yet the values in the third column are all negative and increase in magnitude. A naive interpretation of this would be that the optimal number of clusters is one, since each additional cluster reduces the value of the criterion function we are trying to maximize. Somehow this seems wrong, though.

What am I missing?

Thanks very much,

John

Submitted by flyfish72 on Tue, 2008-06-10 09:48

## RE: The maximum value of the criterion function

The maximum value of the criterion function is obtained when each document is in a cluster by itself. Consequently, every merge operation decreases the value of the criterion function, which is why these numbers are negative.
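Why a merge can only lower I2 can be seen with a small sketch. With unit-length document vectors, I2 is the sum over clusters of sqrt(sum of pairwise cosines), which equals the sum of the lengths of the clusters' composite (summed) vectors, ||D_r||. With every document in its own cluster, I2 = n. Merging clusters a and b changes I2 by ||D_a + D_b|| - ||D_a|| - ||D_b||, which the triangle inequality makes non-positive. The code below is a minimal, hypothetical greedy agglomeration under those assumptions (the function names `agglomerate`, `i2`, etc. and the toy vectors are made up for illustration; this is not CLUTO's implementation). It records the per-merge change, the analogue of the treefile's third column.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a document vector to unit length (cosine-based I2 assumes this)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def composite_norm(vectors):
    """Length of a cluster's composite vector D_r (sum of its member vectors)."""
    total = [sum(xs) for xs in zip(*vectors)]
    return math.sqrt(sum(x * x for x in total))


def i2(clusters):
    """Hypothetical I2 value: sum over clusters of ||D_r||."""
    return sum(composite_norm(c) for c in clusters)


def agglomerate(docs):
    """Greedily merge the pair whose merge loses the least I2.

    Returns the change in I2 caused by each merge, in merge order.
    """
    clusters = [[normalize(d)] for d in docs]
    deltas = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i] + clusters[j]
            # ||D_a + D_b|| - ||D_a|| - ||D_b|| <= 0 by the triangle inequality
            delta = (composite_norm(merged)
                     - composite_norm(clusters[i])
                     - composite_norm(clusters[j]))
            if best is None or delta > best[0]:
                best = (delta, i, j)
        delta, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        deltas.append(delta)
    return deltas


# Toy term vectors (made up for illustration).
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.2], [0, 0, 1]]
deltas = agglomerate(docs)
print("I2 with singleton clusters:", round(i2([[normalize(d)] for d in docs]), 6))
print("Per-merge change in I2:", [round(d, 4) for d in deltas])
```

Every printed change is negative, and the changes add up to the difference between I2 for the single all-documents cluster and I2 = n for the singletons. A negative third column therefore says nothing about one cluster being optimal; it only reflects that each merge gives up some of the criterion value.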