Re: Oracle Data Mining

From: Jim Kennedy <kennedy-downwithspammersfamily_at_attbi.net>
Date: Sat, 05 Jun 2004 19:28:39 GMT
Message-ID: <BPowc.55009$Ly.14923@attbi_s01>

"szalas" <gszalach_at_hotmail.com> wrote in message news:c9t52i$a2g$1_at_julia.coi.pw.edu.pl...
> Hi,
> I have a question about Oracle's implementation of the data mining
> algorithm O-Cluster.
> In the paper "O-Cluster: Scalable Clustering of Large High Dimensional Data
> Sets" it is said that the algorithm chooses the best cutting plane in the
> histogram using the chi-squared statistical test:
> 2*(observed - expected)^2/expected > 3.843
> ,where
> observed - histogram count of the valley
> expected - average of the histogram counts of the valley and the lower peak
> I have clustered an example set of data.
> Using the Data Mining Browser, I found where the cutting planes go and
> used the above-mentioned equation to calculate the chi-squared value,
> and I have never got a value above 3.843.
> The Data Mining Browser shows that the histogram counts are in the range
> <0,1>, so how is it possible to reach the value 3.843 using the
> above-mentioned equation?
> I would be grateful if someone could explain to me what is going on.
> Thanks in advance
> szalas
>
>

It has been years since I have done any statistics, but I think the equation you quote is the test statistic for a chi-squared distribution, not for a uniform distribution over the interval from 0 to 1. These are two very different distributions. Instead of relying only on the Data Mining Browser and a paper, you need to understand the fundamentals of what you are trying to do. What are you trying to do?
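For illustration only, here is a minimal Python sketch of the statistic exactly as quoted in the question. The function names are made up for this example and are not part of any Oracle API. It also assumes, without confirmation, that the <0,1> values shown by the browser are normalized counts (relative frequencies) rather than raw counts. With expected taken as the average of the valley and the lower peak, the quoted formula simplifies algebraically to (valley - peak)^2 / (valley + peak), so with both inputs between 0 and 1 it can never exceed 1, while raw counts can easily exceed 3.843.

```python
# Threshold quoted in the question (95% chi-squared critical value, 1 degree of freedom).
CHI2_THRESHOLD = 3.843


def chi2_statistic(valley, lower_peak):
    """Statistic as quoted: 2 * (observed - expected)^2 / expected.

    observed = histogram count of the valley
    expected = average of the valley count and the lower peak count
    """
    expected = (valley + lower_peak) / 2.0
    if expected == 0:
        # Both bins empty: nothing to test, so report no evidence of a valley.
        return 0.0
    return 2.0 * (valley - expected) ** 2 / expected


def is_significant(valley, lower_peak, threshold=CHI2_THRESHOLD):
    """True when the quoted statistic exceeds the quoted threshold."""
    return chi2_statistic(valley, lower_peak) > threshold


if __name__ == "__main__":
    # Hypothetical raw counts: valley of 10 points, lower peak of 40 points.
    print(chi2_statistic(10, 40), is_significant(10, 40))
    # The same histogram shape expressed as fractions of 100 points.
    print(chi2_statistic(0.1, 0.4), is_significant(0.1, 0.4))
    # Most extreme case possible for inputs restricted to [0, 1].
    print(chi2_statistic(0.0, 1.0), is_significant(0.0, 1.0))
```

The statistic scales linearly with the counts: multiplying both inputs by N multiplies the result by N. If the displayed values really are normalized, multiplying them by the number of rows before applying the formula would be the thing to try.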
Jim

Received on Sat Jun 05 2004 - 14:28:39 CDT