Re: Oracle Data Mining (c.d.o.misc)
"szalas" <gszalach_at_hotmail.com> wrote in message
news:c9t52i$a2g$1_at_julia.coi.pw.edu.pl...
> Hi,
> I have a question about Oracle's implementation of a data mining
> algorithm, O-Cluster.
> The paper "O-Cluster: Scalable Clustering of Large High Dimensional Data
> Sets" says that the algorithm chooses the best cutting plane in the
> histogram using a chi-squared (χ²) statistical test:
>   2*(observed - expected)^2/expected > 3.843
> where
>   observed = histogram count of the valley
>   expected = average of the histogram counts of the valley and the lower
>              peak
> I have clustered an example data set.
> Using the Data Mining Browser, I found where the cutting planes go and
> used the equation above to calculate the value of χ²,
> but I have never obtained a value above 3.843.
> The Data Mining Browser shows histogram counts in the range [0, 1], so
> how is it possible to reach 3.843 using the equation above?
> I would be grateful if someone could explain what is going on.
> Thanks in advance,
> szalas
>
>
It has been years since I have done stats, but I think the equation you have
up there is for a chi-squared distribution, not a uniform distribution
over the interval [0, 1]. These are two very different distributions.
Rather than relying on the Data Mining Browser and a paper, you need to
understand the fundamentals of what you are trying to do. What are you
trying to do?
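
To make the scale question concrete, here is a minimal Python sketch of the
test exactly as quoted above. The function names and the sample counts are
made up for illustration, and it is only an assumption that the Data Mining
Browser displays normalized frequencies rather than raw counts; nothing in
this thread confirms how Oracle computes or displays them.

```python
# Threshold quoted in the question (not verified against Oracle's code).
CRITICAL_VALUE = 3.843


def chi2_statistic(valley, lower_peak):
    """Statistic as quoted: 2 * (observed - expected)^2 / expected."""
    observed = valley
    # "expected" is the average of the valley and lower-peak counts.
    expected = (valley + lower_peak) / 2.0
    if expected == 0:
        return 0.0  # empty bins: no evidence of a valley
    return 2.0 * (observed - expected) ** 2 / expected


def is_valid_cut(valley, lower_peak, critical=CRITICAL_VALUE):
    """True when the valley is significantly lower than the lower peak."""
    return chi2_statistic(valley, lower_peak) > critical


# Hypothetical raw counts: 10 rows in the valley bin, 50 in the lower peak.
raw = chi2_statistic(10, 50)

# The same bins expressed as fractions of a hypothetical 1000-row table.
normalized = chi2_statistic(10 / 1000, 50 / 1000)

print(f"raw counts:        {raw:.4f}  valid cut: {is_valid_cut(10, 50)}")
print(f"normalized counts: {normalized:.4f}  "
      f"valid cut: {is_valid_cut(10 / 1000, 50 / 1000)}")
```

Algebraically the quoted statistic reduces to
(peak - valley)^2 / (peak + valley), so if both counts lie in [0, 1] it can
never exceed 1, while the same bins expressed as raw counts scale the value
by the number of rows.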
Jim
Received on Sat Jun 05 2004 - 14:28:39 CDT