Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Histogram/Bucketizing resultset

Re: Histogram/Bucketizing resultset

From: Mark Townsend <markbtownsend_at_attbi.com>
Date: Fri, 01 Nov 2002 04:24:05 GMT
Message-ID: <B9E741D5.C3D%markbtownsend@attbi.com>


in article 499d4ffb.0210310936.731aa87d_at_posting.google.com, NoSpamPlease at googleposer_at_yahoo.com wrote on 10/31/02 9:36 AM:

> Thx Mark
> My bad - I framed the question incorrectly
> 
> Assume that the lower bound and the upper bound is not preset
> I need to do this in 1 single query
> Hence 
> LB = minimum of the attribute in the table
> UB = max of the attribute in the table
> Bucketsize = UB-LB/10 <- 10 is -need 10 buckets
> Age

>>> 27
>>> 24
>>> 34
>>> 56
>>> 40
>>> 76
>>> 30
>>> 32
>>> 67
>>> 71
> 
> LB = 24
> UB = 76
> BucketSize =(76 -24)/10
> 
> Bucket                 Count
> 24-29.2                2
> (total 10 rows)
> ..
> 70.8-76                1
> 

Hence the WIDTH_BUCKET function - something like (I don't have Oracle at home to test this) -

SELECT WIDTH_BUCKET(age, lbound, ubound, nbr_buckets) "Bucket Nbr", count(*)

   FROM t
   GROUP BY "Bucket Nbr";

where lbound is min(age), ubound is max(age), and nbr_buckets is the number of buckets you want (10)

Note that Bucket Nbr would be a number i.e 1 for the first bucket, 2 for the second bucket etc.

To turn this number into the actual range statement, you would need something like an inline view

SELECT lbound+(((ubound-lboundb)/nbr_buckets) * (b.bucket_nbr-1))||' - '||

       lbound+(((ubound-lboundb)/nbr_buckets) *  b.bucket_nbr) "Bucket",
       val "Count"
FROM  (SELECT WIDTH_BUCKET(age, lbound, ubound, nbr_buckets) "Bucket_Nbr",
       count(*) val

   FROM t
   GROUP BY "Bucket") b

Note that this will work (assuming the theory is correct) for any range of values, but more importantly for any number of buckets. YMMV, as there is bound to be some debugging required for the above pseudo code.

> Mark Townsend <markbtownsend_at_attbi.com> wrote in message
> news:<B9E5FECF.B3D%markbtownsend_at_attbi.com>...
>> in article 499d4ffb.0210302045.452ef143_at_posting.google.com, NoSpamPlease at
>> googleposer_at_yahoo.com wrote on 10/30/02 8:45 PM:
>> 

>>> Given an input table T(age)
>>>
>>> Age
>>> 27
>>> 24
>>> 34
>>> 56
>>> 40
>>> 76
>>> 30
>>> 32
>>> 67
>>> 71
>>>
>>> I would like to generate the following output
>>>
>>> Range Value
>>> 0-10 0
>>> 11-20 0
>>> 21-30 2
>>> 31-40 4
>>> 41-50 0
>>> 51-60 1
>>> 61-70 1
>>> 71-80 2
>>> 81-90 0
>>> 91-100 0
>>>
>>>
>>> Can anyone help me with a query
>>> Thx in adv
>> 
>> For an Oracle solution, assuming the bucket ranges you have given, something
>> like
>> 
>> SELECT
>> (CASE WHEN age BETWEEN   0 AND 10 THEN  ' 0 - 10'
>> WHEN age BETWEEN  11 AND 20 THEN  '11 - 20'
>> WHEN age BETWEEN  21 AND 30 THEN  '21 - 30'
>> ...
>> END) 
>> AS Range, 
>> COUNT(*) AS Value
>> FROM T
>> GROUP BY 
>> (CASE WHEN age BETWEEN   0 AND 10 THEN  ' 0 - 10'
>> WHEN age BETWEEN  11 AND 20 THEN  '11 - 20'
>> WHEN age BETWEEN  21 AND 30 THEN  '21 - 30'
>> ...
>> END);
>> 
>> This obviously becomes a PITA with more buckets.
>> 
>> Depending on your actual requirements, you may also want to investigate
>> NTILE or WIDTH_BUCKET functions, which will automatically create either
>> equiwidth buckets or histograms for you, respectively -
>> http://otn.oracle.com/docs/products/oracle9i/doc_library/release2/server.920
>> /a96520/analysis.htm
Received on Thu Oct 31 2002 - 22:24:05 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US