Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.misc -> Re: SQL Question -- Stratifying Data
Ken Moore wrote:
>
> I was wondering what the best method is to allow a user to create a
> query which returns stratified data from continuous numeric data. For
> example, say I have a table with a value for age. I want to write a
> select statement which returns four age categories from all the possible
> age values in the table. As these categories will change frequently, I
> don't want to create and store an age strata column in the table.
>
> I know that the DECODE function will allow me to categorize interval
> data, but I need a method to do the same for data which may be contained
> within a continuous range.
>
> Thanks for your help
You don't say what kind of report(s) you are trying to generate, but the following may be useful (but see caveat at the end)
Given a table T with column C containing a set of values from a continuous distribution, to get a single-row report with N columns containing the count of rows containing values between b1 and e1, b2 and e2, b3 and e3, ..., bn and en, the following will work:
select sum(decode(greatest(b1,least(C,e1),C,C,null))) "R1" ,sum(decode(greatest(b2,least(C,e2),C,C,null))) "R2" ,sum(decode(greatest(b3,least(C,e3),C,C,null))) "R3" ,... ,sum(decode(greatest(bn,least(C,en),C,C,null))) "Rn" ,count(*) - count(C) "NULLCOUNT" -- optionalfrom T;
To get a similar report with a separate row for each range, you *must* join to a table with (at least) N rows. (I maintain a table of integers from 1 - 1000 for just such a purpose.) The required statement is then of the form:
select 'R'||S.seq
,sum(C) -- or count(C) or whatever
from T, sequence_numbers S
where S.seq between 1 and N
and T.C between decode(S.seq,1,b1 ,2,b2 ,3,b3 ,... ,N,bn ) and decode(S.seq,1,e1 ,2,e2 ,3,e3 ,... ,N,en )
However, the expense of maintaining/generating such statements for
ranges which vary over time is probably greater than that of maintaining
a separate range table.
Since the user is creating his/her own statements, I would prefer to
allow him to maintain his own entries in a range table, identified by
user_id if necessary. This allows the query statement(s) to be "static"
(and much simpler), e.g.
select RT.id,sum(C)[,count(C),...]
from range_table RT, T
where T.C between RT.lo_value and RT.hi_value
[and RT.user_id = user] -- for multiple users
;
HTH
-- Chrysalis FABRICATI DIEM, PVNC ('To Protect and to Serve') Terry Pratchett : "Guards Guards"Received on Thu Jul 31 1997 - 00:00:00 CDT
![]() |
![]() |