Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: SQL Question -- Stratifying Data

Re: SQL Question -- Stratifying Data

From: Chrysalis <cellis_at_iol.ie>
Date: 1997/07/31
Message-ID: <33E0F69D.5071@iol.ie>#1/1

Ken Moore wrote:
>
> I was wondering what the best method is to allow a user to create a
> query which returns stratified data from continuous numeric data. For
> example, say I have a table with a value for age. I want to write a
> select statement which returns four age categories from all the possible
> age values in the table. As these categories will change frequently, I
> don't want to create and store an age strata column in the table.
>
> I know that the DECODE function will allow me to categorize interval
> data, but I need a method to do the same for data which may be contained
> within a continuous range.
>
> Thanks for your help

You don't say what kind of report(s) you are trying to generate, but the following may be useful (but see caveat at the end)

Given a table T with column C containing a set of values from a continuous distribution, to get a single-row report with N columns containing the count of rows containing values between b1 and e1, b2 and e2, b3 and e3, ..., bn and en, the following will work:

select sum(decode(greatest(b1,least(C,e1),C,C,null))) "R1"
      ,sum(decode(greatest(b2,least(C,e2),C,C,null))) "R2"
      ,sum(decode(greatest(b3,least(C,e3),C,C,null))) "R3"
      ,...
      ,sum(decode(greatest(bn,least(C,en),C,C,null))) "Rn"
      ,count(*) - count(C) "NULLCOUNT"	-- optional
from T;
(for counts rather than sums, replace "C,C,null" by "C,1,null")

To get a similar report with a separate row for each range, you *must* join to a table with (at least) N rows. (I maintain a table of integers from 1 - 1000 for just such a purpose.) The required statement is then of the form:

select 'R'||S.seq

      ,sum(C) -- or count(C) or whatever from T, sequence_numbers S
where S.seq between 1 and N

 and   T.C between decode(S.seq,1,b1
                               ,2,b2
                               ,3,b3
                               ,...
                               ,N,bn
                         )
               and decode(S.seq,1,e1
                               ,2,e2
                               ,3,e3
                               ,...
                               ,N,en
                         )

group by S.seq
;

However, the expense of maintaining/generating such statements for ranges which vary over time is probably greater than that of maintaining a separate range table.
Since the user is creating his/her own statements, I would prefer to allow him to maintain his own entries in a range table, identified by user_id if necessary. This allows the query statement(s) to be "static" (and much simpler), e.g.

select RT.id,sum(C)[,count(C),...]
 from range_table RT, T
where T.C between RT.lo_value and RT.hi_value [and RT.user_id = user] -- for multiple users ;

HTH

-- 
Chrysalis

FABRICATI DIEM, PVNC
('To Protect and to Serve')
Terry Pratchett : "Guards Guards"
Received on Thu Jul 31 1997 - 00:00:00 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US