Group By Noodle Twister

From: <lycovian_at_my-deja.com>
Date: Fri, 10 Mar 2000 18:31:01 GMT
Message-ID: <8abf15$qoj$1@nnrp1.deja.com>

OK. I've been doing Oracle now for 2 years (application programming) and databases for about 10 but this simple one has me stumped for an efficient way to do this. Before starting thanks for reading this and any help you can render would be appreciated.

I've got the problem "figured out" but I'm wondering if the solution is to use sub-selects or is there a way to get the info in one pass rather than two queries or a sub-select.

Look at this table:

aid     dt      dir       grouping
-------------------       ----------
a       1/1/00  0         (*) - dt,dir
b       1/1/00  1         (*) - dt,dir
c       1/2/00  0         (*) - dt,dir
c       1/2/00  1             - dt
d       1/2/00  0         (*) - dt,dir
d       1/1/00  1             -
e       1/1/00  0             - dir
e       1/2/00  1         (*) - dt
f       1/2/00  1         (*) - dt
f       1/1/00  0             - dr
f       1/1/00  1             -

Problem: Return for each group (aid) a single row that is the max date (dt) for the group and the min dir for that max date.

Desired Result (marked by *):

aid     dt      dir        grouping
-------------------        ----------
a       1/1/00  0         (*) - dt,dir
b       1/1/00  1         (*) - dt,dir
c       1/2/00  0         (*) - dt,dir
d       1/2/00  0         (*) - dt,dir
e       1/2/00  1         (*) - dt
f       1/2/00  1         (*) - dt

The grouping columns are not part of the data but give a clue as how I'm currently solving this problem. Under group dt indicates its the max date for the group, dir indicates it is the min for the group on aid, not for dt

Here is my current solution:
select a.aid, a.dt, a.dir
  from test a
  where a.dt =
(

    select max(b.dt)
    from test b
    where b.aid = a.aid
  )
  and a.dir =
(

    select min(c.dir)
    from test c
    where c.aid = a.aid
    and c.dt = a.dt
   )

As you can see this is a rather complex solution for what I consider a rather simple problem requiring multiple sub-selects and five joins! Further complicating the problem is that I don't have a pk to work with or I would use it in the outermost query to join with rather than having to find my row with the results of two sub- selects.

I've also tried using the IN predicate and nesting the sub-queries but in my speed tests the presented solution is several times faster. What I would like to do is simply return the pk for the max dt and the min dir for that max dt with one or two queries and as few joins as possible. My actual dataset for this is about 57K rows (and growing) and performance is suffering because it requires so many joins. Is it possible to do this in one query? Anyone know? I'm probably going to add a primary key to the table eventually but I'm curious as to how to make this more efficient without having to make a structural change to the db. Oh, and yes, I have indexes all all joined columns and in general the database is a speed demon. Thanks for your time and any help you lend.

Mike Wilson
Software Engineer
Sony Pictures Imageworks
e-mail: mwilson_at_imageworks.com

To save some typing here is the table script: CREATE TABLE test
(

  aid                        VARCHAR2(10),
  dt                         DATE,
  dir                        NUMBER(2)

)
/

Sent via Deja.com http://www.deja.com/
Before you buy. Received on Fri Mar 10 2000 - 12:31:01 CST