Home » SQL & PL/SQL » SQL & PL/SQL » Get min and max dates by grouping dates with 30 day range
Get min and max dates by grouping dates with 30 day range [message #253036] Sat, 21 July 2007 17:22 Go to next message
oraraj
Messages: 8
Registered: December 2005
Junior Member
Hi,

I am working on a query that has service date column and I need to group dates within 30 day range and then display min and max in each group.

Service Date

04/20/2007 -- Grp1

05/25/2007 -- Grp2
(diff between previous date and this one is > 30. So this one is in Grp2)

06/01/2007 -- Grp2
06/15/2007 -- Grp2
06/17/2007 -- Grp2
06/19/2007 -- Grp2
06/22/2007 -- Grp2


07/06/2007 -- Grp3
(diff between first date of previous group and this one is > 30. So this is in Grp3)
07/10/2007 -- Grp3


So the final output should look as follows:

Min Service Date Max Service Date
04/20/2007 04/20/2007
05/25/2007 06/22/2007
07/06/2007 07/10/2007

I have tried using analytic function min and max using range 30 preceding and 30 following to get min and max date for each service date.

For the date in grp2
05/25/2007 and 06/01/2007, the min and max date is within 30 day range. But for the third one onwards the date changes as the range window is now 30 days onwards from the current row. So the max date for 06/15/2007 is 07/06/2007.

All dates within 05/25/2007 and 06/22/2007 should have min date as 05/22/2007 and max date as 06/22/2007.


Any suggestions is appreciated.

Thanks!
RK

Re: Get min and max dates by grouping dates with 30 day range [message #253050 is a reply to message #253036] Sun, 22 July 2007 00:05 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I have encountered this one once before - it's probably the most difficult problem I've ever seen to try to solve without procedural logic.

On first glance, it looks possible to do with Analytic Functions, but its not.

For windowing clauses to work on analytic functions, you need to be able to determine the window in which a row resides using just the row itself and those around it (based on the analytic PARTITION and ODER clauses).

That is not possible with this query. To determine the window in which a row resides, you need to start with the very first row and work forward - dividing the rows into groups.

For example: consider the dates:
1-jan-2007
1-feb-2007
2-mar-2007
These are divides into 2 groups. The 30 day period commencing 1-jan ends on 30-jan, therefore 1-feb is the start of a second group. Since 1-mar is exactly 30 days after 1-feb, it too is in the second group.

But what happens when we add a row:
1-jan-2007
31-jan-2007
1-feb-2007
2-mar-2007
1-jan is still in a group of its own, but now the second group starts from 31-jan. Since 2-mar is 31 days away, it does not fit into group 2 - it must be placed into a third group.

So we can see now that - in order to determine which rows will be grouped with 2-mar - we first must allocate groups to all of the preceding rows.

This makes the problem iterative, not analytic.

The iterative nature gives us a clue to a possible solution: Oracle provides an iterative operator: CONNECT BY

Consider the data:
create table dates (d date);

insert into dates values (to_date('20070101','YYYYMMDD'));
insert into dates values (to_date('20070131','YYYYMMDD'));
insert into dates values (to_date('20070201','YYYYMMDD'));
insert into dates values (to_date('20070302','YYYYMMDD'));

And the following SQL lists the table applying an ordinal number to each date:
SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
FROM   dates;

01/JAN/07 1 
31/JAN/07 2 
01/FEB/07 3 
02/MAR/07 4


Now we use CONNECT BY to create lists of dates in 30 day groups:
SELECT CONNECT_BY_ROOT d AS root_d
,      d
,      rn
FROM (
    SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
    FROM   dates
)
CONNECT BY rn = PRIOR rn + 1
AND        d <= CONNECT_BY_ROOT d + 29;

ROOT_D    D         RN 
--------- --------- --
01/JAN/07 01/JAN/07 1 

31/JAN/07 31/JAN/07 2 
31/JAN/07 01/FEB/07 3 

01/FEB/07 01/FEB/07 3 
01/FEB/07 02/MAR/07 4 

02/MAR/07 02/MAR/07 4 

Here, we can see that each row can appear multiple times because we have included EVERY possible group. eg. 2-mar appears twice: once in a group starting from 1-feb, and once in a group starting 2-mar.

Clearly we're not there yet because we only want each date to appear once. We want to iteratively work through the list and choose one row for each value of RN.

Using CONNECT BY again, we'll start with row 1, connect to row 2, then row 3 etc with the clause START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1

Going from RN=2 to RN = 3, we have a choice of two rows and we want only one of them. There may be more than 2 of course, but regardless of how many there are, no more than one of them will have the same ROOT_D as the previous row in the chain. If one such row exists, its the one we want to choose. If it doesn't exist, then the new row (in this case, RN=3) will be the start of a new group, so we pick up the row where D=ROOT_D.

SELECT root_d
,      d
,      rn
FROM (
    SELECT CONNECT_BY_ROOT d AS root_d
    ,      d
    ,      rn
    FROM (
        SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
        FROM   dates
    )
    CONNECT BY rn = PRIOR rn + 1
    AND        d <= CONNECT_BY_ROOT d + 29
)
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1
AND (root_d = PRIOR root_d OR (d = root_d AND root_d > PRIOR root_d + 29));

ROOT_D    D         RN 
--------- --------- --
01/JAN/07 01/JAN/07 1 
31/JAN/07 31/JAN/07 2 
31/JAN/07 01/FEB/07 3 
02/MAR/07 02/MAR/07 4 


That's just about it, we now have one output row per original row in the table, and each row is marked with the start date of the group. There's still one problem, what happens with duplicates? I won't include it here, but it stuffs the whole thing up. You can resolve it by selecting only distinct dates in the innermost SELECT.
SELECT root_d
,      d
,      rn
FROM (
    SELECT CONNECT_BY_ROOT d AS root_d
    ,      d
    ,      rn
    FROM (
        SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
        FROM   (SELECT DISTINCT d FROM dates)
    )
    CONNECT BY rn = PRIOR rn + 1
    AND        d <= CONNECT_BY_ROOT d + 29
)
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1
AND (root_d = PRIOR root_d OR (d = root_d AND root_d > PRIOR root_d + 29))


Finally, to get the min and max of each group is simple:
SELECT MIN(d), MAX(d)
FROM (
    SELECT root_d
    ,      d
    ,      rn
    FROM (
        SELECT CONNECT_BY_ROOT d AS root_d
        ,      d
        ,      rn
        FROM (
            SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
            FROM   (SELECT DISTINCT d FROM dates)
        )
        CONNECT BY rn = PRIOR rn + 1
        AND        d <= CONNECT_BY_ROOT d + 29
    )
    START WITH rn = 1
    CONNECT BY rn = PRIOR rn + 1
    AND (root_d = PRIOR root_d OR (d = root_d AND root_d > PRIOR root_d + 29))
)
GROUP BY root_d;

MIN(D)    MAX(D) 
--------- ---------
01/JAN/07 01/JAN/07 
31/JAN/07 01/FEB/07 
02/MAR/07 02/MAR/07 


Now, this is the most important thing: take that SQL around the office without my explanation attached, and see if anyone can figure out what it is supposed to do.

Chances are they can't; and with good reason: it's rubbish code. If someone I employed wrote that code for a production system I'd fire them because it is a waste of their time and everybody elses that tries to read it.

You should always use the tool appropriate to the task, and for iteration, the tool is PL/SQL.
DECLARE
    group_start DATE;
    group_end   DATE;
BEGIN
    FOR c IN (
        SELECT DISTINCT d
        FROM   dates
        ORDER  BY d
    ) LOOP
        IF group_start IS NULL THEN
             group_start := c.d;
        END IF;

        IF c.d > group_start + 29 THEN
            dbms_output.put_line(
                to_char(group_start, 'YYYYMMDD') || ' ' || 
                to_char(group_end, 'YYYYMMDD'));
            group_start := c.d;
        END IF;

        group_end := c.d;
    END LOOP;

    IF group_start IS NOT NULL THEN
            dbms_output.put_line(
                to_char(group_start, 'YYYYMMDD') || ' ' || 
                to_char(group_end, 'YYYYMMDD'));
    END IF;    
END;

Now this is simple and clear - and it took me 5 minutes to write. If you want to reference this in a SQL, you can use a PIPELINED table function - I'll leave that as an exercise for you.

I imagine it is also possible to write a solution using the SQL MODEL clause, although I doubt it would be much more intuitive than the CONNECT BY.

Ross Leishman
Re: Get min and max dates by grouping dates with 30 day range [message #253066 is a reply to message #253036] Sun, 22 July 2007 02:01 Go to previous messageGo to next message
Volder
Messages: 38
Registered: April 2007
Location: Russia
Member
10g

SQL> with t as (select to_date('04/20/2007', 'mm/dd/yyyy') dt from dual union all -- Grp1
  2             select to_date('05/25/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  3             select to_date('06/01/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  4             select to_date('06/15/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  5             select to_date('06/17/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  6             select to_date('06/19/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  7             select to_date('06/22/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  8             select to_date('07/06/2007', 'mm/dd/yyyy') dt from dual union all -- Grp3
  9             select to_date('07/10/2007', 'mm/dd/yyyy') dt from dual) -- Grp3
 10     --
 11             select min(dt) min_dt, max(dt) max_dt from (
 12             select * from t
 13              model
 14               dimension by (row_number() over (order by dt) rn)
 15               measures(dt, dt+30 new_dt)
 16               rules(new_dt[rn>1] order by rn = case when dt[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] else new_dt[CV()] end)
 17                 )
 18                 group by new_dt
 19  /

MIN_DT      MAX_DT
----------- -----------
20.04.2007  20.04.2007
06.07.2007  10.07.2007
25.05.2007  22.06.2007

SQL> 


@rleishman Would you fire me if I wrote such code in your prod system? Razz

[Updated on: Sun, 22 July 2007 02:10]

Report message to a moderator

Re: Get min and max dates by grouping dates with 30 day range [message #253071 is a reply to message #253050] Sun, 22 July 2007 02:30 Go to previous messageGo to next message
oraraj
Messages: 8
Registered: December 2005
Junior Member
Ross,

Thanks for your quick reply with a great solution!

Yes I do agree with iterative logic, PL/SQL is good.

I have tried out complex queries using analytic functions.

So I wanted to try using it.

Last year I had a requirement where I had to compare consecutive academic terms and group them within range of 8 and then get min and max terms. That worked and so I thought this may work...

Anyway, now I am getting following error
ORA-00932: inconsistent datatypes: expected NUMBER got -,

I get this error after adding these lines -

START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1
AND (root_d = PRIOR root_d OR (d= root_d AND root_d > PRIOR root_d + 29)

I don't see any problem.
Do you see anything?

Thanks
RK
Re: Get min and max dates by grouping dates with 30 day range [message #253073 is a reply to message #253071] Sun, 22 July 2007 02:46 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
What's the problem with copy and paste the statement and error?
Use SQL*PLus, execute the statement, copy and paste your screen.
Of course, format the output, read How to format your posts

Also Oracle version is important.

Regards
Michel
Re: Get min and max dates by grouping dates with 30 day range [message #253074 is a reply to message #253066] Sun, 22 July 2007 02:57 Go to previous messageGo to next message
oraraj
Messages: 8
Registered: December 2005
Junior Member
Volder,

That was an awesome solution!
I used my tables and ran your query and it worked!

Can you explain this part -

measures(dt, dt+30 new_dt)
rules(new_dt[rn>1] order by rn = case when dt[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] else new_dt[CV()] end)
)



Thanks
RK
Smile
Re: Get min and max dates by grouping dates with 30 day range [message #253077 is a reply to message #253074] Sun, 22 July 2007 03:15 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Model Expressions
Model Functions

Regards
Michel
Re: Get min and max dates by grouping dates with 30 day range [message #253087 is a reply to message #253074] Sun, 22 July 2007 04:24 Go to previous messageGo to next message
Volder
Messages: 38
Registered: April 2007
Location: Russia
Member
oraraj wrote on Sun, 22 July 2007 02:57


Can you explain this part -

measures(dt, dt+30 new_dt)
rules(new_dt[rn>1] order by rn = case when dt[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] else new_dt[CV()] end)
)


well, as Michel provided you links you better start with reading about model clause.
But in short:

step1
in the model clause I declared two measures dt and dt+30
so if you change the rule clause to
rules()

you'll get the following result
SQL> with t as (select to_date('04/20/2007', 'mm/dd/yyyy') dt from dual union all -- Grp1
  2               select to_date('05/25/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  3               select to_date('06/01/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  4               select to_date('06/15/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  5               select to_date('06/17/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  6               select to_date('06/19/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  7               select to_date('06/22/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  8               select to_date('07/06/2007', 'mm/dd/yyyy') dt from dual union all -- Grp3
  9               select to_date('07/10/2007', 'mm/dd/yyyy') dt from dual) -- Grp3
 10       --
 11               select * from t
 12                model
 13                 dimension by (row_number() over (order by dt) rn)
 14                 measures(dt, dt+30 new_dt)
 15                 rules()
 16  /

        RN DT          NEW_DT
---------- ----------- -----------
         1 20.04.2007  20.05.2007
         2 25.05.2007  24.06.2007
         3 01.06.2007  01.07.2007
         4 15.06.2007  15.07.2007
         5 17.06.2007  17.07.2007
         6 19.06.2007  19.07.2007
         7 22.06.2007  22.07.2007
         8 06.07.2007  05.08.2007
         9 10.07.2007  09.08.2007

9 rows selected

SQL


step2
then I placed one rule which will run through all the dates in the table starting from the second one
rn>1
in the ascending order
 order by rn

and check whether the current dt less or equal to new_dt of the previous row:
when dt[CV()]<=new_dt[CV()-1] then

If it so - then current row and previous one are in the same group - so I change the new_dt of the current row to the new_dt of the preious row:
then new_dt[CV()-1]

When the dt of the current row is more than new_dt of the previous row - it means that the dt of the current row is more than 30 days after the first dt of the previous group - and hence the current dt should be in the next group - so I leave the new_dt of the current row without change:
else new_dt[CV()]
and so on.

SQL> with t as (...)
 10       --
 11               select * from t
 12                model
 13                 dimension by (row_number() over (order by dt) rn)
 14                 measures(dt, dt+30 new_dt)
 15                 rules(new_dt[rn>1] order by rn = case when dt[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] else new_dt[CV()] end)
 16  /

        RN DT          NEW_DT
---------- ----------- -----------
         1 20.04.2007  20.05.2007
         2 25.05.2007  24.06.2007
         3 01.06.2007  24.06.2007
         4 15.06.2007  24.06.2007
         5 17.06.2007  24.06.2007
         6 19.06.2007  24.06.2007
         7 22.06.2007  24.06.2007
         8 06.07.2007  05.08.2007
         9 10.07.2007  05.08.2007

9 rows selected

SQL> 


step3

In the end I will have same new_dt's for the dt's belonging to the same group - so I just group by new_dt in the end and find MIN and MAX dt's in the group.
SQL> with t as (...)
 11               select min(dt) min_dt, max(dt) max_dt from (
 12               select * from t
 13                model
 14                 dimension by (row_number() over (order by dt) rn)
 15                 measures(dt, dt+30 new_dt)
 16                 rules(new_dt[rn>1] order by rn = case when dt[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] else new_dt[CV()] end)
 17                   )
 18                   group by new_dt
 19                   order by 1
 20  /

MIN_DT      MAX_DT
----------- -----------
20.04.2007  20.04.2007
25.05.2007  22.06.2007
06.07.2007  10.07.2007

SQL> 

That's all the logic.
Re: Get min and max dates by grouping dates with 30 day range [message #253204 is a reply to message #253066] Mon, 23 July 2007 02:33 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
rleishman wrote on Sun, 22 July 2007 15:05
I imagine it is also possible to write a solution using the SQL MODEL clause, although I doubt it would be much more intuitive than the CONNECT BY.


Volder wrote on Sun, 22 July 2007 17:01
@rleishman Would you fire me if I wrote such code in your prod system? Razz


More intuitive? Yep; in exactly the same way that Ancient Greek is more intuitive than Hieroglyphics.

If I was the manager of the support/upgrade project that followed your build project, and my project was failing because programmers were spending hours longer than necessary to understand elegant but obtuse legacy code so that they can apply new business rules, then yes, I would fire you.

  • Relational logic belongs in SQL.
  • Iterative and conditional logic belongs in PL/SQL.
  • Recursive tree traversing logic belongs in CONNECT BY.
  • Multi-dimensional OLAP manipulation belongs in MODEL clauses.

Deviations from this cost projects.

Ross Leishman
Re: Get min and max dates by grouping dates with 30 day range [message #253301 is a reply to message #253036] Mon, 23 July 2007 07:34 Go to previous messageGo to next message
MarcL
Messages: 455
Registered: November 2006
Location: Connecticut, USA
Senior Member
So you write this SQL code using models, then you move on to another job. Now the user changes their mind and wants slightly different requirements. How much time and therefore $$ is wasted for the developer to try and maintain your fancy code.

Time = $$ ...

KISS.

Re: Get min and max dates by grouping dates with 30 day range [message #253362 is a reply to message #253301] Mon, 23 July 2007 10:48 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
MarcL wrote on Mon, 23 July 2007 14:34
So you write this SQL code using models, then you move on to another job. Now the user changes their mind and wants slightly different requirements. How much time and therefore $$ is wasted for the developer to try and maintain your fancy code.

Time = $$ ...

KISS.



I know people who said the same about using joins, referential integrity etc, years ago. The fact that you (and I Wink) are not used to this type of sql, and hence have great difficulties understanding it, does not mean that others should not use it.
You should use the tool that fits the job. Don't use something simply 'because it is available', but also don't be afraid to use something new.
Upon introduction of analytics, they were 'scary' too..
Re: Get min and max dates by grouping dates with 30 day range [message #253373 is a reply to message #253204] Mon, 23 July 2007 11:20 Go to previous messageGo to next message
Volder
Messages: 38
Registered: April 2007
Location: Russia
Member
@Ross
Quote:

* Relational logic belongs in SQL.
* Iterative and conditional logic belongs in PL/SQL.
* Recursive tree traversing logic belongs in CONNECT BY.
* Multi-dimensional OLAP manipulation belongs in MODEL clauses.



Who did say that?
Even in the doc on the MODEL clause you can find:
1. The rules can range from basic arithmetic to simultaneous equations using recursion.
2. You can specify iterative rule evaluation, in which case the rules are evaluated iteratively until the termination condition is satisfied.

@MarcL,Ross
If you guys are not familiar with model clause - it doesn't mean you shouldn't learn and use it.

So for people who know the syntax of Model - it is much easier to understand MODEL than your CONNECT BY, Ross.
I think it would take not more than five minutes - to understand it.

When you say "spend time", "time is money", you guys, speak about the time future programmers will take to understand the code.
But you don't think that time which takes a query to execute is much more important?

Just look:

SQL> drop table dates;

Table dropped.

SQL> create table dates(d date);

Table created.

SQL>    declare dt date:=sysdate;
  2     begin
  3     for i in 1..1000
  4     loop
  5     insert into dates values (dt);
  6     dt:=dt+round(dbms_random.value(1,2));
  7     end loop;
  8     end;
  9  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'dates');

PL/SQL procedure successfully completed.

SQL> set timing on;
SQL> set autotrace traceonly;

SQL> select min(d) min_dt, max(d) max_dt from (
  2        select * from dates
  3         model
  4          dimension by (row_number() over (order by d) rn)
  5          measures(d, d+29 new_dt)
  6          rules(new_dt[rn>1] order by rn = case when d[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] 
else new_dt[CV()] end)
  7            )
  8            group by new_dt;

50 rows selected.

Elapsed: 00:00:00.09

Execution Plan
----------------------------------------------------------
Plan hash value: 3995135642

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |  1000 | 18000 |     4  (50)| 00:00:01 |
|   1 |  HASH GROUP BY        |       |  1000 | 18000 |     4  (50)| 00:00:01 |
|   2 |   VIEW                |       |  1000 | 18000 |     3  (34)| 00:00:01 |
|   3 |    SQL MODEL ORDERED  |       |  1000 |  8000 |     3  (34)| 00:00:01 |
|   4 |     WINDOW SORT       |       |  1000 |  8000 |     3  (34)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| DATES |  1000 |  8000 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       1792  bytes sent via SQL*Net to client
        414  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         50  rows processed
         
SQL> SELECT MIN(d), MAX(d)
  2  FROM (
  3      SELECT root_d
  4      ,      d
  5      ,      rn
  6      FROM (
  7          SELECT CONNECT_BY_ROOT d AS root_d
  8          ,      d
  9          ,      rn
 10          FROM (
 11              SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
 12              FROM   (SELECT DISTINCT d FROM dates)
 13          )
 14          CONNECT BY rn = PRIOR rn + 1
 15          AND        d <= CONNECT_BY_ROOT d + 29
 16      )
 17      START WITH rn = 1
 18      CONNECT BY rn = PRIOR rn + 1
 19      AND (root_d = PRIOR root_d OR (d = root_d AND root_d > PRIOR root_d + 29))
 20  )
 21  GROUP BY root_d;

50 rows selected.

Elapsed: 00:00:40.72

Execution Plan
----------------------------------------------------------
Plan hash value: 2602740945

--------------------------------------------------------------------------------------------
| Id  | Operation                          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |       |  1000 | 18000 |     4  (50)| 00:00:01 |
|   1 |  HASH GROUP BY                     |       |  1000 | 18000 |     4  (50)| 00:00:01 |
|   2 |   VIEW                             |       |  1000 | 18000 |     3  (34)| 00:00:01 |
|*  3 |    CONNECT BY WITHOUT FILTERING    |       |       |       |            |          |
|*  4 |     FILTER                         |       |       |       |            |          |
|   5 |      COUNT                         |       |       |       |            |          |
|   6 |       VIEW                         |       |  1000 | 31000 |     3  (34)| 00:00:01 |
|   7 |        CONNECT BY WITHOUT FILTERING|       |       |       |            |          |
|   8 |         COUNT                      |       |       |       |            |          |
|   9 |          VIEW                      |       |  1000 | 22000 |     3  (34)| 00:00:01 |
|  10 |           WINDOW SORT              |       |  1000 |  9000 |     3  (34)| 00:00:01 |
|  11 |            VIEW                    |       |  1000 |  9000 |     3  (34)| 00:00:01 |
|  12 |             HASH UNIQUE            |       |  1000 |  8000 |     3  (34)| 00:00:01 |
|  13 |              TABLE ACCESS FULL     | DATES |  1000 |  8000 |     2   (0)| 00:00:01 |
|  14 |     COUNT                          |       |       |       |            |          |
|  15 |      VIEW                          |       |  1000 | 31000 |     3  (34)| 00:00:01 |
|  16 |       CONNECT BY WITHOUT FILTERING |       |       |       |            |          |
|  17 |        COUNT                       |       |       |       |            |          |
|  18 |         VIEW                       |       |  1000 | 22000 |     3  (34)| 00:00:01 |
|  19 |          WINDOW SORT               |       |  1000 |  9000 |     3  (34)| 00:00:01 |
|  20 |           VIEW                     |       |  1000 |  9000 |     3  (34)| 00:00:01 |
|  21 |            HASH UNIQUE             |       |  1000 |  8000 |     3  (34)| 00:00:01 |
|  22 |             TABLE ACCESS FULL      | DATES |  1000 |  8000 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("RN"=1)
   4 - filter("RN"=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       1792  bytes sent via SQL*Net to client
        414  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
         50  rows processed



MarcL, do you still find MODEL CLAUSE a "fancy" query?

[Updated on: Mon, 23 July 2007 11:23]

Report message to a moderator

Re: Get min and max dates by grouping dates with 30 day range [message #253390 is a reply to message #253373] Mon, 23 July 2007 13:53 Go to previous messageGo to next message
skooman
Messages: 912
Registered: March 2005
Location: Netherlands
Senior Member
Thumbs Up Volder!!
Re: Get min and max dates by grouping dates with 30 day range [message #253433 is a reply to message #253390] Mon, 23 July 2007 21:21 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
@volder, don't get me wrong. I'm not saying the CONNECT BY solution is appropriate in any way. I included it to demonstrate what a complex problem it is using "normal" SQL. The MODEL solution is much nicer and certainly more efficient.

However, given the appropriate indexing, the PL/SQL solution would be more efficient still (as an index can be used to bypass the necessary sort), and it is more "readable" to boot. The analytic query in the MODEL solution uses a window sort which cannot currently be bypassed, and (by my limited understanding) the MODEL itself uses analytic workspace which would increase memory usage.

Frank, I agree. Time will tell whether MODEL is embraced and used widely beyond its intended scope as an OLAP extension. However, history and experience tells us that much can be accomplished with clever CONNECT BY or analytic functions, but to the detriment of performance and understandability.

Clearly this is a subjective judgement. Volder values brevity and cleverness, I value clarity and performance. I don't see a solution here - I can just agree to disagree.

Ross Leishman

[Updated on: Mon, 23 July 2007 21:21]

Report message to a moderator

Re: Get min and max dates by grouping dates with 30 day range [message #253475 is a reply to message #253433] Tue, 24 July 2007 01:39 Go to previous messageGo to next message
Volder
Messages: 38
Registered: April 2007
Location: Russia
Member
rleishman wrote on Mon, 23 July 2007 21:21

However, given the appropriate indexing, the PL/SQL solution would be more efficient still ...


but your PL/SQL block still just prints the text (dbms_output.)
And if you want to use this result next - what changes would you do to your PLSQL?

So at this moment we can't compare them in the perfomance - cause they do different things.
Re: Get min and max dates by grouping dates with 30 day range [message #253758 is a reply to message #253475] Tue, 24 July 2007 15:04 Go to previous messageGo to next message
oraraj
Messages: 8
Registered: December 2005
Junior Member
Volder,

I have other columns that I need along with the dates.

So I have added those columns in the query below.

I am getting ora-01722 invalid number error on running the query.

PATIENT_NUM, MEM_NUM, PRV_TYPE, DIAG_CD have datatype VARCHAR2.
DOB, DT, NEW_DT have datatype DATE.

I think the date comparison is failing.

SELECT PATIENT_NUM, MEM_NUM, DOB, PRV_TYPE, DIAG_CD, MIN(DT) MIN_DT, MAX(DT) MAX_DT FROM
(
SELECT * FROM T
MODEL
DIMENSION BY (PATIENT_CONTROL_NUM, MEM_NUM, DOB, PRV_TYPE,DIAG_CD,
ROW_NUMBER() OVER (PARTITION BY PATIENT_CONTROL_NUM, MEM_NUM, DOB, PRV_TYPE,DIAG_CD
ORDER BY PATIENT_CONTROL_NUM, MEM_NUM, DOB, PRV_TYPE,DIAG_CD,DT) RN)
MEASURES(DT, DT+29 NEW_DT)

RULES(NEW_DT[ANY,ANY,ANY,ANY,ANY, RN >1] ORDER BY PATIENT_CONTROL_NUM, MEM_NUM, DOB, PRV_TYPE,DIAG_CD,DT =

CASE WHEN DT[CV(), CV(), CV(), CV(), CV(), CV()]<=
NEW_DT[CV()-1,CV()-1,CV()-1,CV()-1,CV()-1,CV()-1] THEN
NEW_DT[CV()-1,CV()-1,CV()-1,CV()-1,CV()-1,CV()-1] ELSE
NEW_DT[CV(), CV(), CV(), CV(), CV(), CV()] END)
)
GROUP BY PATIENT_NUM,
MEM_NUM,
DOB,
PRV_TYPE,
DIAG_CD,
NEW_DT

I tried selecting data without the group by, and hardcoded values in the rules for rn =2 and it sets NEW_DT to Null

Any suggestions are appreciated!

Thanks
RK

Re: Get min and max dates by grouping dates with 30 day range [message #253809 is a reply to message #253758] Tue, 24 July 2007 21:59 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Volder wrote on Tue, 24 July 2007 16:39
rleishman wrote on Mon, 23 July 2007 21:21

However, given the appropriate indexing, the PL/SQL solution would be more efficient still ...


but your PL/SQL block still just prints the text (dbms_output.)
And if you want to use this result next - what changes would you do to your PLSQL?

So at this moment we can't compare them in the perfomance - cause they do different things.


rleishman wrote on Sun, 22 July 2007 15:05
...Now this is simple and clear - and it took me 5 minutes to write. If you want to reference this in a SQL, you can use a PIPELINED table function - I'll leave that as an exercise for you.
Re: Get min and max dates by grouping dates with 30 day range [message #253826 is a reply to message #253758] Wed, 25 July 2007 00:37 Go to previous messageGo to next message
Volder
Messages: 38
Registered: April 2007
Location: Russia
Member
oraraj wrote on Tue, 24 July 2007 15:04
Volder,

I have other columns that I need along with the dates.

So I have added those columns in the query below.



1. First of all you should give us input and expected output - but not the query changed by you.

2. You just doing what guys were talking about - you don't know MODEL CLAUSE - and you're trying to use it in your code. That's bad. Read about it and learn it first. Or your boss will fire you - cause it takes a lot time to understand what you have written Laughing

3. If I catch what you want - the PARTITION BY will help you:

SQL> with t as (    select 111 patient_control_num, to_date('04/20/2007', 'mm/dd/yyyy') dt from dual union all -- Grp1
  2                 select 111 patient_control_num, to_date('05/25/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  3                 select 111 patient_control_num, to_date('06/03/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  4                 select 222 patient_control_num, to_date('04/15/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  5                 select 222 patient_control_num, to_date('04/17/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  6                 select 222 patient_control_num, to_date('04/28/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  7                 select 222 patient_control_num, to_date('06/02/2007', 'mm/dd/yyyy') dt from dual union all -- Grp2
  8                 select 222 patient_control_num, to_date('07/06/2007', 'mm/dd/yyyy') dt from dual union all -- Grp3
  9                 select 222 patient_control_num, to_date('07/10/2007', 'mm/dd/yyyy') dt from dual) -- Grp3
 10         --
 11  select num, min(dt) min_dt, max(dt) max_dt from (
 12                 select * from t
 13                  model
 14                   partition by (patient_control_num num)
 15                   dimension by (row_number() over (partition by patient_control_num order by dt) rn)
 16                   measures(dt, dt+30 new_dt)
 17                   rules(new_dt[rn>1] order by rn = case when dt[CV()]<=new_dt[CV()-1] then new_dt[CV()-1] else new_dt[CV()] end)
 18                     )
 19                     group by num, new_dt
 20                     order by 1,2
 21  /

       NUM MIN_DT      MAX_DT
---------- ----------- -----------
       111 20.04.2007  20.04.2007
       111 25.05.2007  03.06.2007
       222 15.04.2007  28.04.2007
       222 02.06.2007  02.06.2007
       222 06.07.2007  10.07.2007

SQL>
Re: Get min and max dates by grouping dates with 30 day range [message #254014 is a reply to message #253826] Wed, 25 July 2007 08:58 Go to previous message
oraraj
Messages: 8
Registered: December 2005
Junior Member
Well thanks Volder!

I read some examples on rules and so I was trying to change the rules but did not realise all I need to do was just partition by which I have used so many times in analytic functions but found out now how I could use it in sql model clause in this example! Smile

Anyway, I added all my 5 columns in the partition by and got the desired output Smile

Thanks
RK

Previous Topic: Trigger questions (merged)
Next Topic: error in trigger
Goto Forum:
  


Current Time: Sat Dec 10 14:44:08 CST 2016

Total time taken to generate the page: 0.08471 seconds