Home » SQL & PL/SQL » SQL & PL/SQL » Analytical grouping of data (Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production)
Analytical grouping of data [message #651261] Fri, 13 May 2016 12:08 Go to next message
vduhan
Messages: 2
Registered: May 2016
Junior Member
Hi Gurus,

I'm trying to group data based on when the value changes. There are 3 columns (mydate, key and group). Requirement is to find out how many consecutive days the value has been stale for a particular key before it changes.

Sample data:
WITH tmp AS (
SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual 
ORDER BY key,mydate, value)
SELECT * FROM tmp;


The data is already ordered based on key, mydate and value. The group changes (increments) whenever there is a change in the key/value pair.

In the above example, for "key" = 158353, the value (0.01) is stale from date 02-May to 10-May, so group for that should be, say G1.
On 11-May for "key" = 158353, the value changes from (0.01) to (0.02) and remains stale from date 11-May to 12-May, so group for that should be, say G2.
On 13-May for "key" = 158353, the value again changes from (0.02) back to (0.01) and remains stale only for a day, so group for that should be, say G3... and so on.

I was able to identify when exactly the key/value pair changes using LAG/LEAD functions but having trouble naming/assigning a group of all the items and assign proper group buckets to the same.

Expected output:
MYDATE     | KEY    | VALUE | GROUP
02/05/2016 | 158353 | 0.01  | G1
03/05/2016 | 158353 | 0.01  | G1
04/05/2016 | 158353 | 0.01  | G1
05/05/2016 | 158353 | 0.01  | G1
06/05/2016 | 158353 | 0.01  | G1
09/05/2016 | 158353 | 0.01  | G1
10/05/2016 | 158353 | 0.01  | G1
11/05/2016 | 158353 | 0.02  | G2
12/05/2016 | 158353 | 0.02  | G2
13/05/2016 | 158353 | 0.01  | G3
02/05/2016 | 333333 | 1.7   | G4
03/05/2016 | 333333 | 1.7   | G4
04/05/2016 | 333333 | 2     | G5
05/05/2016 | 333333 | 2     | G5
06/05/2016 | 333333 | 2     | G5
09/05/2016 | 333333 | 2     | G5
10/05/2016 | 333333 | 1.85  | G6
11/05/2016 | 333333 | 1.7   | G7
12/05/2016 | 333333 | 1.85  | G8
13/05/2016 | 333333 | 1.7   | G9


Really eager to see if there are any good ideas on how to implement this.



[Updated on: Fri, 13 May 2016 12:27]

Report message to a moderator

Re: Analytical grouping of data [message #651262 is a reply to message #651261] Fri, 13 May 2016 12:30 Go to previous messageGo to next message
Michel Cadot
Messages: 68644
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Thanks for the test case.
Here are 2 solutions depending if you want to start group number at 1 at each key or have unique group numbers:
SQL> col grp format a3
SQL> break on key dup skip 1
SQL> WITH tmp AS (
  2  SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  3  SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  4  SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  5  SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  6  SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  7  SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  8  SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  9  SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
 10  SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
 11  SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
 12  SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 13  SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 14  SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 15  SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 16  SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 17  SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 18  SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
 19  SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 20  SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
 21  SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual
 22  ),
 23  data as (
 24    select tmp.*,
 25           case when lag(value) over (partition by key order by mydate) != value then 1 end nb
 26    from tmp
 27  )
 28  select key, mydate, value,
 29         'G'||to_char(count(nb) over (partition by key order by mydate)+1) grp
 30  from data
 31  ORDER BY key, mydate
 32  /
       KEY MYDATE           VALUE GRP
---------- ----------- ---------- ---
    158353 02-MAY-2016        .01 G1
    158353 03-MAY-2016        .01 G1
    158353 04-MAY-2016        .01 G1
    158353 05-MAY-2016        .01 G1
    158353 06-MAY-2016        .01 G1
    158353 09-MAY-2016        .01 G1
    158353 10-MAY-2016        .01 G1
    158353 11-MAY-2016        .02 G2
    158353 12-MAY-2016        .02 G2
    158353 13-MAY-2016        .01 G3

    333333 02-MAY-2016        1.7 G1
    333333 03-MAY-2016        1.7 G1
    333333 04-MAY-2016          2 G2
    333333 05-MAY-2016          2 G2
    333333 06-MAY-2016          2 G2
    333333 09-MAY-2016          2 G2
    333333 10-MAY-2016       1.85 G3
    333333 11-MAY-2016        1.7 G4
    333333 12-MAY-2016       1.85 G5
    333333 13-MAY-2016        1.7 G6


20 rows selected.

SQL> WITH tmp AS (
  2  SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  3  SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  4  SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  5  SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  6  SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  7  SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  8  SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  9  SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
 10  SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
 11  SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
 12  SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 13  SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 14  SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 15  SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 16  SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 17  SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 18  SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
 19  SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 20  SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
 21  SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual
 22  ),
 23  data as (
 24    select tmp.*,
 25           case
 26             when    lag(key) over (order by key, mydate) != key
 27                  or lag(value) over (order by key, mydate) != value
 28               then 1
 29           end nb
 30    from tmp
 31  )
 32  select key, mydate, value,
 33         'G'||to_char(count(nb) over (order by key, mydate)+1) grp
 34  from data
 35  ORDER BY key, mydate
 36  /
       KEY MYDATE           VALUE GRP
---------- ----------- ---------- ---
    158353 02-MAY-2016        .01 G1
    158353 03-MAY-2016        .01 G1
    158353 04-MAY-2016        .01 G1
    158353 05-MAY-2016        .01 G1
    158353 06-MAY-2016        .01 G1
    158353 09-MAY-2016        .01 G1
    158353 10-MAY-2016        .01 G1
    158353 11-MAY-2016        .02 G2
    158353 12-MAY-2016        .02 G2
    158353 13-MAY-2016        .01 G3

    333333 02-MAY-2016        1.7 G4
    333333 03-MAY-2016        1.7 G4
    333333 04-MAY-2016          2 G5
    333333 05-MAY-2016          2 G5
    333333 06-MAY-2016          2 G5
    333333 09-MAY-2016          2 G5
    333333 10-MAY-2016       1.85 G6
    333333 11-MAY-2016        1.7 G7
    333333 12-MAY-2016       1.85 G8
    333333 13-MAY-2016        1.7 G9


20 rows selected.
Re: Analytical grouping of data [message #651263 is a reply to message #651262] Fri, 13 May 2016 12:31 Go to previous messageGo to next message
Michel Cadot
Messages: 68644
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

I see you added the expected output when I was writing my queries, so the second one for your case.
Thanks for formatting your expected output, it is appreciated to see aligned columns.

[Updated on: Fri, 13 May 2016 12:34]

Report message to a moderator

icon14.gif  Re: Analytical grouping of data [message #651264 is a reply to message #651263] Fri, 13 May 2016 12:51 Go to previous messageGo to next message
vduhan
Messages: 2
Registered: May 2016
Junior Member
Michel,

I tried using the seq (currval and nextval) with LAG earlier (something on the lines of .. if there is match then use currval else nextval) and since I was using both currval & nextval in the same SQL, wasn't able to get the desired result ... fiddled with it for more than an hour with no luck. Then decided to involve few more brains from the forum, took 15 mins break and woah, you are already ready with the solution. You are a superstar!! Cool

Being truthful, was expecting only a few pointers but you have made my "Friday" with a full fledged solution. Thanks a bunch again!!
Re: Analytical grouping of data [message #651265 is a reply to message #651264] Fri, 13 May 2016 13:05 Go to previous message
Michel Cadot
Messages: 68644
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

You can avoid the "+1" in the last formula using all the parameters of LAG function (assuming key and value are not NULL otherwise the use of NVL function should be added, this is also true for the previous queries):
SQL> WITH tmp AS (
  2  SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  3  SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  4  SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  5  SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  6  SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  7  SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  8  SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
  9  SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
 10  SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.02 AS value from dual  UNION ALL
 11  SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 158353 AS key, 0.01 AS value from dual  UNION ALL
 12  SELECT to_date('02MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 13  SELECT to_date('03MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 14  SELECT to_date('04MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 15  SELECT to_date('05MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 16  SELECT to_date('06MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 17  SELECT to_date('09MAY2016','DDMONYYYY') as mydate, 333333 AS key, 2.00 AS value from dual  UNION ALL
 18  SELECT to_date('10MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
 19  SELECT to_date('11MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual  UNION ALL
 20  SELECT to_date('12MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.85 AS value from dual  UNION ALL
 21  SELECT to_date('13MAY2016','DDMONYYYY') as mydate, 333333 AS key, 1.70 AS value from dual
 22  ),
 23  data as (
 24    select tmp.*,
 25           case
 26             when    lag(key,1,key-1) over (order by key, mydate) != key
 27                  or lag(value,1,value-1) over (order by key, mydate) != value
 28               then 1
 29           end flag
 30    from tmp
 31  )
 32  select key, mydate, value,
 33         'G'||to_char(count(flag) over (order by key, mydate)) grp
 34  from data
 35  ORDER BY key, mydate
 36  /
       KEY MYDATE           VALUE GRP
---------- ----------- ---------- ---
    158353 02-MAY-2016        .01 G1
    158353 03-MAY-2016        .01 G1
    158353 04-MAY-2016        .01 G1
    158353 05-MAY-2016        .01 G1
    158353 06-MAY-2016        .01 G1
    158353 09-MAY-2016        .01 G1
    158353 10-MAY-2016        .01 G1
    158353 11-MAY-2016        .02 G2
    158353 12-MAY-2016        .02 G2
    158353 13-MAY-2016        .01 G3

    333333 02-MAY-2016        1.7 G4
    333333 03-MAY-2016        1.7 G4
    333333 04-MAY-2016          2 G5
    333333 05-MAY-2016          2 G5
    333333 06-MAY-2016          2 G5
    333333 09-MAY-2016          2 G5
    333333 10-MAY-2016       1.85 G6
    333333 11-MAY-2016        1.7 G7
    333333 12-MAY-2016       1.85 G8
    333333 13-MAY-2016        1.7 G9

Previous Topic: Execute Windows Command
Next Topic: attendance sheet
Goto Forum:
  


Current Time: Wed Apr 24 05:21:23 CDT 2024