Home » SQL & PL/SQL » SQL & PL/SQL » when "group by" isn't enough... (Oracle 11g)
when "group by" isn't enough... [message #626505] |
Tue, 28 October 2014 08:41 |
|
quirks
Messages: 82 Registered: October 2014
|
Member |
|
|
Hello,
I'd like to simplify the following table:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 08.12.0014| X
09.12.0014| 09.12.0014| X
09.12.0014| 09.12.0014| Y
10.12.0014| 10.12.0014| X
11.12.0014| 11.12.0014| X
13.12.0014| 13.12.0014| Y
21.12.0014| 21.12.0014| X
22.12.0014| 22.12.0014| X
I want to group together those Values that have the same GROUP_VALUE and where the VALID_FROM and the VALID_TO of the following ROW is not further appart than a day.
The final result should look like this.
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 11.12.0014| X
09.12.0014| 09.12.0014| Y
13.12.0014| 13.12.0014| Y
21.12.0014| 22.12.0014| X
I've Tried a group by Statement:
SELECT
MIN(VALID_FROM) AS VALID_FROM
,MAX(VALID_TO) AS VALID_TO
,GROUP_VALUE
FROM
TEST_TABLE
GROUP BY
GROUP_VALUE;
But then those rows that have a larger gap between the current and the next one are grouped together as well.
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 22.12.0014| X
09.12.0014| 13.12.0014| Y
I'm somwhow stuck.
Do you have any suggestions?
Cheers
Quirks
Below you'll find the test table.
CREATE TABLE TEST_TABLE(
VALID_FROM DATE
,VALID_TO DATE
,GROUP_VALUE CHAR(1 BYTE)
);
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/10/0014', 'MM/DD/YYYY'), TO_DATE('12/10/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/09/0014', 'MM/DD/YYYY'), TO_DATE('12/09/0014', 'MM/DD/YYYY'), 'Y');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/08/0014', 'MM/DD/YYYY'), TO_DATE('12/08/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/13/0014', 'MM/DD/YYYY'), TO_DATE('12/13/0014', 'MM/DD/YYYY'), 'Y');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/21/0014', 'MM/DD/YYYY'), TO_DATE('12/21/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/22/0014', 'MM/DD/YYYY'), TO_DATE('12/22/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/09/0014', 'MM/DD/YYYY'), TO_DATE('12/09/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/11/0014', 'MM/DD/YYYY'), TO_DATE('12/11/0014', 'MM/DD/YYYY'), 'X');
[Updated on: Tue, 28 October 2014 08:43] Report message to a moderator
|
|
|
|
|
|
|
|
Re: when "group by" isn't enough... [message #626524 is a reply to message #626509] |
Tue, 28 October 2014 09:48 |
|
quirks
Messages: 82 Registered: October 2014
|
Member |
|
|
Michel Cadot wrote on Tue, 28 October 2014 08:52
I don't understand, can you explain each line of your result.
I'll try.
VALID_FROM and VALID_TO marks the time from when to when on specific GROUP_VALUE exists.
When a GROUP_VALUE appears on two days that follow each other, then I assume that it existet the whole time.
e.g.
These both values occour on two successive days.
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 08.12.0014| X
09.12.0014| 09.12.0014| X
So it should better be:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 09.12.0014| X
Meaning, the Value is valid from 8.12 to 9.12.
If you look at my example, there are four values that follow in a days distance.
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 08.12.0014| X
09.12.0014| 09.12.0014| X
10.12.0014| 10.12.0014| X
11.12.0014| 11.12.0014| X
So they should be grouped together:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 11.12.0014| X
But wait, there are further Rows with GROUP_VALUE = 'X'
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 08.12.0014| X
09.12.0014| 09.12.0014| X
10.12.0014| 10.12.0014| X
11.12.0014| 11.12.0014| X
21.12.0014| 21.12.0014| X
22.12.0014| 22.12.0014| X
Between the last two values and the first four values ia a gap of 10 days. Thats why thes last two values must be grouped together indipendently:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 11.12.0014| X
21.12.0014| 22.12.0014| X
The both rows with GROUP_VALUE = 'Y' are single occurances of this specific value and therefore there is no need to group them together. Below you'll find which values should be grouped together by the following rule:
Group togehter values with the same GROUP_VALUE which occure within the same time frame (which is defined by the fact that they are not further appart then one single day).
VALID_FROM| VALID_TO|GROUP_VALUE Group
----------------------------------
08.12.0014| 08.12.0014| X 1
09.12.0014| 09.12.0014| X 1
09.12.0014| 09.12.0014| Y 2
10.12.0014| 10.12.0014| X 1
11.12.0014| 11.12.0014| X 1
13.12.0014| 13.12.0014| Y 3
21.12.0014| 21.12.0014| X 4
22.12.0014| 22.12.0014| X 4
So, the final result should look like this:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 11.12.0014| X
09.12.0014| 09.12.0014| Y
13.12.0014| 13.12.0014| Y
21.12.0014| 22.12.0014| X
Please excuse my bad english. I'm not a native speaker.
[Updated on: Tue, 28 October 2014 09:51] Report message to a moderator
|
|
|
Re: when "group by" isn't enough... [message #626525 is a reply to message #626524] |
Tue, 28 October 2014 10:27 |
cookiemonster
Messages: 13920 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
Couple of obvious questions:
1) are date_from and date_to always midnight? And if not what effect does that have on the matching logic?
2) What happens when date_from and date_to for a given row aren't the same date? Can they overlap other rows?
|
|
|
Re: when "group by" isn't enough... [message #626526 is a reply to message #626525] |
Tue, 28 October 2014 10:33 |
|
quirks
Messages: 82 Registered: October 2014
|
Member |
|
|
cookiemonster wrote on Tue, 28 October 2014 10:27Couple of obvious questions:
1) are date_from and date_to always midnight? And if not what effect does that have on the matching logic?
2) What happens when date_from and date_to for a given row aren't the same date? Can they overlap other rows?
1) Yes, it's always the pure date with no time component.
2a) It is possible that the VALID_FROM and VALID_TO is not the same date, but VALID_FROM is always <= VALID_TO.
2b) Unless you ignore the GROUP_VALUE column, they can not overlap.
[Updated on: Tue, 28 October 2014 10:34] Report message to a moderator
|
|
|
Re: when "group by" isn't enough... [message #626527 is a reply to message #626526] |
Tue, 28 October 2014 10:35 |
cookiemonster
Messages: 13920 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
So you could have this:
VALID_FROM| VALID_TO|GROUP_VALUE Group
----------------------------------
08.12.0014| 21.12.0014| X 1
22.12.0014| 24.12.0014| X 1
And it would become this:
VALID_FROM| VALID_TO|GROUP_VALUE Group
----------------------------------
08.12.0014| 24.12.0014| X 1
|
|
|
|
Re: when "group by" isn't enough... [message #626529 is a reply to message #626524] |
Tue, 28 October 2014 10:42 |
|
sandeep_orafaq
Messages: 88 Registered: September 2014
|
Member |
|
|
Based on your Explanation, it would be appropriate to use analytical function. It is also possible without it. With analytical function it becomes very easy to handle these complex Scenarios:
SQL> SELECT MIN(VALID_FROM) VALID_FROM, MAX(VALID_TO) VALID_TO, GROUP_VALUE
2 FROM (SELECT VALID_FROM,
3 VALID_TO,
4 GROUP_VALUE,
5 VALID_FROM - ROW_NUMBER() OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_FROM, GROUP_VALUE) AS DIFF
6 FROM TEST_TABLE)
7 GROUP BY DIFF, GROUP_VALUE
8 ORDER BY VALID_FROM
9 /
VALID_FROM VALID_TO GROUP_VALUE
----------- ----------- -----------
12/8/0014 12/11/0014 X
12/9/0014 12/9/0014 Y
12/13/0014 12/13/0014 Y
12/21/0014 12/22/0014 X
You can change the date format based on required output. my session has different Settings.
[Updated on: Tue, 28 October 2014 10:44] Report message to a moderator
|
|
|
|
|
Re: when "group by" isn't enough... [message #626532 is a reply to message #626531] |
Tue, 28 October 2014 11:00 |
|
quirks
Messages: 82 Registered: October 2014
|
Member |
|
|
[quote title=EdStevens wrote on Tue, 28 October 2014 10:52]quirks wrote on Tue, 28 October 2014 10:33cookiemonster wrote on Tue, 28 October 2014 10:27
1) Yes, it's always the pure date with no time component.
So you say, but it is VERY easy to simply "know" this without actually proving it. ANd when one does attempt to prove it .. whoa! where did those times come from?
select to_char(valid_from,'dd-mon-yyyy hh24:mi:ss'),
to_char(valid_to,'dd-mon-yyyy hh24:mi:ss')
from test_table;
See anything surprising?
Yes, I know. But in my case the time component is 0.
|
|
|
|
Re: when "group by" isn't enough... [message #626534 is a reply to message #626533] |
Tue, 28 October 2014 11:23 |
|
quirks
Messages: 82 Registered: October 2014
|
Member |
|
|
sandeep_orafaq wrote on Tue, 28 October 2014 11:10VALID_FROM != VALID_TO . This sceanrio will be handled differently. the logic can be adjusted based on what we want to achieve.
You hit the point, as well as cookiemonster did in the first place. The statment of sandeep_orafaq only works if VALID_FROM is the same then VALID_TO.
How would one cope with rows where there is a range of time between VALID_FROM and VALID_TO like this:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 08.12.0014| X
09.12.0014| 10.12.0014| X
11.12.0014| 11.12.0014| X
The final result should look like this:
VALID_FROM| VALID_TO|GROUP_VALUE
----------------------------------
08.12.0014| 11.12.0014| X
Here are the test data.
TRUNCATE TABLE TEST_TABLE;
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/08/0014', 'MM/DD/YYYY'), TO_DATE('12/08/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/09/0014', 'MM/DD/YYYY'), TO_DATE('12/10/0014', 'MM/DD/YYYY'), 'X');
INSERT INTO
TEST_TABLE
VALUES
(TO_DATE('12/11/0014', 'MM/DD/YYYY'), TO_DATE('12/11/0014', 'MM/DD/YYYY'), 'X');
COMMIT;
|
|
|
|
Re: when "group by" isn't enough... [message #626567 is a reply to message #626534] |
Wed, 29 October 2014 04:09 |
|
sandeep_orafaq
Messages: 88 Registered: September 2014
|
Member |
|
|
As I mentioned we can use a seperate logic to achieve the results we want. In case of varying VALID_FROM and VALID_TO. We need to find out the continuous time intervals using additional logic. But again the analytical functions will be best to use.
For example. for the below mentioned data, we can modify our query to achieve the desired output.
SQL> delete from TEST_TABLE;
8 rows deleted
SQL>
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('14-12-2014', 'dd-mm-yyyy'), to_date('15-12-2014', 'dd-mm-yyyy'), 'X');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('09-12-2014', 'dd-mm-yyyy'), to_date('10-12-2014', 'dd-mm-yyyy'), 'Y');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('08-12-2014', 'dd-mm-yyyy'), to_date('09-12-2014', 'dd-mm-yyyy'), 'X');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('13-12-2014', 'dd-mm-yyyy'), to_date('18-12-2014', 'dd-mm-yyyy'), 'Y');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('19-12-2014', 'dd-mm-yyyy'), to_date('25-12-2014', 'dd-mm-yyyy'), 'X');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('25-12-2014', 'dd-mm-yyyy'), to_date('26-12-2014', 'dd-mm-yyyy'), 'X');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('09-12-2014', 'dd-mm-yyyy'), to_date('14-12-2014', 'dd-mm-yyyy'), 'X');
1 row inserted
SQL> insert into TEST_TABLE (valid_from, valid_to, group_value)
2 values (to_date('16-12-2014', 'dd-mm-yyyy'), to_date('17-12-2014', 'dd-mm-yyyy'), 'X');
1 row inserted
SQL> commit
2 /
Commit complete
SQL> select * from TEST_TABLE t ORDER BY group_value,valid_from;
VALID_FROM VALID_TO GROUP_VALUE
----------- ----------- -----------
12/8/2014 12/9/2014 X
12/9/2014 12/14/2014 X
12/14/2014 12/15/2014 X
12/16/2014 12/17/2014 X
12/19/2014 12/25/2014 X
12/25/2014 12/26/2014 X
12/9/2014 12/10/2014 Y
12/11/2014 12/18/2014 Y
8 rows selected
SQL>
SQL> SELECT GROUP_VALUE, MIN(VALID_FROM) VALID_FROM, MAX(VALID_TO) VALID_TO
2 FROM (SELECT GROUP_VALUE,
3 VALID_FROM,
4 VALID_TO,
5 MAX(FLAG1) OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) FLAG2
6 FROM ( SELECT GROUP_VALUE,
7 VALID_FROM,
8 VALID_TO,
9 CASE WHEN (
10 (
11 LAG(VALID_TO) OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) != (VALID_FROM)
12 AND LAG(VALID_TO) OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) != (VALID_FROM-1)--- this will ensure if there is just 1 day diff between VALID_TO of previous row and VALID_FROM of current row
13 )
14 OR ROW_NUMBER() OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) = 1)
15 THEN
16 ROW_NUMBER() OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO)
17 ELSE
18 NULL
19 END FLAG1
20 FROM TEST_TABLE
21 )
22 )
23 GROUP BY GROUP_VALUE, FLAG2
24 ORDER BY VALID_FROM
25 /
GROUP_VALUE VALID_FROM VALID_TO
----------- ----------- -----------
X 12/8/2014 12/17/2014
Y 12/9/2014 12/10/2014
Y 12/13/2014 12/18/2014
X 12/19/2014 12/26/2014
[Updated on: Wed, 29 October 2014 04:37] Report message to a moderator
|
|
|
Re: when "group by" isn't enough... [message #626570 is a reply to message #626567] |
Wed, 29 October 2014 05:12 |
|
quirks
Messages: 82 Registered: October 2014
|
Member |
|
|
First and foremost I want to thank you all for your help. It realy is. I feel that I've learned a little bit.
I'm still pondering about the last solution. I figured out how the innermost statement works, and it's obvious how the outermost one does. But the middle one gives me head aches. Why does: MAX(FLAG1) OVER (PARTITION BY GROUP_VALUE ORDER BY VALID_TO) FLAG2 doesn't flag each row where GROUP_VALUE = 'X' with 4?
[Updated on: Wed, 29 October 2014 05:13] Report message to a moderator
|
|
|
|
Re: when "group by" isn't enough... [message #626573 is a reply to message #626570] |
Wed, 29 October 2014 05:45 |
cookiemonster
Messages: 13920 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
Well first off, with that data flag1 is never 4, for X it's 1 and 5.
So if we run the middle bit, with extra columns:
SQL> SELECT GROUP_VALUE,
2 VALID_FROM,
3 VALID_TO,
4 MAX(FLAG1) OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) FLAG2,
5 flag1
6 FROM ( SELECT GROUP_VALUE,
7 VALID_FROM,
8 VALID_TO,
9 CASE WHEN (
10 (
11 LAG(VALID_TO) OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) != (VALID_FROM)
12 AND LAG(VALID_TO) OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) != (VALID_FROM-1)
13 --- this will ensure if there is just 1 day diff between VALID_TO of previous row and VALID_FROM of current row
14 )
15 OR ROW_NUMBER() OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO) = 1)
16 THEN ROW_NUMBER() OVER(PARTITION BY GROUP_VALUE ORDER BY VALID_TO)
17 ELSE
18 NULL
19 END FLAG1
20 FROM TEST_TABLE
21 );
GROUP_VALUE VALID_FROM VALID_TO FLAG2 FLAG1
----------- ----------- ----------- ---------- ----------
X 08-Dec-2014 09-Dec-2014 1 1
X 09-Dec-2014 14-Dec-2014 1
X 14-Dec-2014 15-Dec-2014 1
X 16-Dec-2014 17-Dec-2014 1
X 19-Dec-2014 25-Dec-2014 5 5
X 25-Dec-2014 26-Dec-2014 5
Y 09-Dec-2014 10-Dec-2014 1 1
Y 13-Dec-2014 18-Dec-2014 2 2
8 rows selected
SQL>
So flag1 is set for the first row for a given group_value any subsequent row where there's a gap between that row and the previous one.
Flag2 fills in the gaps so that set of rows can be treated as a single set.
It works because of the windowing clause. The windowing clause isn't included in that query, so the default one is used, which is: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
So for each row in that result set it looks at all the rows from the first to the current row and sees what the max value of flag1 is. To work out which rows are between the first and the current you look at the order by clause - which is valid_to.
So for the first 4 records the only value of flag1 that can be considered for the max are 1 and null. So you get 1.
For rows 5 and 6 the values of flag1 that can be considered for the max are 1,5 and null. So you get 5.
Since flag1 comes from row_number it will always increment with a new set of ranges for a given group_value.
|
|
|
|
|
Goto Forum:
Current Time: Tue Apr 23 19:49:08 CDT 2024
|