Re: Analytics question maybe - combine over lapping date rows
From: <jefferstone_at_gmail.com>
Date: Fri, 16 Nov 2012 07:28:13 -0800 (PST)
Message-ID: <2c6f080e-f553-471e-9490-6a934f0fcbba_at_googlegroups.com>
The solution by Maxim does not handle collapsing date ranges such as:
This should consolidate down to a single date range: 11-01-2012 11-15-2012, but the third and fourth ranges will be detected as an interval change and you end up with 3 ranges. After much googling and experimenting, I finally found a solution posted here http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/b5dd3d09-7299-4c5c-8e9d-5f06b93a9cbd, which I modified to handle null stop dates and tested on large sets of date ranges. Here is the code:
JOIN your_table t1
ON s1.primary_key = t1.primary_key
AND (s1.start_datetime <= t1.stop_datetime -- s1 starts before t1 ends
WHERE NOT EXISTS
(SELECT NULL
GROUP BY s1.primary_key, s1.start_datetime ORDER BY s1.primary_key, s1.start_datetime ; Received on Fri Nov 16 2012 - 16:28:13 CET
Date: Fri, 16 Nov 2012 07:28:13 -0800 (PST)
Message-ID: <2c6f080e-f553-471e-9490-6a934f0fcbba_at_googlegroups.com>
The solution by Maxim does not handle collapsing date ranges such as:
11-01-2012 11-09-2012 11-02-2012 11-03-2012 11-04-2012 11-05-2012 11-08-2012 11-15-2012
This should consolidate down to a single date range: 11-01-2012 11-15-2012, but the third and fourth ranges will be detected as an interval change and you end up with 3 ranges. After much googling and experimenting, I finally found a solution posted here http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/b5dd3d09-7299-4c5c-8e9d-5f06b93a9cbd, which I modified to handle null stop dates and tested on large sets of date ranges. Here is the code:
SELECT s1.primary_key,
s1.start_datetime,
MIN(t1.stop_datetime) AS stop_datetime
FROM your_table s1
JOIN your_table t1
ON s1.primary_key = t1.primary_key
AND (s1.start_datetime <= t1.stop_datetime -- s1 starts before t1 ends
OR t1.stop_datetime IS NULL)
AND NOT EXISTS -- filter out all but the t1's with max stop time
(SELECT NULL
FROM your_table t2
WHERE t1.primary_key = t2.primary_key
-- t1 stop time falls within t2 (in which case filter out t1)
AND t1.stop_datetime >= t2.start_datetime
AND (t1.stop_datetime < t2.stop_datetime
OR t2.stop_datetime IS NULL)
)
WHERE NOT EXISTS
(SELECT NULL
FROM your_table s2
WHERE s1.primary_key = s2.primary_key
-- s1 start time falls within s2 (in which case filter out s1)
AND s1.start_datetime > s2.start_datetime
AND (s1.start_datetime <= s2.stop_datetime
OR s2.stop_datetime IS NULL)
)
GROUP BY s1.primary_key, s1.start_datetime ORDER BY s1.primary_key, s1.start_datetime ; Received on Fri Nov 16 2012 - 16:28:13 CET
