Re: Analytics question maybe - combine over lapping date rows
From: <jefferstone_at_gmail.com>
Date: Fri, 16 Nov 2012 07:28:13 -0800 (PST)
Message-ID: <2c6f080e-f553-471e-9490-6a934f0fcbba_at_googlegroups.com>
The solution by Maxim does not handle collapsing date ranges such as:
This should consolidate down to a single date range: 11-01-2012 11-15-2012, but the third and fourth ranges will be detected as an interval change and you end up with 3 ranges. After much googling and experimenting, I finally found a solution posted here http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/b5dd3d09-7299-4c5c-8e9d-5f06b93a9cbd, which I modified to handle null stop dates and tested on large sets of date ranges. Here is the code:
JOIN your_table t1
ON s1.primary_key = t1.primary_key
AND (s1.start_datetime <= t1.stop_datetime -- s1 starts before t1 ends
WHERE NOT EXISTS
(SELECT NULL
GROUP BY s1.primary_key, s1.start_datetime ORDER BY s1.primary_key, s1.start_datetime ; Received on Fri Nov 16 2012 - 16:28:13 CET
Date: Fri, 16 Nov 2012 07:28:13 -0800 (PST)
Message-ID: <2c6f080e-f553-471e-9490-6a934f0fcbba_at_googlegroups.com>
The solution by Maxim does not handle collapsing date ranges such as:
11-01-2012 11-09-2012 11-02-2012 11-03-2012 11-04-2012 11-05-2012 11-08-2012 11-15-2012
This should consolidate down to a single date range: 11-01-2012 11-15-2012, but the third and fourth ranges will be detected as an interval change and you end up with 3 ranges. After much googling and experimenting, I finally found a solution posted here http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/b5dd3d09-7299-4c5c-8e9d-5f06b93a9cbd, which I modified to handle null stop dates and tested on large sets of date ranges. Here is the code:
SELECT s1.primary_key,
s1.start_datetime, MIN(t1.stop_datetime) AS stop_datetimeFROM your_table s1
JOIN your_table t1
ON s1.primary_key = t1.primary_key
AND (s1.start_datetime <= t1.stop_datetime -- s1 starts before t1 ends
OR t1.stop_datetime IS NULL) AND NOT EXISTS -- filter out all but the t1's with max stop time)
(SELECT NULL
FROM your_table t2 WHERE t1.primary_key = t2.primary_key -- t1 stop time falls within t2 (in which case filter out t1) AND t1.stop_datetime >= t2.start_datetime AND (t1.stop_datetime < t2.stop_datetime OR t2.stop_datetime IS NULL)
WHERE NOT EXISTS
(SELECT NULL
FROM your_table s2 WHERE s1.primary_key = s2.primary_key -- s1 start time falls within s2 (in which case filter out s1) AND s1.start_datetime > s2.start_datetime AND (s1.start_datetime <= s2.stop_datetime OR s2.stop_datetime IS NULL))
GROUP BY s1.primary_key, s1.start_datetime ORDER BY s1.primary_key, s1.start_datetime ; Received on Fri Nov 16 2012 - 16:28:13 CET