# Re: SQL competition: the shortest interval coalesce/pack query

Date: Fri, 03 Dec 2004 16:04:14 GMT

Message-ID: <280sd.51091$Vk6.17422_at_twister.nyc.rr.com>

"Mikito Harakiri" <mikharakiri_at_iahu.com> wrote in message news:2uPrd.61$G45.36_at_news.oracle.com...

> Write the shortest sql query that would return a minimal cover of a set of

*> intervals. For example, given
**>
**> Intervals = {(x=1,y=3), (x=2,y=5), (x=4, y=11), (x=10,y=12), (x=20,y=21)}
**>
**> it should return
**>
**> cover(Intervals) = {(x=1,y=12), (x=20,y=21)}
**>
**> "Developing Time-Oriented..." book by R.Snordgrass demonstrates several
**> rather lenthy solutions (pp.160+). The book is freely downloadable.
**>
**> To be able to compare results, lets agree that each of the clauses (select,
**> from, where, group by, having) starts a new line. (Each subquery/inner view
**> must obey this rule as well.) Also, no more than 2 column expressions are
**> allowed to fit on a single line in the select clause. Likewise, no more than
**> 2 view/tables are allowed on a single line within from clause, no more than
**> 2 predicates in the each line of the where clause, etc. If anybody knows
**> alternative (maybe even standard) sql formatting rules, you are welcome to
**> suggest those.
*

CREATE TABLE Intervals

(

begin_interval INT NOT NULL,

end_interval INT NOT NULL,

CHECK (end_interval > begin_interval),

PRIMARY KEY (begin_interval, end_interval)
);

INSERT INTO Intervals (begin_interval, end_interval)
**VALUES (1,3);
**

INSERT INTO Intervals (begin_interval, end_interval)
**VALUES (2,5);
**

INSERT INTO Intervals (begin_interval, end_interval)
**VALUES (4,11);
**

INSERT INTO Intervals (begin_interval, end_interval)
**VALUES (10,12);
**

INSERT INTO Intervals (begin_interval, end_interval)
**VALUES (20,21);
**
WITH Gaps (begin_gap, end_gap)

AS (SELECT MAX(I1.end_interval), I2.begin_interval FROM Intervals AS I1 INNER JOIN Intervals AS I2 ON I1.begin_interval < I2.begin_interval GROUP BY I2.begin_interval HAVING MAX(I1.end_interval) < I2.begin_interval) SELECT MIN(I.begin_interval) AS begin_interval, COALESCE(I.begin_gap, MAX(I.end_interval)) AS end_interval FROM (SELECT I.begin_interval, I.end_interval, MIN(G.begin_gap) AS begin_gap FROM Intervals AS I LEFT OUTER JOIN Gaps AS G ON I.end_interval <= G.begin_gap GROUP BY I.begin_interval, I.end_interval) AS IGROUP BY I.begin_gap

ORDER BY begin_interval;

Executing on DB2:

begin_interval end_interval

1 12 20 21

The idea is to first find any gaps in the interval set. That is, if we have intervals (a1, a2) and (b1, b2) then (a2, b1) is a gap if there doesn't exist an interval (c1, c2) where c1 < b1 and c2 > a2. We than associate each interval with its closest succeeding gap. So each gap will be joined with all intervals that immediately precede it with no intervening gaps. Of course, an interval may have no succeeding gap so we associate those intervals with NULL (hence the LEFT OUTER JOIN). Then for each gap, including the special NULL gap, we find the minimum and maximum endpoints for the set of associated intervals.

For those who don't use a flavor of SQL that implements the WITH clause from Standard SQL, either define a view or simply expand the body of Gaps in the main query. It's only referenced once and was defined this way for readability.

-- JAGReceived on Fri Dec 03 2004 - 17:04:14 CET