Re: SQL competition: the shortest interval coalesce/pack query

From: John Gilson <jag_at_acm.org>
Date: Fri, 03 Dec 2004 16:04:14 GMT
Message-ID: <280sd.51091$Vk6.17422_at_twister.nyc.rr.com>


"Mikito Harakiri" <mikharakiri_at_iahu.com> wrote in message news:2uPrd.61$G45.36_at_news.oracle.com...
> Write the shortest sql query that would return a minimal cover of a set of
> intervals. For example, given
>
> Intervals = {(x=1,y=3), (x=2,y=5), (x=4, y=11), (x=10,y=12), (x=20,y=21)}
>
> it should return
>
> cover(Intervals) = {(x=1,y=12), (x=20,y=21)}
>
> "Developing Time-Oriented..." book by R.Snordgrass demonstrates several
> rather lenthy solutions (pp.160+). The book is freely downloadable.
>
> To be able to compare results, lets agree that each of the clauses (select,
> from, where, group by, having) starts a new line. (Each subquery/inner view
> must obey this rule as well.) Also, no more than 2 column expressions are
> allowed to fit on a single line in the select clause. Likewise, no more than
> 2 view/tables are allowed on a single line within from clause, no more than
> 2 predicates in the each line of the where clause, etc. If anybody knows
> alternative (maybe even standard) sql formatting rules, you are welcome to
> suggest those.

CREATE TABLE Intervals
(
begin_interval INT NOT NULL,
end_interval INT NOT NULL,
CHECK (end_interval > begin_interval),
PRIMARY KEY (begin_interval, end_interval) );

INSERT INTO Intervals (begin_interval, end_interval) VALUES (1,3);
INSERT INTO Intervals (begin_interval, end_interval) VALUES (2,5);
INSERT INTO Intervals (begin_interval, end_interval) VALUES (4,11);
INSERT INTO Intervals (begin_interval, end_interval) VALUES (10,12);
INSERT INTO Intervals (begin_interval, end_interval) VALUES (20,21); WITH Gaps (begin_gap, end_gap)

           AS (SELECT MAX(I1.end_interval), I2.begin_interval
                  FROM Intervals AS I1
                              INNER JOIN
                              Intervals AS I2
                              ON I1.begin_interval < I2.begin_interval
                  GROUP BY I2.begin_interval
                  HAVING MAX(I1.end_interval) < I2.begin_interval)
   SELECT MIN(I.begin_interval) AS begin_interval,
                  COALESCE(I.begin_gap, MAX(I.end_interval)) AS end_interval
   FROM (SELECT I.begin_interval,
                                I.end_interval,
                               MIN(G.begin_gap) AS begin_gap
                FROM Intervals AS I
                            LEFT OUTER JOIN
                            Gaps AS G
                            ON I.end_interval <= G.begin_gap
                GROUP BY I.begin_interval, I.end_interval) AS I
   GROUP BY I.begin_gap
   ORDER BY begin_interval;

Executing on DB2:

begin_interval end_interval

 1                     12
 20                   21

The idea is to first find any gaps in the interval set. That is, if we have intervals (a1, a2) and (b1, b2) then (a2, b1) is a gap if there doesn't exist an interval (c1, c2) where c1 < b1 and c2 > a2. We than associate each interval with its closest succeeding gap. So each gap will be joined with all intervals that immediately precede it with no intervening gaps. Of course, an interval may have no succeeding gap so we associate those intervals with NULL (hence the LEFT OUTER JOIN). Then for each gap, including the special NULL gap, we find the minimum and maximum endpoints for the set of associated intervals.

For those who don't use a flavor of SQL that implements the WITH clause from Standard SQL, either define a view or simply expand the body of Gaps in the main query. It's only referenced once and was defined this way for readability.

--
JAG
Received on Fri Dec 03 2004 - 17:04:14 CET

Original text of this message