Re: SQL competition: the shortest interval coalesce/pack query

From: John Gilson <jag_at_acm.org>
Date: Mon, 06 Dec 2004 06:33:34 GMT
Message-ID: <23Tsd.24032$ld2.8890392_at_twister.nyc.rr.com>


"Lennart Jonsson" <lennart_at_kommunicera.umea.se> wrote in message news:6dae7e65.0412040742.572da937_at_posting.google.com...

> "John Gilson" <jag_at_acm.org> wrote in message news:<ujesd.60193$Vk6.30697_at_twister.nyc.rr.com>...

> > "Lennart Jonsson" <lennart_at_kommunicera.umea.se> wrote in message
> > news:6dae7e65.0412031821.4df452c6_at_posting.google.com...
> > > lennart_at_kommunicera.umea.se (Lennart Jonsson) wrote in message
> > news:<6dae7e65.0412031228.3051dff_at_posting.google.com>...
> > > > "Mikito Harakiri" <mikharakiri_at_iahu.com> wrote in message
> > news:<2uPrd.61$G45.36_at_news.oracle.com>...
> > >
> > > [...]
> > > >
>
> [...]
> >

> > Lennart, as you've noted, your query does exhibit problems. For example,
> > starting with the sample interval set, if you remove (20,21) so that you no
> > longer have a gap, that is,
> > INSERT INTO Intervals (x,y)
> > VALUES (1,3),(2,5),(4,11),(10,12);
> > then the query returns
> > 1 11
> > 2 12
> > Also, if we add an interval to the interval set that's contiguous with say
> > (20,21), e.g.,
> > INSERT INTO Intervals (x,y)
> > VALUES (1,3),(2,5),(4,11),(10,12),(20,21),(21,25);
> > then the query returns
> > 1 12
> > 20 21
> > 21 25
> >

> > However, using recursion is an interesting take on this so here's an
> > offering:
> >

> > CREATE TABLE Intervals
> > (
> > x INT NOT NULL,
> > y INT NOT NULL,
> > CHECK (y > x),
> > PRIMARY KEY (x,y)
> > );
> >

> > WITH Cover (x,y,n)
> > AS (SELECT x,y,
> > (SELECT COUNT(*) FROM Intervals)
> > FROM Intervals
> > UNION ALL
> > SELECT CASE WHEN C.x <= I.x THEN C.x ELSE I.x END,
> > CASE WHEN C.y >= I.y THEN C.y ELSE I.y END,
> > C.n - 1
> > FROM Intervals AS I, Cover AS C
> > WHERE I.x <= C.y AND
> > I.y >= C.x AND
> > (I.x <> C.x OR I.y <> C.y) AND
> > C.n > 1)
> > SELECT DISTINCT C1.x, C1.y
> > FROM Cover AS C1
> > WHERE NOT EXISTS (SELECT *
> > FROM Cover AS C2
> > WHERE C2.x <= C1.x AND
> > C2.y >= C1.y AND
> > (C1.x <> C2.x OR C1.y <> C2.y))
> > ORDER BY C1.x;
>
> Hi John. Very interesting and I will have a look at it later.
>
> In fact I hadnt started worrying about the correctness of my attempt,
> but merely the performance. Because of the nature of the problem the
> cover set grows very fast. I took the liberty to take you code for a
> run
>
> select * from intervals
>
> X           Y
> ----------- -----------
>           1           3
>           2           5
>           4          11
>          10          12
>          20          21
>           4          16
>           5          17
>           6          18
>
> WITH Cover (x,y,n)
> AS (SELECT x,y,
>                       (SELECT COUNT(*) FROM Intervals)
>        FROM Intervals
>        UNION ALL
>        SELECT CASE WHEN C.x <= I.x THEN C.x ELSE I.x END,
>                       CASE WHEN C.y >= I.y THEN C.y ELSE I.y END,
>                       C.n - 1
>        FROM Intervals AS I, Cover AS C
>        WHERE I.x <= C.y AND
>                       I.y >= C.x AND
>                       (I.x <> C.x OR I.y <> C.y) AND
>                       C.n > 1)
> SELECT count(C1.x)
> FROM Cover AS C1;
>
> 1
> -----------
> SQL0347W  The recursive common table expression "LELLE.COVER" may
> contain an
> infinite loop.  SQLSTATE=01605
>
>     2379720
>
>
> But your query terminates very quickly so perhaps the optimizer is
> smart enough to push the predicates inside the cover clause?
>
>
> /Lennart

Hi Lennart, I'm glad you did bother to take the code out for a spin. Though the query returns the correct result with the data you provided, the row count in the unrestricted Cover table is several orders of magnitude greater than expected! On further inspection, one can be more discriminating in defining both the base and recursive cases for Cover:

CREATE TABLE Intervals
(
x INT NOT NULL,
y INT NOT NULL,
CHECK (y > x),
PRIMARY KEY (x,y)
);

INSERT INTO Intervals (x, y)
VALUES (1,3),(2,5),(4,11),(10,12),(20,21),(4,16),(5,17),(6,18); WITH IntervalTally (n) AS (SELECT COUNT(*) FROM Intervals),

           Cover (x,y,n) AS
           (SELECT I1.x,I1.y,
                           T.n
            FROM Intervals AS I1, IntervalTally AS T
            WHERE NOT EXISTS (SELECT *
                                                   FROM Intervals AS I2
                                                   WHERE I2.x <= I1.x AND
                                                                  I2.y >= I1.y AND
                                                                  (I2.x <> I1.x OR I2.y <> I1.y))
            UNION ALL
            SELECT CASE WHEN C.x <= I.x THEN C.x ELSE I.x END,
                           CASE WHEN C.y >= I.y THEN C.y ELSE I.y END,
                           C.n - 1
            FROM Intervals AS I, Cover AS C
            WHERE I.x <= C.y AND
                           I.y >= C.x AND
                           (I.x < C.x OR I.y > C.y) AND
                           C.n > 1)

SELECT DISTINCT C1.x, C1.y
FROM Cover AS C1
WHERE NOT EXISTS (SELECT *
                                        FROM Cover AS C2
                                        WHERE C2.x <= C1.x AND
                                                       C2.y >= C1.y AND
                                                       (C1.x <> C2.x OR C1.y <> C2.y))
ORDER BY C1.x;

1 18
20 21

Now let's take a look at the row tally in Cover:

WITH IntervalTally (n) AS (SELECT COUNT(*) FROM Intervals),

           Cover (x,y,n) AS
           (SELECT I1.x,I1.y,
                           T.n
            FROM Intervals AS I1, IntervalTally AS T
            WHERE NOT EXISTS (SELECT *
                                                   FROM Intervals AS I2
                                                   WHERE I2.x <= I1.x AND
                                                                  I2.y >= I1.y AND
                                                                  (I2.x <> I1.x OR I2.y <> I1.y))
            UNION ALL
            SELECT CASE WHEN C.x <= I.x THEN C.x ELSE I.x END,
                           CASE WHEN C.y >= I.y THEN C.y ELSE I.y END,
                           C.n - 1
            FROM Intervals AS I, Cover AS C
            WHERE I.x <= C.y AND
                           I.y >= C.x AND
                           (I.x < C.x OR I.y > C.y) AND
                           C.n > 1)

SELECT COUNT(*)
FROM Cover;

204

Now that seems more reasonable.

--
JAG
Received on Mon Dec 06 2004 - 07:33:34 CET

Original text of this message