Re: SQL competition: the shortest interval coalesce/pack query
Date: Fri, 03 Dec 2004 09:45:34 +0100
Message-ID: <0a90r0tudtmkmm0edh1p9u8h5qurr7qbi5_at_4ax.com>
On Thu, 2 Dec 2004 17:28:40 -0800, Mikito Harakiri wrote:
>Write the shortest sql query that would return a minimal cover of a set of
>intervals. For example, given
>
>Intervals = {(x=1,y=3), (x=2,y=5), (x=4, y=11), (x=10,y=12), (x=20,y=21)}
>
>it should return
>
>cover(Intervals) = {(x=1,y=12), (x=20,y=21)}
>
>"Developing Time-Oriented..." book by R.Snordgrass demonstrates several
>rather lenthy solutions (pp.160+). The book is freely downloadable.
>
>To be able to compare results, lets agree that each of the clauses (select,
>from, where, group by, having) starts a new line. (Each subquery/inner view
>must obey this rule as well.) Also, no more than 2 column expressions are
>allowed to fit on a single line in the select clause. Likewise, no more than
>2 view/tables are allowed on a single line within from clause, no more than
>2 predicates in the each line of the where clause, etc. If anybody knows
>alternative (maybe even standard) sql formatting rules, you are welcome to
>suggest those.
>
Hi Mikito,
I didn't check out the book you mention, so I don't know if my solutions are shorter. They are tested on SQL Server 2000. I tried not to use any proprietary code, so it should be easily ported to other platforms. I used my own formatting preferences, reformat as you see fit for either your personal style or for comparisons.
Here's a query that uses a self-join and three nested correlated subqueries:
SELECT a.x, MAX(b.y) AS y FROM Intervals AS a INNER JOIN Intervals AS b ON b.y > a.x
WHERE NOT EXISTS
(SELECT *
FROM Intervals AS c WHERE a.x - 1 BETWEEN c.x AND c.y)AND NOT EXISTS
(SELECT *
FROM Intervals AS d WHERE d.y > a.x AND d.y < b.y
AND NOT EXISTS
(SELECT *
FROM Intervals AS e
WHERE d.y + 1 BETWEEN e.x AND e.y)) GROUP BY a.x
And this is essentially the same format, but converted to use left anti semi joins instead of subqueries. I don't think it's shorter, but it might execute better on some platforms and some people prefer this format over subqueries. Your call.
SELECT a.x, MAX(b.y) AS y FROM Intervals AS a INNER JOIN Intervals AS b ON b.y > a.x LEFT JOIN Intervals AS c ON a.x - 1 BETWEEN c.x AND c.y
LEFT JOIN (Intervals AS d
LEFT JOIN Intervals AS e
ON d.y + 1 BETWEEN e.x AND e.y) ON d.y > a.x AND d.y < b.y AND e.x IS NULL WHERE c.x IS NULL AND d.x IS NULL GROUP BY a.x
If the table is large, the correlated subqueries (version 1) or the quintuple self-join (version 2) will probably make it slow. But you asked for a short query, not for a quick one :-)
It might be worthwile to investigate if a numbers table can be used to construct a shorter or faster solution, but I decided to make my solution as vanilla as possible, using only the Intervals table and nothing else.
BTW, this is the SQL I used to create table and test data:
CREATE TABLE Intervals
(x int NOT NULL
,y int NOT NULL
,PRIMARY KEY (x, y)
)
go
INSERT INTO Intervals (x, y)
SELECT 1, 3 UNION ALL SELECT 2, 5 UNION ALL SELECT 4, 11 UNION ALL
SELECT 10, 12 UNION ALL
SELECT 20, 21
go
Best, Hugo
-- (Remove _NO_ and _SPAM_ to get my e-mail address)Received on Fri Dec 03 2004 - 09:45:34 CET