Re: Access Paths: Will these produce equivalent results?

From: John Gilson <jag_at_acm.org>
Date: Fri, 09 May 2003 22:24:04 GMT
Message-ID: <8UVua.15613$h42.5565_at_twister.nyc.rr.com>


"techy lname" <techy789_at_yahoo.com> wrote in message news:3b990a48.0305091041.157ec3e8_at_posting.google.com...
> We are migrating an MS SQLServer database to Oracle, and have come
> across a query that runs significantly faster on SQLServer than
> Oracle. It is an MS Access generated sql that looks like:
>
> select colA, colB, sum(colC), sum(colD)
> from tab1
> where colE = 'somelit'
> group by colA, colB
> having colA = 'somelit' and colB = 'somelit'
>
> When comparing access plans between Oracle and SQLServer, Oracle
> first performs the roll up, then filters the results. However,
> SQLServer applies the "having " predicate to tab1 *first*, then
> performs the summations. In fact for SQLServer, the access path for
> the above query is identical for this one:
>
> select colA, colB, sum(colC), sum(colD)
> from tab1
> where colE = 'somelit' and colA = 'somelit' and colB = 'somelit'
> group by colA, colB
>
> My question is, is this correct behavior? Is SQLServer guaranteed to
> produce the same results, no matter what's in the tables?

This is a correct transformation. SQL is (in)famous for providing multiple ways to skin a proverbial cat. Let's write the original query equivalently without using GROUP BY and HAVING as

SELECT colA,

               colB,
               (SELECT SUM(colC)
                FROM tab1
                WHERE colA = D.colA AND
                               colB = D.colB) AS total_colC,
               (SELECT SUM(colD)
                FROM tab1
                WHERE colA = D.colA AND
                               colB = D.colB) AS total_colD
FROM (SELECT DISTINCT colA, colB
             FROM tab1
             WHERE colE = 'somelit') AS D -- GROUP BY
WHERE colA = 'somelit' AND colB = 'somelit' -- HAVING

The WHERE clause

WHERE colA = 'somelit' AND colB = 'somelit'

will restrict D to at most 1 row. If we place this restriction within the derived table D, since A AND (B AND C) is equivalent to A AND B AND C, we'll achieve the same effect, i.e.,

SELECT colA,

               colB,
               (SELECT SUM(colC)
                FROM tab1
                WHERE colA = D.colA AND
                               colB = D.colB) AS total_colC,
               (SELECT SUM(colD)
                FROM tab1
                WHERE colA = D.colA AND
                               colB = D.colB) AS total_colD
FROM (SELECT DISTINCT colA, colB
             FROM tab1
             WHERE colE = 'somelit' AND
                            colA = 'somelit' AND colB = 'somelit') AS D -- GROUP BY

If we now use the reverse of our initial transformation on this query so that we employ GROUP BY we get

SELECT colA,

               colB,
               SUM(colC) AS total_colC,
               SUM(colD) AS total_colD

FROM tab1
WHERE colE = 'somelit' AND

               colA = 'somelit' AND colB = 'somelit' GROUP BY colA, colB

which is the other query given.

This transformation should improve performance as restriction will be to rows and not groups so unnecessary groups won't be formed to only be discarded by the HAVING clause.

Hope this helps.

Regards,
jag Received on Sat May 10 2003 - 00:24:04 CEST

Original text of this message