Re: Why Correlated Query vs. plain old Query

From: L Carl Pedersen <l.carl.pedersen_at_dartmouth.edu>
Date: Fri, 17 Feb 1995 13:39:24 -0500
Message-ID: <l.carl.pedersen-1702951339240001_at_kip-1-sn-25.dartmouth.edu>


In article <3i0ogo$pe1_at_hacgate2.hac.com>, Ed Bruce <edward_at_igate1.hac.com> wrote:

>Can anyone explain to me the following SQL I got to work
>vs. the SQL I didn't.
>
>As a preface, I have two tables, A and B. Table A contains
>a foreign key to Table B's primary key, let's call it:
>A.B_pk. The question I wanted to answer is: What rows in
>table B are not referenced in table A. My approach to
>answering this was to ask the non-negative question first.
>What rows in table B are referenced in table A. Which I
>answered with the following SQL:
>
>select B.pk from B where B.pk in ( select A.B_pk from A );
>
>I then blindly used the negative of this to produce the
>following:
>
>select B.pk from B where B.pk not in ( select A.B_pk from
>A );
>
>Which when executed returned no rows. The SQL that finally
>worked was:
>
>select B.pk from B b1 where B.pk not in
> ( select A.B_pk from A where b1.pk = A.B_pk );
>
>What I don't understand is why the second select doesn't
>work and why I had to use the third select. Any
>enlightenment is much appreciated. I can not swear that
>the second SQL would work on other RDBMS, but I do believe
>I have structured queries like this on VAX RDB, Ingres,
>and Sybase. Though it has been a few years since I last
>used Ingres or Sybase. This is with Oracle 6.0 running
>under SGI IRIX 4.0.1, Version 6.2.0.
>
>Ed Bruce

first of all, in my opinion, the responses from Rohrer and Blackburn are not correct - at least not completely.

>select B.pk from B where B.pk not in ( select A.B_pk from
>A );

the only reason this would return no rows is if one of the vaues for b_pk in A is null. since the comparison b.pk != a.pk will be false for a.pk null, the query returns no rows. it doesn't "know" that b.pk is not in a, so it says it isn't. aside from that problem, your query is logically correct and will work as long as there are no nulls.

by using the correlated subquery you are causing the subquery to be empty for those rows that do not match, "not in" the empty set is always true, so it works.

as for performance, "not exists" is not ALWAYS faster, correlated subqueries are often SLOWER - depending on the situation, and non-correlated subqueries are usually optimized into joins. they do not always cause full table scans. many of the conventional rules of thumb are dangerous if they are always believed.

it is true that NOT IN is very often slow and generally to be avoided, but i would not guarantee it is always slower.

in your particular case, i would not be at all surprised to find that the fastest way to do this would be:

   select pk from b minus select b_pk from a;

depends on relative sizes, number of matches etc. you need to experiment if you care about performance. Received on Fri Feb 17 1995 - 19:39:24 CET

Original text of this message