Re: SQL Humor

From: Mike Hodgson <mike.hodgson_at_mallesons.nospam.com>
Date: Fri, 19 Aug 2005 22:44:13 +1000
Message-ID: <u9z01uLpFHA.4056_at_TK2MSFTNGP10.phx.gbl>


Perhaps it's an implementation detail of Oracle then (I'm not very familiar with how Oracle performs the various operations).

What would happen if you had slightly more realistic data (like both tables with more than 1 row in them)?  For example, say you had:
InsurancePolicies {policy_id, broker_id, inception_date, ...} - (80000 rows) clustered index on policy_id (don't know what Oracle parlance for "clustered index" is)
InsuranceBrokers {broker_id, broker_name, ...} - (200 rows) clustered index on broker_id

then you said:

select * from InsuranceBrokers b
where broker_id in
    (
    select distinct broker_id from InsurancePolicies p
    where p.inception_date > '20050101'
    )

select * from InsuranceBrokers b
where exists
    (
    select * from InsurancePolicies p
    where p.broker_id = b.broker_id
    and p.inception_date > '20050101'
    )

Both queries should return the same data, namely all the brokers who own 1 or more policies  that started this year.  Now, with the IN() query, if the physical data in InsurancePolicies is sorted by policy_id, then how does the query engine know it's got all of the policies that started this year unless it goes through every single row in InsurancePolicies?  With the EXISTS() version, as soon as the query engine finds that the broker in question owns a single policy that started this year it would stop trawling through the 80000 row policy table.

Best case scenario for EXISTS(), the first policy row for that broker started this year so that broker is included in the result set (scan 1 row out of 80000); worst case scenario, the only policy the broker owns that started this year was created yesterday (and so has the greatest policy_id and so is last in the physical order of rows in the table - ie. full index scan; scan 80000 rows out of 80000).  For all cases for IN() the query engine needs to go through every policy row that (that started this year) to compile the distinct list to present back to the outer query - i.e. full index scan.  Perhaps Oracle have done some particular optimisations in that area, but I believe that's the way Microsoft deal with it.

Bit of a dumb example really because an inner join would be the best way to write that query anyway (well it would in SQL Server - I assume the same would hold true for Oracle) but it's the simplest example my poor tired brain would come up with at 10:30 on a Friday night.

--
mike hodgson
blog: http://sqlnerd.blogspot.com



Tony Andrews wrote:
Paul wrote:
  
Mike Hodgson wrote:
    
The EXISTS() predicate is typically a fairly efficient predicate because
it only needs to scan until it gets a match, at which time it returns.
The worst case scenario (it finds a match on the last physical row, or
it doesn't find any matching row) is the same I/O as the IN() predicate
case because IN() will evaluate the entire subquery.
      
Why does IN() need to evaluate the entire subquery? Couldn't it in
theory work exactly the same as EXISTS() at the physical level?
    

Yes it could, and indeed does (Oracle 9i).  In the following example,
IN and EXISTS are processed the same way, and DO NOT evaluate the
entire subquery:

SQL> create table t1 as select object_id, object_name from all_objects;

Table created.

SQL> alter table t1 add constraint t1_pk primary key (object_id);

Table altered.

SQL> create table t2 as select object_id, object_name from all_objects
where rownum=1;

Table created.

SQL> alter table t2 add constraint t2_pk primary key (object_id);

Table altered.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> select count(*) from t1;

  COUNT(*)
----------
     47355

SQL> select count(*) from t2;

  COUNT(*)
----------
         1

SQL> set autotrace on
SQL> select * from t2 where object_id in (select object_id from t1);

 OBJECT_ID OBJECT_NAME
---------- ------------------------------
     18164 /1005bd30_LnkdConstant


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=30)
   1    0   NESTED LOOPS (Cost=1 Card=1 Bytes=30)
   2    1     TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=1 Bytes=26)
   3    1     INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          3  physical reads
          0  redo size
        227  bytes sent via SQL*Net to client
        314  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select * from t2 where exists (select null from t1 where
t1.object_id = t2.object_id);

 OBJECT_ID OBJECT_NAME
---------- ------------------------------
     18164 /1005bd30_LnkdConstant


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=26)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=1 Bytes=26)
   3    1     INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1 B
          ytes=4)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        227  bytes sent via SQL*Net to client
        314  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

  
Received on Fri Aug 19 2005 - 14:44:13 CEST

Original text of this message