Re: SQL Humor

From: Tony Andrews <andrewst_at_onetel.com>
Date: 19 Aug 2005 07:16:17 -0700
Message-ID: <1124460977.808341.113400_at_g49g2000cwa.googlegroups.com>


Mike Hodgson wrote:
> Perhaps it's an implementation detail of Oracle then (I'm not very
> familiar with how Oracle performs the various operations).
>
> What would happen if you had slightly more realistic data (like both
> tables with more than 1 row in them)? For example, say you had:
> InsurancePolicies {policy_id, broker_id, inception_date, ...} - (80000
> rows) clustered index on policy_id (don't know what Oracle parlance for
> "clustered index" is)
> InsuranceBrokers {broker_id, broker_name, ...} - (200 rows) clustered
> index on broker_id
>
> then you said:
>
> select * from InsuranceBrokers b
> where broker_id in
> (
> select distinct broker_id from InsurancePolicies p
> where p.inception_date > '20050101'
> )
>
> select * from InsuranceBrokers b
> where exists
> (
> select * from InsurancePolicies p
> where p.broker_id = b.broker_id
> and p.inception_date > '20050101'
> )
>
> Both queries should return the same data, namely all the brokers who own
> 1 or more policies that started this year. Now, with the IN() query,
> if the physical data in InsurancePolicies is sorted by policy_id, then
> how does the query engine know it's got all of the policies that started
> this year unless it goes through every single row in InsurancePolicies?
> With the EXISTS() version, as soon as the query engine finds that the
> broker in question owns a single policy that started this year it would
> stop trawling through the 80000 row policy table.
>
> Best case scenario for EXISTS(), the first policy row for that broker
> started this year so that broker is included in the result set (scan 1
> row out of 80000); worst case scenario, the only policy the broker owns
> that started this year was created yesterday (and so has the greatest
> policy_id and so is last in the physical order of rows in the table -
> ie. full index scan; scan 80000 rows out of 80000). For all cases for
> IN() the query engine needs to go through every policy row that (that
> started this year) to compile the distinct list to present back to the
> outer query - i.e. full index scan. Perhaps Oracle have done some
> particular optimisations in that area, but I believe that's the way
> Microsoft deal with it.
>
> Bit of a dumb example really because an inner join would be the best way
> to write that query anyway (well it would in SQL Server - I assume the
> same would hold true for Oracle) but it's the simplest example my poor
> tired brain would come up with at 10:30 on a Friday night.

Here is the best simulation of your example I can come up with at the moment. It shows that Oracle is indeed choosing different plans this time, though the elapsed time is similar either way (EXISTS slightly faster, 0.3 secs rather than 0.4)

>From the Oracle docs, it appears that "where x in (select PKCOL from
y)" is a special case, that can be optimized into a join.

SQL> select count(*) from insurance_brokers;

  COUNT(*)


        67

SQL> select count(*), count(distinct broker_id) from insurance_policies;

  COUNT(*) COUNT(DISTINCTBROKER_ID)

---------- ------------------------
     30881                       30

SQL> select * from Insurance_Brokers b
  2 where broker_id in

  3     (
  4     select distinct broker_id from Insurance_Policies p
  5     where p.inception_date > date '2005-01-01'
  6     );

 BROKER_ID BROKER_NAME

---------- ------------------------------
         0 SYS
         5 SYSTEM
        11 OUTLN
        18 DBSNMP
        20 WMSYS
        41 AURORA$JIS$UTILITY$
        29 ORDSYS
        30 ORDPLUGINS
        31 MDSYS
        32 CTXSYS
        34 XDB
        38 FSC
        39 RB
        42 OSE$HTTP$ADMIN
        56 PORTAL30
       121 FLOWS_010600
       120 FLOWS_FILES
        58 PORTAL30_SSO
        60 PORTAL30_SSO_PS
        61 PORTAL30_DEMO
        62 SCOTT
       118 AFOSTER
        71 WORKFLOW
       122 SX3COM
       128 TOAD
       132 BMS1
       134 DESDIR
       141 JREED
       145 NARROW
       146 ARROW

30 rows selected.

Execution Plan


   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=94 Card=30 Bytes=720
          )

   1    0   HASH JOIN (Cost=94 Card=30 Bytes=720)
   2    1     VIEW OF 'VW_NSO_1' (Cost=92 Card=30 Bytes=390)
   3    2       SORT (UNIQUE) (Cost=92 Card=30 Bytes=270)
   4    3         TABLE ACCESS (FULL) OF 'INSURANCE_POLICIES' (Cost=9
          Card=30881 Bytes=277929)

   5    1     TABLE ACCESS (FULL) OF 'INSURANCE_BROKERS' (Cost=1 Card=
          67 Bytes=737)

SQL> select * from Insurance_Brokers b
  2 where exists

  3      (
  4      select * from Insurance_Policies p
  5      where p.broker_id = b.broker_id
  6      and p.inception_date > date '2005-01-01'
  7      );

 BROKER_ID BROKER_NAME

---------- ------------------------------
         0 SYS
         5 SYSTEM
        11 OUTLN
        18 DBSNMP
        20 WMSYS
        41 AURORA$JIS$UTILITY$
        29 ORDSYS
        30 ORDPLUGINS
        31 MDSYS
        32 CTXSYS
        34 XDB
        38 FSC
        39 RB
        42 OSE$HTTP$ADMIN
        56 PORTAL30
       121 FLOWS_010600
       120 FLOWS_FILES
        58 PORTAL30_SSO
        60 PORTAL30_SSO_PS
        61 PORTAL30_DEMO
        62 SCOTT
       118 AFOSTER
        71 WORKFLOW
       122 SX3COM
       128 TOAD
       132 BMS1
       134 DESDIR
       141 JREED
       145 NARROW
       146 ARROW

30 rows selected.

Execution Plan


   0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=3 Bytes=33)    1 0 FILTER

   2    1     TABLE ACCESS (FULL) OF 'INSURANCE_BROKERS' (Cost=1 Card=
          3 Bytes=33)

   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'INSURANCE_POLICIES' (C
          ost=6 Card=1029 Bytes=9261)

   4    3       INDEX (RANGE SCAN) OF 'IP_IB' (NON-UNIQUE) (Cost=2 Car
          d=1029)
Received on Fri Aug 19 2005 - 16:16:17 CEST

Original text of this message