Re: SQL Humor
Date: 19 Aug 2005 07:16:17 -0700
Message-ID: <1124460977.808341.113400_at_g49g2000cwa.googlegroups.com>
Mike Hodgson wrote:
> Perhaps it's an implementation detail of Oracle then (I'm not very
> familiar with how Oracle performs the various operations).
>
> What would happen if you had slightly more realistic data (like both
> tables with more than 1 row in them)? For example, say you had:
> InsurancePolicies {policy_id, broker_id, inception_date, ...} - (80000
> rows) clustered index on policy_id (don't know what Oracle parlance for
> "clustered index" is)
> InsuranceBrokers {broker_id, broker_name, ...} - (200 rows) clustered
> index on broker_id
>
> then you said:
>
> select * from InsuranceBrokers b
> where broker_id in
> (
> select distinct broker_id from InsurancePolicies p
> where p.inception_date > '20050101'
> )
>
> select * from InsuranceBrokers b
> where exists
> (
> select * from InsurancePolicies p
> where p.broker_id = b.broker_id
> and p.inception_date > '20050101'
> )
>
> Both queries should return the same data, namely all the brokers who own
> 1 or more policies that started this year. Now, with the IN() query,
> if the physical data in InsurancePolicies is sorted by policy_id, then
> how does the query engine know it's got all of the policies that started
> this year unless it goes through every single row in InsurancePolicies?
> With the EXISTS() version, as soon as the query engine finds that the
> broker in question owns a single policy that started this year it would
> stop trawling through the 80000 row policy table.
>
> Best case scenario for EXISTS(), the first policy row for that broker
> started this year so that broker is included in the result set (scan 1
> row out of 80000); worst case scenario, the only policy the broker owns
> that started this year was created yesterday (and so has the greatest
> policy_id and so is last in the physical order of rows in the table -
> ie. full index scan; scan 80000 rows out of 80000). For all cases for
> IN() the query engine needs to go through every policy row that (that
> started this year) to compile the distinct list to present back to the
> outer query - i.e. full index scan. Perhaps Oracle have done some
> particular optimisations in that area, but I believe that's the way
> Microsoft deal with it.
>
> Bit of a dumb example really because an inner join would be the best way
> to write that query anyway (well it would in SQL Server - I assume the
> same would hold true for Oracle) but it's the simplest example my poor
> tired brain would come up with at 10:30 on a Friday night.
Here is the best simulation of your example I can come up with at the moment. It shows that Oracle is indeed choosing different plans this time, though the elapsed time is similar either way (EXISTS slightly faster, 0.3 secs rather than 0.4)
>From the Oracle docs, it appears that "where x in (select PKCOL from
y)" is a special case, that can be optimized into a join.
SQL> select count(*) from insurance_brokers;
COUNT(*)
67
SQL> select count(*), count(distinct broker_id) from insurance_policies;
COUNT(*) COUNT(DISTINCTBROKER_ID)
---------- ------------------------ 30881 30
SQL> select * from Insurance_Brokers b
2 where broker_id in
3 ( 4 select distinct broker_id from Insurance_Policies p 5 where p.inception_date > date '2005-01-01' 6 );
BROKER_ID BROKER_NAME
---------- ------------------------------ 0 SYS 5 SYSTEM 11 OUTLN 18 DBSNMP 20 WMSYS 41 AURORA$JIS$UTILITY$ 29 ORDSYS 30 ORDPLUGINS 31 MDSYS 32 CTXSYS 34 XDB 38 FSC 39 RB 42 OSE$HTTP$ADMIN 56 PORTAL30 121 FLOWS_010600 120 FLOWS_FILES 58 PORTAL30_SSO 60 PORTAL30_SSO_PS 61 PORTAL30_DEMO 62 SCOTT 118 AFOSTER 71 WORKFLOW 122 SX3COM 128 TOAD 132 BMS1 134 DESDIR 141 JREED 145 NARROW 146 ARROW
30 rows selected.
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=94 Card=30 Bytes=720 ) 1 0 HASH JOIN (Cost=94 Card=30 Bytes=720) 2 1 VIEW OF 'VW_NSO_1' (Cost=92 Card=30 Bytes=390) 3 2 SORT (UNIQUE) (Cost=92 Card=30 Bytes=270) 4 3 TABLE ACCESS (FULL) OF 'INSURANCE_POLICIES' (Cost=9 Card=30881 Bytes=277929) 5 1 TABLE ACCESS (FULL) OF 'INSURANCE_BROKERS' (Cost=1 Card= 67 Bytes=737)
SQL> select * from Insurance_Brokers b
2 where exists
3 ( 4 select * from Insurance_Policies p 5 where p.broker_id = b.broker_id 6 and p.inception_date > date '2005-01-01' 7 );
BROKER_ID BROKER_NAME
---------- ------------------------------ 0 SYS 5 SYSTEM 11 OUTLN 18 DBSNMP 20 WMSYS 41 AURORA$JIS$UTILITY$ 29 ORDSYS 30 ORDPLUGINS 31 MDSYS 32 CTXSYS 34 XDB 38 FSC 39 RB 42 OSE$HTTP$ADMIN 56 PORTAL30 121 FLOWS_010600 120 FLOWS_FILES 58 PORTAL30_SSO 60 PORTAL30_SSO_PS 61 PORTAL30_DEMO 62 SCOTT 118 AFOSTER 71 WORKFLOW 122 SX3COM 128 TOAD 132 BMS1 134 DESDIR 141 JREED 145 NARROW 146 ARROW
30 rows selected.
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=3 Bytes=33) 1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'INSURANCE_BROKERS' (Cost=1 Card= 3 Bytes=33) 3 1 TABLE ACCESS (BY INDEX ROWID) OF 'INSURANCE_POLICIES' (C ost=6 Card=1029 Bytes=9261) 4 3 INDEX (RANGE SCAN) OF 'IP_IB' (NON-UNIQUE) (Cost=2 Car d=1029)Received on Fri Aug 19 2005 - 16:16:17 CEST