Re: SQL Humor

From: Paul <paul_at_test.com>
Date: Fri, 19 Aug 2005 14:27:16 +0100
Message-ID: <4305e248$0$22929$ed2619ec_at_ptn-nntp-reader01.plus.net>


Mike Hodgson wrote:
> select * from InsuranceBrokers b
> where broker_id in
> (
> select distinct broker_id from InsurancePolicies p
> where p.inception_date > '20050101'
> )
...
> Best case scenario for EXISTS(), the first policy row for that broker
> started this year so that broker is included in the result set (scan 1
> row out of 80000); worst case scenario, the only policy the broker owns
> that started this year was created yesterday (and so has the greatest
> policy_id and so is last in the physical order of rows in the table -
> ie. full index scan; scan 80000 rows out of 80000). For all cases for
> IN() the query engine needs to go through every policy row that (that
> started this year) to compile the distinct list to present back to the
> outer query - i.e. full index scan.

The optimizer should be able to recognise that the distinct is unnecessary, so it wouldn't need to go through the whole table.

Just because the subquery uses "distinct" it doesn't mean that the DBMS must materialise that internally if it's not necessary. Though I guess index statistics may cause it to choose this option if it thinks it would be faster.

If you think about it, both queries are logically identical so it would be possible (at least in theory) for them to use identical query plans, if the query optimizer is clever enough.

Paul. Received on Fri Aug 19 2005 - 15:27:16 CEST

Original text of this message