Re: anti-joins and performance

From: paul c <toledobythesea_at_oohay.ac>
Date: Wed, 14 Nov 2012 18:24:10 -0800
Message-ID: <k81jop$6rl$2_at_speranza.aioe.org>


On 12/11/2012 9:39 PM, Fredrik Bertilsson wrote:
> Are there any way for an anti-join like below to perform better than O(N),

N=number of rows in table1? In my case (Informix) the number of rows in both table1

  and table2 are huge, and the expected result is only a few rows.

Are there any chance that a database engine could return a result without

actually having to make a full table scan in table1?
>
> select *
> from table1 t1
> where not exists (select * from table2 t2 where t2.key=t1.key)

I presume this is the same as asking is it possible to have a row in table1 whose "key" doesn't match the key of any row in table2 and avoid "reading" that table1 row? I'd say the answer to that is no, ie., it's not logically possible to know that a table1 row's key doesn't match a table2 key without knowing the table1 key.

So my guess is that you are really intending a situation that has physical nuances not mentioned in the question as written.

As far as I know, "table scan" means sequential reading of all the physical blocks and thus all the rows of a table. I would think any dbms that supports clustered indexes would give O(N) performance, by scanning both tables in parallel, also that this would be pretty good performance too. Is it possible that the keys don't have clustered indexes? If neither does, performance could be pretty poor. Non-clustered indexes wouldn't be good enough, nor would some pathologically fragmented underlying filesystem where every physical block was far away from the sequentially adjacent block.

I'm saying this without knowing anything about Informix, not even what terminology it uses. Received on Thu Nov 15 2012 - 03:24:10 CET

Original text of this message