Re: anti-joins and performance

From: Cimode <cimode_at_hotmail.com>
Date: Fri, 21 Dec 2012 00:02:21 -0800 (PST)
Message-ID: <e761b981-2aba-4edc-a5e5-66d151da6482_at_googlegroups.com>


Le mardi 13 novembre 2012 05:39:22 UTC, Fredrik Bertilsson a écrit :
> Are there any way for an anti-join like below to perform better than O(N), N=number of rows in table1? In my case (Informix) the number of rows in both table1 and table2 are huge, and the expected result is only a few rows. Are there any chance that a database engine could return a result without actually having to make a full table scan in table1?
>
>
>
> select *
>
> from table1 t1
>
> where not exists (select * from table2 t2 where t2.key=t1.key)

Try the opposite.

select *
from table1
where key not in (select key from table1 t1 inner join table2 t2 on t1.key=t2.key )

Hope this helps.

_at_paulc
I would like to correct a few misconceptions about direct image systems, you unknowingly are spreading in your comment:

> On modern storage IO subsystems such as RAID arrays, it does not make sense to speak of sequentiality regarding rows but it could make sense to speak about adjacency. All IO read/write operations of rows presented to user can result in non physically sequential IO's decided by the IO subsystem (such as the IO controller). In other words, what is usually *presented* as a sequential read most of the time in fact ends up as a physical read on multiple disks in a *time sequence* determined by the IO subsystem algorthythmics. This is a consequence of logical physical layer confusion direct image system induce since they do not achieve true physical data independence.
> On direct image systems, the difference between clustered vs full table scans simply imply the quantity of logical IO's performed to determine which one meets a specific predicate. WIth a clustered index scan, such quantity is reduced by keeping range value statistics about values stored and skipping the one that do not meet the predicate. In other words, a clustered index scan is only more effective/economic method than full scan but not more efficient.
> A direct image system implementation can not be a relational implementation because it *inherently* the principle of data independence since the data physical encoding is directly bound the user data presentation. The latest transrelational model is an another example of direct image system, which explains it was doomed to fail.

Regards Received on Fri Dec 21 2012 - 09:02:21 CET

Original text of this message