Finding index keys in a v large table that are not in a large table.

From: stephen O'D <stephen.odonnell_at_gmail.com>
Date: 3 Feb 2005 05:03:42 -0800
Message-ID: <1107435822.758185.195840@c13g2000cwb.googlegroups.com>

Ok, I have two tables called audit and audit_summary. Both are indexed on audit_seq (audit is non unique and audit_summary is unique):-

Audit defn:-

audit_seq, message_number, message_text

Audit_summary defn:-

audit_seq, <various other fields parsed out of the message text>

In an online transaction processing system, many records can be written into
the audit table all with the same audit_seq per transaction.

Every so often, a job starts up that gets all the new audit_seqs that are
not in the audit_summary table and inserts the new records into it.

These tables are getting pretty big eg Audit is >15Million rows and Audit_summary is around a few million.

Anyone have any thoughts on the most efficient way to extract the new audit_seqs in the audit table?

select audit_seq,message_text
from audit
where audit_seq > (select max(audit_seq) from audit_summary)

is no good as there is potential for audit_seqs to get skipped.

I created two test tables called big (~ 1 Million rows) and small (~100K
rows) both with columns audit_seq,object_id.

I then ran the following:-

1 select big.audit_seq, object_id
2 from big, (select audit_seq

  3                from big
  4                minus
  5                select audit_seq
  6                from small) seqs

7* where big.audit_seq = seqs.audit_seq SQL> / 804033 rows selected.

Execution Plan

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4187 Card=1204031 By
          tes=30100775)

   1    0   HASH JOIN (Cost=4187 Card=1204031 Bytes=30100775)
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=470 Card=1004032 Byte
          s=12048384)

   3    1     VIEW (Cost=3054 Card=1204031 Bytes=15652403)
   4    3       MINUS
   5    4         SORT (UNIQUE) (Cost=2536 Card=1004032 Bytes=4016128)
   6    5           TABLE ACCESS (FULL) OF 'BIG' (Cost=470 Card=100403
          2 Bytes=4016128)

   7    4         SORT (UNIQUE) (Cost=518 Card=199999 Bytes=799996)
   8    7           TABLE ACCESS (FULL) OF 'SMALL' (Cost=94 Card=19999
          9 Bytes=799996)

Statistics

         18  recursive calls
         47  db block gets
       4335  consistent gets
       5578  physical reads
          0  redo size

20061109 bytes sent via SQL*Net to client 4342120 bytes received via SQL*Net from client

      53604  SQL*Net roundtrips to/from client
          5  sorts (memory)
          1  sorts (disk)
     804033  rows processed

Then I tried ...

1 select audit_seq, object_id
2* from big where not exists (select null from small where big.audit_seq =
small.audit_seq)
SQL> / 804033 rows selected.

Execution Plan

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=470 Card=50202 Bytes
          =602424)

   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=470 Card=50202 Bytes=
          602424)

   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_INDEX' (UNIQUE) (Cost=1 Ca
          rd=1 Bytes=4)

Statistics

          0  recursive calls
         15  db block gets
    2063539  consistent gets
       1443  physical reads
          0  redo size

20061109 bytes sent via SQL*Net to client 4342120 bytes received via SQL*Net from client

      53604  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
     804033  rows processed

Now the consistent gets are way bigger for the not exits (I imagine it is
something todo with repeated index access on small for each row in big, but
i am not sure), so I am thinking the 'minus' query is better - BUT the minus query requires two full scans of BIG - so why are the consistent gets so big for the not exists query?

Surely there must be a better way? I also tired using where not in (as oppose to the not
exists) but it didnt return any data for ages longer than the other two methods, so i killed it.

On the production system the table access full on Big takes a very long time
which is becoming unacceptable!

Any ideas?

Thanks,

Stephen. Received on Thu Feb 03 2005 - 07:03:42 CST