Re: Book Review: "SQL Tuning" by Dan Tow

From: Mikito Harakiri <mikharakiri_at_iahu.com>
Date: Fri, 2 Jan 2004 10:42:10 -0800
Message-ID: <a4jJb.3$Ma6.145@news.oracle.com>

"Ryan" <rgaffuri_at_cox.net> wrote in message news:Dr5Jb.59057$hf1.31218_at_lakeread06...
> I flipped through this book. I like his 'method'. Ill buy it eventually.
> However, this statement above(which Im assuming is from the book) is
wrong.
> HJ with full scans of both tables routinely outperforms NJs by 1000% or
more
> when a large number of rows are involved. Its not even close.
>
> Bench mark it. Dont just repeat what you read.

Excuse me, but do you understand the difference between proof and counter example? If I'm claiming that HJ is never significantly faster than indexed NL, then I can't prove it with just one random benchmark. However, you can refute my claim with just one test case.

If you still insist, here is an example (by Hakan Jakobson):

create table foo1(c1 number, c2 number, c3 number, c varchar2(50), c5 varchar2(50), c6 varchar2(50));

declare i integer;
begin
for i in 1..500000 loop
insert into foo1 values(i, 500001-i, mod(i, 10),

'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',

'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',

'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA');
     if mod(i, 100) = 0 then commit; end if; end loop;

end;
/

create table foo2(c1 integer);
declare i integer;
begin
   for i in 0..9 loop
     insert into foo2 values(i);
   end loop;
   commit;
end;
/

create index foo1_c3 on foo1(c3);
analyze table foo1 compute statistics;
analyze table foo2 compute statistics;

select /*+use_nl(foo1 foo2) ordered index(foo1)*/ count(c6) from foo2, foo1 where foo1.c3 = foo2.c1;

select /*+use_hash(foo1 foo2)*/ count(c6) from foo1, foo2 where foo1.c3 = foo2.c1;

And the performance difference is 12 times difference in HJ favor.

We can change join order, though:

SQL>select --+use_nl(foo2) ordered
  2 count(c6)
  3 from foo1, foo2
  4 where foo1.c3 = foo2.c1;

Elapsed: 00:01:09.00

And difference reduced to 2 times.

Then, Hakan suggested recreating foo2 as follows:

create table foo2(c1 number, c2 number, c3 number, c4 varchar2(50), c5 varchar2(50), c6 varchar2(50)); declare i integer;
begin
for i in 1..5000 loop

     insert into foo2 values(i, 5001-i, mod(i, 10),
                 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
     'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'); if mod(i,
 100) = 0 then commit;
      end if;

end loop;
end;
/

which gives the following timing:

NL foo2->foo1 6 min
NL foo1->foo2 40 sec (I added index for foo2.c1)

HJ                       24 sec

In another exchange Benoit Dageville suggested distributing values more randomly to leverage poor performance of random io, but once again, I never see a case where performance difference is significant.

Or, maybe you don't quite believe that one can easily build an opposite tescase where NL is 10000 times faster than HJ? I thought this claim is obvious. Received on Fri Jan 02 2004 - 12:42:10 CST