Re: Straight SQL always put perform PL/SQL?

From: <fitzjarrell_at_cox.net>
Date: Mon, 10 Dec 2007 11:11:23 -0800 (PST)
Message-ID: <65fcbc46-3494-43f1-a7a4-3fa1bd979aff@w28g2000hsf.googlegroups.com>

Comments embedded.
On Dec 10, 8:51 am, dshprope..._at_gmail.com wrote:
> http://tkyte.blogspot.com/2006/10/slow-by-slow.html
> In there Tom Kytes says:
> * You should do it in a single SQL statement if at all possible.
> * If you cannot do it in a single SQL Statement, then do it in PL/SQL.
>
> I was a true beliver of that until this case. We have a huge table,
> and we need to update it with
> values from another table.
>
> Let me give you an example
> create table test (id number,col1 varchar2(20)) pctfree 99;
> (this is only for creating a huge table with lots of columns)

Sorry, no, this creates a table with a huge number of extents.

> declare
> v_number number :=1;
> begin
> loop
> insert into test values (v_number, 'a');
> v_number := v_number+1;
> exit when v_number > 1000000;
> end loop;
> commit;
> end;
>
> create table test1 as select * from test where id < 400000;
> update test1 set col1='b';
> commit;
>
> create index test_ind on test(id);
> create index test1_ind on test1(id);

> analyze table test compute statistics;
> analyze table test1 compute statistics;

Antiquated utility to generate statistics; use dbms_stats.gather_table_stats instead. And cascade that to the indexes; your statistics don't include the index stats as you'll also need to run analyze index to get those generated.

> SQL> update test
> set col1=(select col1 from test1
> where test.id=test1.id )
> where exists
> (select 1 from test1
> where test.id=test1.id);
> 399999 rows updated.
> Elapsed: 00:04:15.01

4 minutes to update the table via SQL*Plus ... possibly a poorly configured system? Or, possibly, due to the missing index statistics. My attempt at this didn't take nearly as long with SQL*Plus. I've posted my 'stab' at this at the end.

> SQL> declare
> 2 v_count number;
> 3 cursor test_curs is
> 4 select test.id,test.col1 from test,test1
> 5 where test.id=test1.id;
> 6 begin
> 7 for test_cur in test_curs
> 8 loop
> 9 update test
> 10 set col1=test_cur.col1
> 11 where id=test_cur.id;
> 12 end loop;
> 13 commit;
> 14 end;
> 15 /
> PL/SQL procedure successfully completed.
> Elapsed: 00:04:30.82
>

And a little over 4 minutes here; 15 seconds difference between the two. Again, a poorly configured system? A slow disk subsystem? Or, again, missing index statistics?

> Here the difference is not huge yet, but in our case, single SQL will
> take an hour to finish,

And how did you compute that figure?

> while
> PL/SQL only take 5 min.

Again, you've proven this how?

> The bigger the table test, the huger the
> difference.

We're still waiting for your proof. The above 'test' doesn't qualify.

>
> And the reason is simple, as there will be 40% of rows needs to be
> updated, Oracle will pick full
> table scan of table test(a huge table).
> While in PL/SQL, "UPDATE TEST
> SET COL1=:B2 WHERE ID=:B1"
> will be able to use index.

And will be a row-by-row update, quite inefficient for PL/SQL.

>
> Any comments will be highly appreciated

Let's look at another example, basically your code altered to set a pctfree of 90 rather than 99:

SQL>
SQL> create table test (id number,col1 varchar2(20)) pctfree 90;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> declare
  2 v_number number :=1;
  3 begin
  4 loop
  5 insert into test values (v_number, 'a');   6 v_number := v_number+1;
  7 exit when v_number > 1000000;
  8 end loop;
  9 commit;
10 end;
11 /

PL/SQL procedure successfully completed.

Elapsed: 00:04:50.96

SQL>
SQL>
SQL> create table test1

2 as select * from test where id < 400000;

Table created.

Elapsed: 00:00:03.64
SQL>
SQL> update test1 set col1='b';

399999 rows updated.

Elapsed: 00:00:46.50

Execution Plan

Plan hash value:
4001801719

| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)|
Time     |
----------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |       |   413K|  4849K|   179  (16)|
00:00:03 |
|   1 |  UPDATE            | TEST1 |       |       |
|          |

| 2 | TABLE ACCESS FULL| TEST1 | 413K| 4849K| 179 (16)| 00:00:03 |

Note

dynamic sampling used for this statement

Statistics

455 recursive
calls

413417 db block
gets

1647 consistent
gets

663 physical
reads
120968492 redo
size

402 bytes sent via SQL*Net to
client

309 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from
client

2 sorts
(memory)

0 sorts
(disk)

399999 rows
processed

SQL> commit;

Commit complete.

Elapsed: 00:00:00.03
SQL>
SQL> create index test_ind on test(id);

Index created.

Elapsed: 00:00:12.43
SQL> create index test1_ind on test1(id);

Index created.

Elapsed: 00:00:05.01
SQL> exec dbms_stats.gather_table_stats(user, 'TEST', method_opt=>'for all indexed columns', cascade=>TRUE); -- also generates index stats, which you didn't do.

PL/SQL procedure successfully completed.

Elapsed: 00:00:40.09
SQL> exec dbms_stats.gather_table_stats(user, 'TEST1', method_opt=>'for all indexed columns', cascade=>TRUE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:15.48

SQL> -- analyze table test compute statistics;
SQL> -- analyze table test1 compute statistics;
SQL> update test
  2  	  set col1=(select col1 from test1
  3  	  where test.id=test1.id )
  4  	  where exists
  5  	  (select 1 from test1
  6  	  where test.id=test1.id);

399999 rows updated.

Elapsed: 00:01:18.31 -- Hmm, with proper statistics on both table and index the update takes only a minute and eighteen seconds. Your four-minute-and-15-second run probably suffered from not having those indexes analyzed.

Execution Plan

Plan hash value:
3519711655

| Id  | Operation                    | Name      | Rows  | Bytes |
TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |           |   397K|
4273K|       |  4887   (5)| 00:00:59 |
|   1 |  UPDATE                      | TEST      |       |
|       |            |          |
|*  2 |   HASH JOIN RIGHT SEMI       |           |   397K|  4273K|
6608K|  4887   (5)| 00:00:59 |
|   3 |    TABLE ACCESS FULL         | TEST1     |   397K|
1942K|       |   175  (14)| 00:00:03 |
|   4 |    TABLE ACCESS FULL         | TEST      |  1005K|
5892K|       |  3432   (4)| 00:00:42 |
|   5 |   TABLE ACCESS BY INDEX ROWID| TEST1     |     1 |     6
|       |     4   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN          | TEST1_IND |     1 |
|       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

2 -
access("TEST"."ID"="TEST1"."ID")

6 -
access("TEST1"."ID"=:B1)

Statistics

181 recursive
calls

409089 db block
gets

1216618 consistent
gets

1 physical
reads

99698676 redo
size

402 bytes sent via SQL*Net to
client

440 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from
client

1 sorts
(memory)

0 sorts
(disk)

399999 rows
processed

Yes, it's a full table scan on TEST, but the index is used for the subquery.

SQL> declare
  2 v_count number;
  3 cursor test_curs is
  4 select test.id,test.col1 from test,test1   5 where test.id=test1.id;
  6 begin
  7 for test_cur in test_curs
  8 loop
  9 update test
10 set col1=test_cur.col1
11 where id=test_cur.id;
12 end loop;
13 commit;
14 end;
15 /

PL/SQL procedure successfully completed.

Elapsed: 00:02:50.68
SQL> Oh, gee, it takes twice as long with PL/SQL and properly computed statistics to perform the same task. I can't understand how you determined, from your findings, that this will require over an hour to complete with SQL*Plus and only 5 minutes, regardless of the table size, in PL/SQL. Your test is flawed, obviously, possibly due to your use of analyze table in the absence of the corresponding analyze index command, as I should produce something similar to your output if it were a reliable example. But a doubling of the execution time through PL/SQL doesn't match with your 15 second difference (and, again, your times appear to be quite inflated for the task at hand).

Post any proof you have for your contentions, as they're just unsupported claims at the moment.

David Fitzjarrell Received on Mon Dec 10 2007 - 13:11:23 CST