Re: Straight SQL always put perform PL/SQL?

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Mon, 10 Dec 2007 12:51:22 -0800 (PST)
Message-ID: <4cdb4a7d-ef9d-4f04-9c04-03a362b7275a@j20g2000hsi.googlegroups.com>

On Dec 10, 2:11 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net> wrote:
> Comments embedded.
> On Dec 10, 8:51 am, dshprope..._at_gmail.com wrote:
>
> >http://tkyte.blogspot.com/2006/10/slow-by-slow.html
> > In there Tom Kytes says:
> > * You should do it in a single SQL statement if at all possible.
> > * If you cannot do it in a single SQL Statement, then do it in PL/SQL.
>
> > I was a true beliver of that until this case. We have a huge table,
> > and we need to update it with
> > values from another table.
>
> > Let me give you an example
> > create table test (id number,col1 varchar2(20)) pctfree 99;
> > (this is only for creating a huge table with lots of columns)
>
> Sorry, no, this creates a table with a huge number of extents.
>
>
>
>
>
> > declare
> > v_number number :=1;
> > begin
> > loop
> > insert into test values (v_number, 'a');
> > v_number := v_number+1;
> > exit when v_number > 1000000;
> > end loop;
> > commit;
> > end;
>
> > create table test1 as select * from test where id < 400000;
> > update test1 set col1='b';
> > commit;
>
> > create index test_ind on test(id);
> > create index test1_ind on test1(id);
> > analyze table test compute statistics;
> > analyze table test1 compute statistics;
>
> Antiquated utility to generate statistics; use
> dbms_stats.gather_table_stats instead. And cascade that to the
> indexes; your statistics don't include the index stats as you'll also
> need to run analyze index to get those generated.
>
> > SQL> update test
> > set col1=(select col1 from test1
> > where test.id=test1.id )
> > where exists
> > (select 1 from test1
> > where test.id=test1.id);
> > 399999 rows updated.
> > Elapsed: 00:04:15.01
>
> 4 minutes to update the table via SQL*Plus ... possibly a poorly
> configured system? Or, possibly, due to the missing index
> statistics. My attempt at this didn't take nearly as long with
> SQL*Plus. I've posted my 'stab' at this at the end.
>
>
>
>
>
> > SQL> declare
> > 2 v_count number;
> > 3 cursor test_curs is
> > 4 select test.id,test.col1 from test,test1
> > 5 where test.id=test1.id;
> > 6 begin
> > 7 for test_cur in test_curs
> > 8 loop
> > 9 update test
> > 10 set col1=test_cur.col1
> > 11 where id=test_cur.id;
> > 12 end loop;
> > 13 commit;
> > 14 end;
> > 15 /
> > PL/SQL procedure successfully completed.
> > Elapsed: 00:04:30.82
>
> And a little over 4 minutes here; 15 seconds difference between the
> two. Again, a poorly configured system? A slow disk subsystem? Or,
> again, missing index statistics?
>
> > Here the difference is not huge yet, but in our case, single SQL will
> > take an hour to finish,
>
> And how did you compute that figure?
>
> > while
> > PL/SQL only take 5 min.
>
> Again, you've proven this how?
>
> > The bigger the table test, the huger the
> > difference.
>
> We're still waiting for your proof. The above 'test' doesn't qualify.
>
>
>
> > And the reason is simple, as there will be 40% of rows needs to be
> > updated, Oracle will pick full
> > table scan of table test(a huge table).
> > While in PL/SQL, "UPDATE TEST
> > SET COL1=:B2 WHERE ID=:B1"
> > will be able to use index.
>
> And will be a row-by-row update, quite inefficient for PL/SQL.
>
>
>
> > Any comments will be highly appreciated
>
> Let's look at another example, basically your code altered to set a
> pctfree of 90 rather than 99:
>
> SQL>
> SQL> create table test (id number,col1 varchar2(20)) pctfree 90;
>
> Table created.
>
> Elapsed: 00:00:00.04
> SQL>
> SQL> declare
> 2 v_number number :=1;
> 3 begin
> 4 loop
> 5 insert into test values (v_number, 'a');
> 6 v_number := v_number+1;
> 7 exit when v_number > 1000000;
> 8 end loop;
> 9 commit;
> 10 end;
> 11 /
>
> PL/SQL procedure successfully completed.
>
> Elapsed: 00:04:50.96
> SQL>
> SQL>
> SQL> create table test1
> 2 as select * from test where id < 400000;
>
> Table created.
>
> Elapsed: 00:00:03.64
> SQL>
> SQL> update test1 set col1='b';
>
> 399999 rows updated.
>
> Elapsed: 00:00:46.50
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value:
> 4001801719
>
> -----------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
> Time |
> -----------------------------------------------------------------------------
> | 0 | UPDATE STATEMENT | | 413K| 4849K| 179 (16)|
> 00:00:03 |
> | 1 | UPDATE | TEST1 | | |
> | |
> | 2 | TABLE ACCESS FULL| TEST1 | 413K| 4849K| 179 (16)|
> 00:00:03 |
> -----------------------------------------------------------------------------
>
> Note
> -----
> - dynamic sampling used for this
> statement
>
> Statistics
> ----------------------------------------------------------
> 455 recursive
> calls
> 413417 db block
> gets
> 1647 consistent
> gets
> 663 physical
> reads
> 120968492 redo
> size
> 402 bytes sent via SQL*Net to
> client
> 309 bytes received via SQL*Net from
> client
> 3 SQL*Net roundtrips to/from
> client
> 2 sorts
> (memory)
> 0 sorts
> (disk)
> 399999 rows
> processed
>
> SQL> commit;
>
> Commit complete.
>
> Elapsed: 00:00:00.03
> SQL>
> SQL> create index test_ind on test(id);
>
> Index created.
>
> Elapsed: 00:00:12.43
> SQL> create index test1_ind on test1(id);
>
> Index created.
>
> Elapsed: 00:00:05.01
> SQL> exec dbms_stats.gather_table_stats(user, 'TEST', method_opt=>'for
> all indexed columns', cascade=>TRUE); -- also generates index stats,
> which you didn't do.
>
> PL/SQL procedure successfully completed.
>
> Elapsed: 00:00:40.09
> SQL> exec dbms_stats.gather_table_stats(user, 'TEST1',
> method_opt=>'for all indexed columns', cascade=>TRUE);
>
> PL/SQL procedure successfully completed.
>
> Elapsed: 00:00:15.48
> SQL> -- analyze table test compute statistics;
> SQL> -- analyze table test1 compute statistics;
> SQL> update test
> 2 set col1=(select col1 from test1
> 3 where test.id=test1.id )
> 4 where exists
> 5 (select 1 from test1
> 6 where test.id=test1.id);
>
> 399999 rows updated.
>
> Elapsed: 00:01:18.31 -- Hmm, with proper statistics on both table
> and index the update takes only a minute and eighteen seconds. Your
> four-minute-and-15-second run probably suffered from not having those
> indexes analyzed.
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value:
> 3519711655
> ---------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes |
> TempSpc| Cost (%CPU)| Time |
> ---------------------------------------------------------------------------------------------------
> | 0 | UPDATE STATEMENT | | 397K|
> 4273K| | 4887 (5)| 00:00:59 |
> | 1 | UPDATE | TEST | |
> | | | |
> |* 2 | HASH JOIN RIGHT SEMI | | 397K| 4273K|
> 6608K| 4887 (5)| 00:00:59 |
> | 3 | TABLE ACCESS FULL | TEST1 | 397K|
> 1942K| | 175 (14)| 00:00:03 |
> | 4 | TABLE ACCESS FULL | TEST | 1005K|
> 5892K| | 3432 (4)| 00:00:42 |
> | 5 | TABLE ACCESS BY INDEX ROWID| TEST1 | 1 | 6
> | | 4 (0)| 00:00:01 |
> |* 6 | INDEX RANGE SCAN | TEST1_IND | 1 |
> | | 3 (0)| 00:00:01 |
> ---------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation
> id):
> ---------------------------------------------------
>
> 2 -
> access("TEST"."ID"="TEST1"."ID")
> 6 -
> access("TEST1"."ID"=:B1)
>
> Statistics
> ----------------------------------------------------------
> 181 recursive
> calls
> 409089 db block
> gets
> 1216618 consistent
> gets
> 1 physical
> reads
> 99698676 redo
> size
> 402 bytes sent via SQL*Net to
> client
> 440 bytes received via SQL*Net from
> client
> 3 SQL*Net roundtrips to/from
> client
> 1 sorts
> (memory)
> 0 sorts
> (disk)
> 399999 rows
> processed
>
> -- Yes, it's a full table scan on TEST, but the index is used for the
> subquery.
>
> SQL> declare
> 2 v_count number;
> 3 cursor test_curs is
> 4 select test.id,test.col1 from test,test1
> 5 where test.id=test1.id;
> 6 begin
> 7 for test_cur in test_curs
> 8 loop
> 9 update test
> 10 set col1=test_cur.col1
> 11 where id=test_cur.id;
> 12 end loop;
> 13 commit;
> 14 end;
> 15 /
>
> PL/SQL procedure successfully completed.
>
> Elapsed: 00:02:50.68
> SQL>
>
> Oh, gee, it takes twice as long with PL/SQL and properly computed
> statistics to perform the same task. I can't understand how you
> determined, from your findings, that this will require over an hour to
> complete with SQL*Plus and only 5 minutes, regardless of the table
> size, in PL/SQL. Your test is flawed, obviously, possibly due to your
> use of analyze table in the absence of the corresponding analyze index
> command, as I should produce something similar to your output if it
> were a reliable example. But a doubling of the execution time through
> PL/SQL doesn't match with your 15 second difference (and, again, your
> times appear to be quite inflated for the task at hand).
>
> Post any proof you have for your contentions, as they're just
> unsupported claims at the moment.
>
> David Fitzjarrell

This is an interesting test. I just performed a slightly modified version of the same test on Oracle 10.2.0.2 running on Windows x64, and found similar results.

One possible explanation is the utilization of the temp tablespace while executing the single SQL statement that performs the update - the temp tablespace is not needed for the PL/SQL method. If there is intense competition for the hard disk (possibly caused by the effective SORT_AREA_SIZE/HASH_AREA_SIZE being too small), but the table and index blocks are in the buffer cache, it is theoretically possible for the PL/SQL solution to complete faster. Such a result may indicate an Oracle configuration issue.

The results from my test run:

set timing on

create table test (id number,col1 varchar2(20)) pctfree 90;

Table created.

Elapsed: 00:00:00.10

declare
v_number number :=1;
begin
loop
insert into test values (v_number, 'a'); v_number := v_number+1;
exit when v_number > 1000000;
end loop;
commit;
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:31.81

create table test1
as select * from test where id < 400000;

Table created.

Elapsed: 00:00:03.25

update test1 set col1='b';

399999 rows updated.

Elapsed: 00:00:07.00

commit;

Elapsed: 00:00:00.00

create index test_ind on test(id);

Index created.

Elapsed: 00:00:03.93

create index test1_ind on test1(id);

Elapsed: 00:00:01.09

exec dbms_stats.gather_table_stats(user, 'TEST', method_opt=>'for all indexed columns', cascade=>TRUE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.42

exec dbms_stats.gather_table_stats(user, 'TEST1', method_opt=>'for all indexed columns', cascade=>TRUE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.75

This is set to all so that the DBMS_XPLAN will return additional information alter session set statistics_level=all;

Session altered.

Elapsed: 00:00:00.00

update test

    set col1=(select col1 from test1
    where test.id=test1.id )
    where exists
    (select 1 from test1
     where test.id=test1.id);

399999 rows updated.

Elapsed: 00:00:23.40

SELECT
*
FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));

| Id  | Operation                    | Name      | Starts | E-Rows | A-
Rows |A-Time      | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/

M | Max-Tmp |

|   1 |  UPDATE                      | TEST      |      1 |
|      0 |00:00:23.39 |    1625K|  45541 |   2263 |       |
|          |         |
|*  2 |   HASH JOIN RIGHT SEMI       |           |      1 |
397K|    399K|00:00:06.86 |   15620 |  17276 |   2263 |    12M|
3870K|          |   19456 |
|   3 |    TABLE ACCESS FULL         | TEST1     |      1 |
397K|    399K|00:00:00.01 |     668 |     72 |      0 |       |
|          |         |
|   4 |    TABLE ACCESS FULL         | TEST      |      1 |   1003K|
1000K|00:00:03.13 |   14952 |  14941 |      0 |       |
|          |         |
|   5 |   TABLE ACCESS BY INDEX ROWID| TEST1     |    399K|      1
|    399K|00:00:06.79 |    1200K|   4435 |      0 |       |
|          |         |
|*  6 |    INDEX RANGE SCAN          | TEST1_IND |    399K|      1
|    399K|00:00:04.19 |     800K|   4140 |      0 |       |
|          |         |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

2 - access("TEST"."ID"="TEST1"."ID")
6 - access("TEST1"."ID"=:B1)

--If I read the plan correctly, Oracle used 19.4MB of temp space, and performed 399,000 index range scans of the TEST1_IND index as well as one full tablescan for each of TEST1 and TEST.

declare
v_count number;
cursor test_curs is
select test.id,test.col1 from test,test1 where test.id=test1.id;
begin
for test_cur in test_curs
loop
  update test
  set col1=test_cur.col1
  where id=test_cur.id;
end loop;
commit;
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:02:18.65

SELECT
(120+18.65)/(23.40)
FROM
DUAL; (120+18.65)/(23.40)

5.92521368

The single UPDATE SQL statement completed almost 6 times faster than the PL/SQL code on my system.

It might be helpful for the OP to look at a DBMS_XPLAN to determine the step that is taking the longest in the single UPDATE SQL statement. As you pointed out, the OP is not analyzing the index.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Mon Dec 10 2007 - 14:51:22 CST