Re: query enhancement

From: Frank van Bortel <frank.van.bortel_at_gmail.com>
Date: Sat, 16 Jun 2007 14:48:52 +0200
Message-ID: <f50m7i$pac$1@news3.zwoll1.ov.home.nl>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

keeling wrote:
> On Jun 15, 11:24 am, Frank van Bortel <frank.van.bor..._at_gmail.com>
> wrote:
> keeling wrote:

>>>> index on this column. As a side note, I'm testing for the presence of
>>>> null, because if found, I update all 'null' values with '-1'. I'm
>>>> doing this because I intend to use this column as part of a primary
>>>> key constraint.

> Totally unnecessary:
> update table set column_a =-1 where column_a is null.
>
> Fastest method ever.
> Still want to know how many? Use sql%rowcount after the update,
> it holds the number of changed records
>

> some of my customer's table row count approaches 100 million. The
> approach last suggested has been tried and deemed unacceptable due to
> duration of update. Note, this functionally exists in a script that
> upgrades a schema to be compliant with the latest version of our
> software. customers may run this script repeatedly; I don't want them
> to incur the pain of doing the update unnecessarily.

Nonsense - your update will only last longer. 100M records isn't all that much, although finding out which ones are NULL will *always* end up in a FTS as NULLS are not indexed.

If you insist on small steps, because your have you considered:

update table set column_a =-1 where column_a is null and rownum < 100000 (or another arbitrary small number)
update table set column_a =-1 where column_a is null and <your PK_column(s)> between <two_values>

Checking:
SQL*Plus: Release 10.2.0.3.0 - Production on Sat Jun 16 12:18:25 2007

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production with the Partitioning, OLAP and Data Mining options

SQL> create table bla (owner varchar2(30) not null, id number not null, empty_col varchar2(10)) tablespace users; Table created.

SQL> insert into bla(owner, id) select owner, object_id from all_objects; 58200 rows created.

SQL> insert into bla select * from bla;
58200 rows created.
[repeated several times...]

SQL> create index bla1 on bla(id);
Index created.

SQL> update bla set empty_col='filled' where mod(id,3)=0; 4964096 rows updated.

SQL> exec dbms_stats.gather_table_stats( user, 'BLA', method_opt =>'for all indexed columns', cascade => true ); PL/SQL procedure successfully completed.

SQL> delete from plan_table;
0 rows deleted.

SQL> select count(*) from bla;
COUNT(*)
- ----------
14899200

Not 100M, but 14M rows - should be significant enough... And about one in three (4,964,096/14,899,200) is NOT NULL, 2/3 are NULL:

SQL> update bla set empty_col='Not Empty' where empty_col is null; 9935104 rows updated.
Elapsed: 00:10:42.78

That is not so bad - not even 11 minutes for almost 10M rows.

SQL> explain plan for
2 update bla set empty_col='Not Empty' where empty_col is null; Explained.

SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT
-

Plan hash value: 1761968933

|   0 | UPDATE STATEMENT   |      |   743K|  5082K|  8386   (2)| 00:01:41 |
|   1 |  UPDATE            | BLA  |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| BLA  |   743K|  5082K|  8386   (2)| 00:01:41 |

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT

-
--------------------------------------------------------------------------------
- ---------------------------------------------------

2 - filter("EMPTY_COL" IS NULL)
14 rows selected.

FTS - of course, what else. Let's try the FBI thing (provided you can add FBI):

SQL> create index fbi on bla (nvl(empty_col,'NULL')); Index created.
Elapsed: 00:01:04.95

SQL> exec dbms_stats.gather_table_stats( user, 'BLA',cascade => true ); PL/SQL procedure successfully completed. Elapsed: 00:00:35.54

SQL> update bla set empty_col='Not Empty' where nvl(empty_col,'NULL')='NULL';
Well, I killed that one after 45 minutes.

SQL> explain plan for
2 update bla set empty_col='Not Empty' where nvl(empty_col,'NULL')='NULL';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-

Plan hash value: 1761968933

|   0 | UPDATE STATEMENT   |      |  9829K|    37M| 20016   (1)| 00:04:01 |
|   1 |  UPDATE            | BLA  |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| BLA  |  9829K|    37M| 20016   (1)| 00:04:01 |

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT

-
--------------------------------------------------------------------------------
- ---------------------------------------------------

2 - filter(NVL("EMPTY_COL",'NULL')='NULL')

14 rows selected.

Still FTS - which figures, as almost 2/3rd of the table is to be updated. Reading 14M and updating 9M while doing so, is cheaper than reading the index for 9M, and updating a table of 9M.

Your data may be distributed differently, so test! - anyway, a straight update is the fastest way to go - even with a *select* using a FBI, that is faster, the update (and maintence on the index) is far slower!.
I did this on an intel E6600 based PC, Win XP Pro, single HD; Total System Global Area 629145600 bytes

Fixed Size                  1292132 bytes
Variable Size             318769308 bytes
Database Buffers          301989888 bytes
Redo Buffers                7094272 bytes

-- Regards, Frank van Bortel

Top-posting is one way to shut me up...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)

iD8DBQFGc9w0Lw8L4IAs830RAroVAJ9+0TF89hCQ7hqzvFBkt/Ph6HcsnQCfYepR wI/lo2cFkF4mC8POhqSQVvQ=
=qp3P
-----END PGP SIGNATURE----- Received on Sat Jun 16 2007 - 07:48:52 CDT