Using DBMS_STATS makes query execute WORSE.

From: Alek Lapuc <obimbus_at_poczta.onet.pl>
Date: 4 Oct 2004 01:41:39 -0700
Message-ID: <7d3c208.0410040041.2fa7b8e9@posting.google.com>

Hi.

This subject returned from time to time on this group. I have searched the archive of groups on goole -- but suggested anwsers helped me little.
I cannot add any hints to the statement nor tell DB to use specific index. Query is hardcoded in application, that is delivered as out-of-the-box program with no source code. (To answer an obvious question: yes, we have a technical support license, however we failed to convince them that this is a bug.)

I have the following problem:

We have a Oracle 9.2.0 database on our system. There is a bunch of data in it (at this moment more than 150 GigaBytes) in approx. 500 tables.

And there is a problem with one querry on one table. Let's call the table TABLE_T.

Table TABLE_T has something around 36,000,000 (thirty six million) rows.
There are several columns, but one of them, let's call it COLUMN_C, is the most significant. It is of VARCHAR2(255) type. Table holds 4 (four) different values in this column. Each value is in around 25% of rows (giving around 8,000,000 rows per value).

One of client application performs a following query every time it starts:

SELECT DISTINCT a, b, c, d, COLUMN_C

           FROM TABLE_T
          WHERE TABLE_T.COLUMN_C = :1;

Initially there was no index on COLUMN_C -- and the query resulted in table scan, which is expected. Time of execution: approx. 15-20 minutes.

Than we have added nonunique index on COLUMN_C. Time of execution shrinked to approx. 2-3 minutes, which is acceptable.

But then again we have estimated statistics with following query: (in fact statistics were estimated for all 500 tables with statements like following)

BEGIN
SYS.DBMS_STATS.GATHER_TABLE_STATS (

      OwnName        => 'DBUSER'
     ,TabName        => 'TABLE_T'


    ,Estimate_Percent  => 33

    ,Degree            => 4

    ,Cascade           => TRUE

    ,No_Invalidate     => FALSE);

END;
/

After that optimizer always chooses to do full scan on TABLE_T, ignoring the index on COLUMN_C.

It cannot startup time longer than 10 minutes -- this is a timeout in a clustered system after which a program gets launched again. The previous copy is being killed -- so as a result the program never starts (and we have orphaned table full scans in DB...)

To end in an optimistic way, two things helped:

Deleting estimated statistics on TABLE_T. Index on COLUMN_C is used again. However having statistics gathered for all the table is suggested by the supplier of client applications that we are deploying/customizing.
Setting a nonunique index on ALL columns of TABLE_T in order that is used in the mentioned query (a, b, c, d, COLUMN_C). This is not acceptable from obvious reasons ;)

If someone should have any suggestions I would be more than gratful to hear them.

Best regards,

Aleksander Lapuc Received on Mon Oct 04 2004 - 03:41:39 CDT