Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Using DBMS_STATS makes query execute WORSE.

From: Howard J. Rogers <howardjr_at_dizwell.com>
Date: Mon, 04 Oct 2004 09:21:05 +1000
Message-Id: <4161db8e$0$20580$afc38c87@news.optusnet.com.au>


Bobby Durrett wrote:

> You could try adjusting these init.ora parameters:
> 
> DB_FILE_MULTIBLOCK_READ_COUNT
> OPTIMIZER_INDEX_COST_ADJ
> 
> You can use these to encourage the optimizer to choose an index over a
> full table scan.
> 
> - Bobby

>> I have the following problem:
>>
>> We have an Oracle 9.2.0 database on our system. There is a lot of
>> data in it (at the moment more than 150 gigabytes) in approx. 500
>> tables.
>>
>> And there is a problem with one query on one table. Let's call the
>> table TABLE_T.
>>
>> Table TABLE_T has something around 36,000,000 (thirty six million)
>> rows.
>> There are several columns, but one of them, let's call it COLUMN_C, is
>> the most significant. It is of VARCHAR2(255) type. Table holds 4
>> (four) different values in this column. Each value is in around 25% of
>> rows (giving around 8,000,000 rows per value).
>>
>> One of the client applications performs the following query every
>> time it starts:
>>
>> SELECT DISTINCT a, b, c, d, COLUMN_C
>> FROM TABLE_T
>> WHERE TABLE_T.COLUMN_C = :1;
>>
>> Initially there was no index on COLUMN_C -- and the query resulted in
>> a full table scan, which is expected. Time of execution: approx.
>> 15-20 minutes.
>>
>> Then we added a non-unique index on COLUMN_C. Time of execution
>> shrank to approx. 2-3 minutes, which is acceptable.
>>
>> But then we estimated statistics with the following statement (in
>> fact statistics were estimated for all 500 tables with statements
>> like this):
>>
>> BEGIN
>>   SYS.DBMS_STATS.GATHER_TABLE_STATS (
>>     OwnName          => 'DBUSER'
>>    ,TabName          => 'TABLE_T'
>>    ,Estimate_Percent => 33
>>    ,Degree           => 4
>>    ,Cascade          => TRUE
>>    ,No_Invalidate    => FALSE);
>> END;
>> /
>>
>> After that optimizer always chooses to do full scan on TABLE_T,
>> ignoring the index on COLUMN_C.

And quite right too! When there were no statistics, the optimiser fell back on rule-based optimisation, whose rules state (amongst other things) 'if there's an index, use it... no matter how dumb it is to do so'. Once you calculate statistics, the optimiser is finally aware of how dumb it is to use an index to retrieve one quarter of the rows. Generally speaking, the cost-based optimiser only wants to use an index when it is selective for around 2 to 5% of the table's rows, not 25%.

So when the statistics are in place, the optimiser correctly deduces that it is going to visit much of the table in any case -- so why not just do so and bypass the index altogether?

>> Startup cannot take longer than 10 minutes -- this is the timeout in
>> a clustered system after which the program gets launched again. The
>> previous copy is killed -- so as a result the program never starts
>> (and we are left with orphaned full table scans in the DB...)
>>
>> To end in an optimistic way, two things helped:
>>
>> 1. Deleting the estimated statistics on TABLE_T. The index on
>> COLUMN_C is used again. However, having statistics gathered for all
>> the tables is recommended by the supplier of the client applications
>> that we are deploying/customizing.

Did they suggest the index? Apparently not, given that you said you were responsible for creating it in the first place... now you know why vendors get a bit iffy when users start mucking around with their design! And no wonder this particular vendor refuses to record the thing as a bug!!

However, you report that things are working fine with the index, and poorly without it... so have you collected system statistics, to let the optimiser know about your CPU and I/O load patterns? Bobby's suggestion to adjust the two parameters he mentioned will also persuade the optimiser that full scans are more expensive, and hence that index access is a good idea -- though they will do so for all queries, not just this one, which maybe is not such a good idea. The point is, perhaps the optimiser is just a bit too optimistic about how cheap full table scans are generally, and if only it knew the truth of the matter, it might volunteer to use the index.
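For what it's worth, a minimal sketch of both approaches -- gathering workload system statistics, and (more bluntly) adjusting the two parameters Bobby mentioned. The 30-minute interval and the parameter values below are purely illustrative, not recommendations:

```sql
-- Gather workload system statistics over a representative window
-- of typical load (here, 30 minutes):
BEGIN
  DBMS_STATS.GATHER_SYSTEM_STATS (
    Gathering_Mode => 'INTERVAL'
   ,Interval       => 30);
END;
/

-- Or lean on the optimiser directly.  A lower OPTIMIZER_INDEX_COST_ADJ
-- makes index access look cheaper; a lower DB_FILE_MULTIBLOCK_READ_COUNT
-- makes full scans look dearer.  Both affect every query on the system:
ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ = 25;
ALTER SYSTEM SET DB_FILE_MULTIBLOCK_READ_COUNT = 8;
```

Both parameters can also be set with ALTER SESSION, which is a safer way to experiment before committing to a system-wide change.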

Have you tried creating a bitmap index on the table? Bitmap indexes are good for low-cardinality columns such as yours (but are hopeless at dealing with lots of DML, so if this is a DML-intensive table, scrap that idea).
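If you go down that road, it would look something like this (the index names here are made up, of course):

```sql
-- Drop the existing b-tree index first (name assumed):
DROP INDEX t_column_c_idx;

-- A bitmap index suits a 4-value column on a read-mostly table:
CREATE BITMAP INDEX t_column_c_bix
  ON table_t (column_c);
```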

Can you partition the table?
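A list partition per value of COLUMN_C would let the optimiser prune straight to a quarter of the table without any index at all (assuming you are licensed for the Partitioning option). A sketch only -- the column types and the four literal values are placeholders for whatever your real data holds:

```sql
CREATE TABLE table_t_part (
  a        NUMBER,
  b        NUMBER,
  c        NUMBER,
  d        NUMBER,
  column_c VARCHAR2(255)
)
PARTITION BY LIST (column_c) (
  PARTITION p_one   VALUES ('VALUE_1'),
  PARTITION p_two   VALUES ('VALUE_2'),
  PARTITION p_three VALUES ('VALUE_3'),
  PARTITION p_four  VALUES ('VALUE_4')
);
```

With that in place, the predicate WHERE COLUMN_C = :1 should prune down to a single partition at run time.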

Ultimately, it might just be easier and cheaper to leave the statistics out for this table. Alternatively, if you are desperate to have statistics for everything (and if you ever migrate to 10g this is going to be a major consideration), have a look at DBMS_STATS.SET_TABLE_STATS, which allows you to write your own statistics for the table, even though you may be lying through your teeth as you do so. If, for example, you were to use this procedure to tell the optimiser that there were 4 million possible values for the column instead of 4, I think you will find the index not only looks attractive to the optimiser, but looks positively alluring.
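To make that concrete: SET_TABLE_STATS covers the table-level figures, while the per-column distinct count -- the '4 million values' lie -- is set with its sibling procedure, DBMS_STATS.SET_COLUMN_STATS. The numbers below are invented purely for illustration:

```sql
BEGIN
  -- Table-level figures (row count taken from the original post):
  DBMS_STATS.SET_TABLE_STATS (
    OwnName => 'DBUSER'
   ,TabName => 'TABLE_T'
   ,NumRows => 36000000);

  -- Column-level lie: pretend COLUMN_C has 4,000,000 distinct values,
  -- so an equality predicate looks hugely selective:
  DBMS_STATS.SET_COLUMN_STATS (
    OwnName => 'DBUSER'
   ,TabName => 'TABLE_T'
   ,ColName => 'COLUMN_C'
   ,Distcnt => 4000000
   ,Density => 1 / 4000000);
END;
/
```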

Regards
HJR
>>
>> 2. Creating a non-unique index on ALL columns of TABLE_T, in the
>> order in which they are used in the mentioned query (a, b, c, d,
>> COLUMN_C). This is not acceptable, for obvious reasons ;)
>>
>> If someone has any suggestions I would be more than grateful to
>> hear them.
>>
>> Best regards,
>>
>> Aleksander Lapuc
Received on Sun Oct 03 2004 - 18:21:05 CDT
