Re: Optimizer issue - cost of full table scans

From: Pavel Ermakov <ocp.pauler_at_gmail.com>
Date: Mon, 13 Sep 2010 17:14:28 +0400
Message-ID: <AANLkTi=GC7pONOj1kyB0xU=hiTdYSrRjyQ4owaL7Wwp0_at_mail.gmail.com>



Hi

Try gathering statistics on both tables like this (a full compute, with no histograms):

exec dbms_stats.gather_table_stats(ownname=>'GMMI_CDI', tabname=>'CUST_ADDRESS', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)

exec dbms_stats.gather_table_stats(ownname=>'GMMI_STG', tabname=>'NAME_ADDRESS_STG', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)
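
To double-check what was gathered, something like this should work (these are the standard dictionary views; only the owner and table names are taken from your post):

select owner, table_name, num_rows, blocks, last_analyzed
  from dba_tables
 where owner in ('GMMI_CDI', 'GMMI_STG')
   and table_name in ('CUST_ADDRESS', 'NAME_ADDRESS_STG');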

Best regards, Pavel.

2010/9/10 Brad Peek <brad_peek_at_yahoo.com>

>
> Listers – Is there a way to “inform” the optimizer of the true cost of the
> full scans? I think there is, and I fully intend to RTFM, but I’m hoping
> for a quick answer…
>
> Background:
>
> Our company has been in production on an Exadata V2 quarter-rack (SATA
> drives) for several months now, and we have been generally pleased with
> performance. However, over the past few days a number of queries have
> switched from hash joins to nested loop joins, and the performance of
> those queries is MUCH worse.
>
> This is an ETL application using Informatica to apply batch updates from
> staging tables to our target tables.
>
> I’ve just started looking at this, but I have begun to wonder if part of
> the problem is that the optimizer is over-estimating the cost of full
> table scans. Right from the beginning (i.e. before this current problem) I
> noticed that the time estimates in the execution plans were significantly
> higher than our actual response times. My theory is that as our tables
> have grown we have reached a tipping point that now causes the optimizer
> to favor nested loops, and that the inflated cost of the full scans is a
> contributing factor.
>
> I suspect the fact that we are on Exadata is compounding the issue, since
> full scans are much faster than on non-Exadata systems while single-block
> reads are about the same (discounting the potential speed-up from flash
> cache).
>
> Most of these queries are quite simple. Below is a representative example:
>
> SELECT gmmi_stg.name_address_stg.batch_id
>      , gmmi_stg.name_address_stg.first_name
>      , gmmi_stg.name_address_stg.last_name
>      , gmmi_stg.name_address_stg.oid_customer
>      , gmmi_stg.name_address_stg.oid_address
>   FROM gmmi_cdi.cust_address
>      , gmmi_stg.name_address_stg
>  WHERE cust_address.oid_customer (+) = name_address_stg.oid_customer
>    AND cust_address.oid_address (+) = name_address_stg.oid_address
>    AND name_address_stg.ident_matched = 'Y'
>    AND name_address_stg.address_matched = 'Y'
>    AND cust_address.oid_cust_address IS NULL
>    AND name_address_stg.batch_id = (select 100908091626685.0 from dual);
>
> Since I know the SELECT FROM DUAL on the last line will look odd to some
> people, let me explain: we use that syntax to trick the optimizer into
> treating the literal like a bind variable rather than checking the value
> against the BATCH_ID column histogram. We found that the optimizer would
> greatly underestimate the cardinality if the value was not yet present in
> the table when statistics were last gathered. At any rate, I don’t think
> it is part of the current nested loop issue, since that syntax has not
> changed.
>
> Note that if I “ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=200;” then the
> queries go back to hash joins.
>
> Thanks in advance. BPeek
>
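
Also, on the original question of "informing" the optimizer of the true cost of full scans: one standard knob for that is workload system statistics, which capture the actual multi-block read time (MREADTIM) and multi-block read count (MBRC) the cost model uses. A minimal sketch, assuming you can bracket a representative batch window (this is a general technique, not something specific to your system):

exec dbms_stats.gather_system_stats(gathering_mode => 'START')
-- ...let a representative workload run for a while, then:
exec dbms_stats.gather_system_stats(gathering_mode => 'STOP')

select sname, pname, pval1 from sys.aux_stats$;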

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Sep 13 2010 - 08:14:28 CDT
