Re: suddenly direct path read from small tables

From: Martin Berger <martin.a.berger_at_gmail.com>
Date: Wed, 26 Oct 2016 09:14:22 +0200
Message-ID: <CALH8A91Onvk1GzJDsRy+ficswVCC3igpQMFbRCkWt7CNqV4q1Q_at_mail.gmail.com>



Hi Samuel,

Do you agree we should focus on the plan change you posted already? If so: between the two plans, the estimated number of rows from accessing TABLA_A increased from 1 to 1M.
Can you check what the real row count for that step is? Is it closer to 1 or to 1M?
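A quick way to check, as a sketch (the owner and the exact query are yours to fill in):

  SELECT COUNT(*) FROM <owner>.TABLA_A;

or, to compare estimated vs. actual rows per plan step, run the statement once with the /*+ GATHER_PLAN_STATISTICS */ hint and then:

  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));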

Martin

2016-10-25 10:01 GMT+02:00 samuel guiñales cristobal < samuelg.cristobal_at_gmail.com>:

> > you sent a set of emails today, but I could not find an answer to the
> > questions raised by Stefan, LS Cheng or me.
> Sorry, which question did I not answer?
>
> >Based on your last email, you saw a change in execution plan.
> >As stated before, this has nothing to do with _stt!
>
> I know that the parameter (_small_table_threshold) impacts the runtime
> decision to do direct path reads for a FTS -- no CBO, no stats involved.
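>
> (As a side note, a sketch of how one can read the hidden parameter's current
> value -- the usual x$ approach, which must be run as SYS:
>
>   SELECT a.ksppinm name, b.ksppstvl value
>   FROM   x$ksppi a, x$ksppcv b
>   WHERE  a.indx = b.indx
>   AND    a.ksppinm = '_small_table_threshold';
> )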
>
> >If you want to focus on the reason for the change in your plan, you have
> >to identify what made the plan change.
> >Besides different optimizer environments, I see the major reasons for
> >different plans in changed statistics or (first) bind variables.
> >You will have some history about statistics in your database, but it's
> >hard to identify optimizer environments or binds -
> >especially when the instance was restarted/crashed.
> >And probably you do not even care much why the plan changed; you want
> >"your preferred" plan - or plans.
> >This leads me back to the reasons why a plan changes.
> >Anyhow, identify if you suffer changing statistics or binds.
> >Then address those.
> >It will not help to post some random values here; bring it down to a
> >complete picture.
>
> Index degradation from a purge process, some days before the storage failure,
> led us to rebuild indexes and gather statistics
> for one of the databases with the direct path read problem.
>
> It is a very different problem now: it impacts more databases, and we could
> not restore the old stats -- the statistics history is not available.
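>
> (For reference, this is roughly what we tried -- a sketch with a hypothetical
> owner/table and timestamp:
>
>   SELECT DBMS_STATS.GET_STATS_HISTORY_RETENTION FROM dual;
>   SELECT DBMS_STATS.GET_STATS_HISTORY_AVAILABILITY FROM dual;
>   EXEC DBMS_STATS.RESTORE_TABLE_STATS('OWNER','TABLA_A', SYSTIMESTAMP - INTERVAL '7' DAY);
>
> but the history was already gone.)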
>
>
> ---
> Samuel Guiñales Cristobal <samuelg.cristobal_at_gmail.com>
> «Let food be thy medicine, and medicine thy food.» Hippocrates
>
> On 24 October 2016 at 20:59, Martin Berger <martin.a.berger_at_gmail.com>
> wrote:
>
>> Hi Samuel,
>>
>> you sent a set of emails today, but I could not find an answer to the
>> questions raised by Stefan, LS Cheng or me.
>>
>> Based on your last email, you saw a change in execution plan.
>> As stated before, this has nothing to do with _stt!
>> If you want to focus on the reason for the change in your plan, you have
>> to identify what made the plan change.
>> Besides different optimizer environments, I see the major reasons for
>> different plans in changed statistics or (first) bind variables.
>> You will have some history about statistics in your database, but it's
>> hard to identify optimizer environments or binds - especially when the
>> instance was restarted/crashed.
>>
>> And probably you do not even care much why the plan changed; you want
>> "your preferred" plan - or plans.
>>
>> This leads me back to the reasons why a plan changes.
>>
>> Anyhow, identify if you suffer changing statistics or binds.
>> Then address those.
>>
>> It will not help to post some random values here; bring it down to a
>> complete picture.
>>
>>
>> Martin
>>
>>
>> 2016-10-24 13:12 GMT+02:00 samuel guiñales cristobal <
>> samuelg.cristobal_at_gmail.com>:
>>
>>> To add more info:
>>>
>>> We're also reviewing information related to the query execution plans [1].
>>>
>>> We've detected that execution plans have changed on at least two
>>> instances, and for two queries on them, after the storage failure.
>>>
>>> The question is how to turn this behaviour back without using the
>>> workaround of faking table stats, or any other one like setting
>>> "event 10949" and "_very_large_object_threshold".
>>>
>>> Is that possible?
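>>>
>>> (One possibility -- a sketch only, assuming the good plan is still in the
>>> cursor cache and SQL Plan Management is available in your edition;
>>> &good_sql_id is a placeholder:
>>>
>>>   DECLARE
>>>     n PLS_INTEGER;
>>>   BEGIN
>>>     n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
>>>            sql_id          => '&good_sql_id',
>>>            plan_hash_value => 3237944248);  -- PHV of the pre-incident plan
>>>   END;
>>>   /
>>>
>>> This pins the old plan as an accepted baseline instead of faking stats.)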
>>>
>>>
>>>
>>>
>>> [1]
>>>
>>> *Before incident:*
>>>
>>> Buffer Gets:   34,492,899    45,746.6    9.05
>>> Disk Reads:             0         0.0     .00
>>>
>>> --------------------------------------------------------------------------------
>>> | Operation                     | PHV/Object Name     | Rows | Bytes| Cost |
>>> --------------------------------------------------------------------------------
>>> |SELECT STATEMENT               |----- 3237944248 ----|      |      |  291 |
>>> |NESTED LOOPS                   |                     |      |      |      |
>>> | NESTED LOOPS                  |                     |  279 |  36K |  291 |
>>> |  INDEX RANGE SCAN             |INDEX1               |  279 |  15K |    9 |
>>> |  INDEX UNIQUE SCAN            |INDEX2               |    1 |      |    1 |
>>> | TABLE ACCESS BY INDEX ROWID   |TABLA_A              |    1 |   79 |    2 |
>>> --------------------------------------------------------------------------------
>>>
>>> *During and after the incident:*
>>>
>>> Buffer Gets:    7,235,001    39,752.8    8.84
>>> Disk Reads:     7,294,145    40,077.7   92.56
>>>
>>> --------------------------------------------------------------------------------
>>> | Operation                     | PHV/Object Name     | Rows | Bytes| Cost |
>>> --------------------------------------------------------------------------------
>>> |SELECT STATEMENT               |----- 2237180378 ----|      |      | 5080 |
>>> |HASH JOIN                      |                     |   6K | 847K | 5080 |
>>> | INDEX RANGE SCAN              |INDEX1               |   6K | 351K |  127 |
>>> *| TABLE ACCESS FULL            |TABLA_A              |   1M | 130M | 4943 |*
>>> --------------------------------------------------------------------------------
>>>
>>> ---
>>> Samuel Guiñales Cristobal <samuelg.cristobal_at_gmail.com>
>>> «Let food be thy medicine, and medicine thy food.» Hippocrates
>>>
>>> On 24 October 2016 at 12:44, samuel guiñales cristobal <
>>> samuelg.cristobal_at_gmail.com> wrote:
>>>
>>>> The workaround applied in this case was this one, to get under
>>>> _small_table_threshold (2416), like in the other database:
>>>>
>>>> EXEC DBMS_STATS.SET_TABLE_STATS('user','table_name', numblks=>1000);
>>>>
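>>>> (If cursors keep using the old plan after setting the stats, a variant
>>>> worth testing -- no_invalidate=>FALSE makes dependent cursors re-parse:
>>>>
>>>>   EXEC DBMS_STATS.SET_TABLE_STATS('user','table_name', numblks=>1000, no_invalidate=>FALSE);
>>>> )
>>>>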
>>>>
>>>> Regards
>>>> ---
>>>> Samuel Guiñales Cristobal <samuelg.cristobal_at_gmail.com>
>>>> «Let food be thy medicine, and medicine thy food.» Hippocrates
>>>>
>>>> On 24 October 2016 at 12:08, samuel guiñales cristobal <
>>>> samuelg.cristobal_at_gmail.com> wrote:
>>>>
>>>>> Yes, my mistake when pasting: this problem affects two
>>>>> databases (same direct path read I/O wait issue) and I pasted
>>>>> the other database's info by mistake. Sorry, here is the correction:
>>>>>
>>>>> table is 41MB, number of rows: 35473
>>>>>
>>>>> SELECT name, block_size, buffers FROM v$buffer_pool;             --> 120832 buffers
>>>>> SELECT blocks FROM dba_tables WHERE table_name = 'NOMBRE_TABLA'; --> 5158
>>>>> _small_table_threshold: 2416
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Samuel Guiñales Cristobal <samuelg.cristobal_at_gmail.com>
>>>>> «Let food be thy medicine, and medicine thy food.» Hippocrates
>>>>>
>>>>> On 24 October 2016 at 11:47, Ls Cheng <exriscer_at_gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> You only have a 1GB db cache and a 1680MB sga_max_size, but in your first
>>>>>> post you said you have 1451136 database memory buffers, which is 11337MB
>>>>>> with an 8K db block size -- clearly wrong. That probably confused all of
>>>>>> us, because if your table is 273MB it clearly passes the 2% threshold
>>>>>> (it is actually 27% of 1024MB) and is therefore perfectly a direct path
>>>>>> read candidate. As for why it does so: probably the plan changed from an
>>>>>> index execution plan to an FTS plan.
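>>>>>>
>>>>>> (Checking the arithmetic: 1,451,136 buffers x 8KB = ~11,337MB, which cannot
>>>>>> fit in a 1,680MB SGA. With db_cache_size = 1024MB, the ~2% direct path
>>>>>> threshold is only ~20MB, so a 273MB table is far above it.)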
>>>>>>
>>>>>> BR
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 24, 2016 at 11:40 AM, samuel guiñales cristobal <
>>>>>> samuelg.cristobal_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hi Cheng,
>>>>>>>
>>>>>>> ASMM, no doubt. Always the same small memory -- why problems now?
>>>>>>>
>>>>>>> sga_max_size big integer 1680M
>>>>>>> sga_target big integer 0
>>>>>>>
>>>>>>> NAME SIZE MB
>>>>>>> -------------------- ----------
>>>>>>> Shared Pool Size 480
>>>>>>> Large Pool Size 32
>>>>>>> Java Pool Size 128
>>>>>>> sga_max_size: 1680
>>>>>>> shared_pool_size: 480
>>>>>>> large_pool_size: 32
>>>>>>> java_pool_size: 128
>>>>>>> db_cache_size: 1024
>>>>>>> log_buffer: 6.140625
>>>>>>> sort_area_size: .0625
>>>>>>> pga_aggregate_target: 256
>>>>>>>
>>>>>>> % Free SharedPool
>>>>>>> -----------------
>>>>>>> 11.54
>>>>>>>
>>>>>>> NAME POOL SIZE MB
>>>>>>> -------------------- ------------ ----------
>>>>>>> free memory shared pool 32.9287186
>>>>>>> free memory large pool 32
>>>>>>> free memory java pool 128
>>>>>>> row cache shared pool 8.23996735
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>> Samuel Guiñales Cristobal <samuelg.cristobal_at_gmail.com>
>>>>>>> «Let food be thy medicine, and medicine thy food.» Hippocrates
>>>>>>>
>>>>>>> On 24 October 2016 at 11:28, Ls Cheng <exriscer_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> AMM => memory_target
>>>>>>>> ASMM => sga_target
>>>>>>>>
>>>>>>>> Do you use one of them? I guess so, otherwise you would not have
>>>>>>>> 1451136 db block buffers with a 1GB db_cache_size. What Martin is suggesting
>>>>>>>> is that if you don't have a large enough minimum db_cache_size, then due to
>>>>>>>> the nature of ASMM or AMM you could end up with a quite small db cache;
>>>>>>>> a table that was not over 2% of the db cache can suddenly become so and
>>>>>>>> thus become a direct path read candidate.
>>>>>>>>
>>>>>>>> Aggregate v$sgastat by pool and show us the output, so we can see
>>>>>>>> the current SGA distribution.
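>>>>>>>>
>>>>>>>> For example, something like this (a sketch):
>>>>>>>>
>>>>>>>>   SELECT pool, ROUND(SUM(bytes)/1024/1024) size_mb
>>>>>>>>   FROM   v$sgastat
>>>>>>>>   GROUP  BY pool
>>>>>>>>   ORDER  BY pool;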
>>>>>>>>
>>>>>>>> BR
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 24, 2016 at 11:21 AM, samuel guiñales cristobal <
>>>>>>>> samuelg.cristobal_at_gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Martin,
>>>>>>>>>
>>>>>>>>> >Not only the number of blocks in cache can be very volatile (and
>>>>>>>>> quite small when the instance started).
>>>>>>>>> >Do you have _small_table_threshold set in your spfile or is it
>>>>>>>>> calculated at startup?
>>>>>>>>> Not in the spfile; calculated by the instance.
>>>>>>>>>
>>>>>>>>> >If not, do you run ASMM? If yes, do you set a minimum value for
>>>>>>>>> buffer cache?
>>>>>>>>> No AMM. Buffer cache size at the last startup:
>>>>>>>>> db_cache_size big integer 1G
>>>>>>>>> It has never changed in this database.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> >As _small_table_threshold is derived from the buffer cache size (2%
>>>>>>>>> afaik), and the buffer cache size is volatile, it can just happen it was
>>>>>>>>> >different in previous instances, and just due to the crash other
>>>>>>>>> parameters are effective now.
>>>>>>>>> >You can check this in your instance's alert.log
>>>>>>>>>
>>>>>>>>> I checked all startups in the alert log: always 1G db_cache_size, no
>>>>>>>>> volatile memory changes (no ALTER SYSTEM on memory in the alert log).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> >I just threw a lot of details in your direction; maybe let's
>>>>>>>>> start to clarify if it's an optimizer/plan issue or direct/scattered read?
>>>>>>>>> Thanks Martin. After the workaround we fixed the problem, but the FTS
>>>>>>>>> continues in memory :(
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> Samuel Guiñales Cristobal <samuelg.cristobal_at_gmail.com>
>>>>>>>>> «Let food be thy medicine, and medicine thy food.» Hippocrates
>>>>>>>>>
>>>>>>>>> On 23 October 2016 at 21:28, Martin Berger <
>>>>>>>>> martin.a.berger_at_gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Samuel,
>>>>>>>>>>
>>>>>>>>>> there can be various reasons why a direct path read is preferred
>>>>>>>>>> over filling the buffer cache.
>>>>>>>>>> Just as a reference, here is Tanel Põder's article about it:
>>>>>>>>>> http://blog.tanelpoder.com/2012/09/03/optimizer-statistics-driven-direct-path-read-decision-for-full-table-scans-_direct_read_decision_statistics_driven/
>>>>>>>>>>
>>>>>>>>>> First of all, it's not an optimizer decision. That's the reason
>>>>>>>>>> why rebuilding the index didn't help.
>>>>>>>>>> As Tanel (and your workaround) showed, new/different statistics
>>>>>>>>>> can change the behaviour.
>>>>>>>>>>
>>>>>>>>>> So let's separate things:
>>>>>>>>>> You jump from "direct path read" to "FTS" -- can you please check
>>>>>>>>>> if the PLAN changed?
>>>>>>>>>> In that case you have to identify why the optimizer picked a
>>>>>>>>>> different plan.
>>>>>>>>>>
>>>>>>>>>> Another question would be why it flips from scattered read to
>>>>>>>>>> direct read. That's more about the value of _small_table_threshold, the
>>>>>>>>>> number of blocks in the buffer cache, and others.
>>>>>>>>>> Not only that -- the number of blocks in cache can be very volatile (and
>>>>>>>>>> quite small when the instance has just started).
>>>>>>>>>> Do you have _small_table_threshold set in your spfile, or is it
>>>>>>>>>> calculated at startup?
>>>>>>>>>> If not, do you run ASMM? If yes, do you set a minimum value for the
>>>>>>>>>> buffer cache?
>>>>>>>>>> As _small_table_threshold is derived from the buffer cache size (2%
>>>>>>>>>> afaik), and the buffer cache size is volatile, it can just happen it was
>>>>>>>>>> different in previous instances, and just due to the crash other parameters
>>>>>>>>>> are effective now.
>>>>>>>>>> You can check this in your instance's alert.log.
>>>>>>>>>>
>>>>>>>>>> I just threw a lot of details in your direction; maybe let's
>>>>>>>>>> start to clarify if it's an optimizer/plan issue or direct/scattered read?
>>>>>>>>>>
>>>>>>>>>> Then we can dig deeper.
>>>>>>>>>>
>>>>>>>>>> best regards,
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2016-10-23 20:31 GMT+02:00 samuel guiñales cristobal <
>>>>>>>>>> samuelg.cristobal_at_gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Recently we suffered a storage problem in a RAC; all instances went down.
>>>>>>>>>>> After recovery, we detected high "direct path read" waits on some
>>>>>>>>>>> small tables in some databases, with OS I/O wait at 20-40% -- a hard
>>>>>>>>>>> performance problem.
>>>>>>>>>>>
>>>>>>>>>>> Rebuilding indexes and gathering statistics on these tables did not
>>>>>>>>>>> solve the problem.
>>>>>>>>>>> The execution plan changed to do a full table scan, and the index is
>>>>>>>>>>> not used like before.
>>>>>>>>>>>
>>>>>>>>>>> Maybe the optimizer is not involved and it is the parameter
>>>>>>>>>>> _small_table_threshold?
>>>>>>>>>>> But why do the physical reads start now?
>>>>>>>>>>>
>>>>>>>>>>> *Some info on one database, and the workaround:*
>>>>>>>>>>>
>>>>>>>>>>> version.......................: 11.2.0.4
>>>>>>>>>>> OS............................: Linux x86-64
>>>>>>>>>>> info..........................: no Tuning and Diagnostics Pack
>>>>>>>>>>> _small_table_threshold........: 29022
>>>>>>>>>>> Table with high phys. reads (almost all, 95%):
>>>>>>>>>>>  - Table size.................: 273M
>>>>>>>>>>>  - Table block num............: 34730
>>>>>>>>>>> db memory buffers.............: 1451136
>>>>>>>>>>>
>>>>>>>>>>> We applied a *workaround*: we fake the table stats so it looks like
>>>>>>>>>>> there are only 20K blocks for that table, which avoids the physical reads:
>>>>>>>>>>>
>>>>>>>>>>> EXEC DBMS_STATS.SET_TABLE_STATS('user','table_name', numblks=>20000);
>>>>>>>>>>>
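>>>>>>>>>>> (For later, once the root cause is found -- a sketch, with a
>>>>>>>>>>> hypothetical owner/table, of dropping the faked value and regathering:
>>>>>>>>>>>
>>>>>>>>>>>   EXEC DBMS_STATS.DELETE_TABLE_STATS('user','table_name');
>>>>>>>>>>>   EXEC DBMS_STATS.GATHER_TABLE_STATS('user','table_name');
>>>>>>>>>>> )
>>>>>>>>>>>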
>>>>>>>>>>> Any idea how to solve that FTS? And why after the storage problem?
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 26 2016 - 09:14:22 CEST