Re: Direct read stopped suddenly

From: Pap <oracle.developer35_at_gmail.com>
Date: Wed, 25 May 2022 18:55:39 +0530
Message-ID: <CAEjw_fgd3EVTqvguMTiojz2sOT-GLA_rHiEK3=tRKObr+gA=yg_at_mail.gmail.com>



Also Sayan, as it appears that the percentage of the table's blocks present in the buffer cache, out of the table's total number of blocks, is the deciding factor between a smart scan and a buffered read of an object: is it a good idea to set up some kind of alerting to catch this odd situation beforehand, and then proactively flush the object from the buffer cache to avoid the issue?
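
Something like this rough sketch is what I have in mind (APP_OWNER/TAB1 and the alerting threshold are placeholders; gv$bh counts buffers per instance, so CR clones can inflate the figure slightly):

-- rough sketch: per-instance cached-buffer count for one table,
-- compared against the segment's total blocks from dba_tables
select bh.inst_id,
       count(*) cached_buffers,
       round(100 * count(*) / t.blocks, 1) pct_cached
from   gv$bh bh,
       (select blocks
        from   dba_tables
        where  owner = 'APP_OWNER' and table_name = 'TAB1') t
where  bh.status <> 'free'
and    bh.objd = (select data_object_id
                  from   dba_objects
                  where  owner = 'APP_OWNER'
                  and    object_name = 'TAB1'
                  and    object_type = 'TABLE')
group  by bh.inst_id, t.blocks;

Though as far as I know there is no per-object flush, so the proactive flush would have to be a full "alter system flush buffer_cache", which would throw out everything else in the cache as well.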

On Wed, 25 May 2022, 3:51 pm Pap, <oracle.developer35_at_gmail.com> wrote:

> Thank you so much Sayan. Here the table size is <6GB whereas we have
> db_cache_size set to 70GB, so I believe it can never satisfy the VLOT
> condition, i.e. 5 times the buffer cache, and thus could never qualify for
> direct read on that basis. But as we know from the dba_hist_sqlstat
> statistics, this query was going for cell smart scan in past days. So was
> there some other criterion that made it eligible for smart scan in the
> past?
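>
> (With db_cache_size = 70GB and the default _very_large_object_threshold of
> 500%, that VLOT cutoff works out to 5 x 70GB = 350GB, and this ~6GB table
> is nowhere near it.)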
>
> And is it related to what percentage of the object's blocks are present
> in the buffer cache, which caused it not to go for smart scan this time?
> In this case I see that, out of ~757K total blocks (as noted in
> dba_tables), 756K were reported as "ntcache" in the trace file. So there
> must be some criterion by which it checks what percentage of the blocks
> are still sitting in the buffer cache (or dirty and not yet flushed to
> disk) and on that basis decided to go for a buffered read rather than a
> direct read/smart scan. What is that limit?
>
> We have _small_table_threshold = 260356
>
> and _very_large_object_threshold = 500
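>
> For reference, these values can be read as SYS with something like this
> sketch (x$ksppi holds the hidden parameter names and x$ksppsv the
> instance-level values):
>
> -- sketch only, run as SYS on each node
> select i.ksppinm name, v.ksppstvl value
> from   x$ksppi i, x$ksppsv v
> where  i.indx = v.indx
> and    i.ksppinm in ('_small_table_threshold',
>                      '_very_large_object_threshold');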
>
>
> On Wed, 25 May 2022, 4:56 am Sayan Malakshinov, <xt.and.r_at_gmail.com>
> wrote:
>
>> Hi Pap,
>>
>>> Do you mean to say that on DB node-1 the object was fully cached whereas
>>> on nodes 2, 3 and 4 it wasn't, and thus when the query was executed from
>>> node-2/3/4 it was going for smart scan whereas when it was executed from
>>> node-1 it was going for buffered read?
>>
>>
>> Yes, absolutely correct.
>>
>>> And if we then flush it from the buffer cache, it can happen again, so
>>> how should we handle this odd behaviour?
>>
>>
>> Yes, you can flush the buffer cache, but you can also use the
>> "table_stats" hint to force serial direct path reads: we know that Oracle
>> chooses serial direct path reads when the segment is bigger than
>> "_very_large_object_threshold" percent of the buffer cache. By default it
>> is 500, i.e. 500%, so if the table segment is 5 times bigger than the
>> buffer cache, Oracle should choose direct reads.
>>
>> Simple example:
>> https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
>>
>> SQL> @vparam_ _very_large_object_threshold
>>
>> NAME                          VALUE  DEFLT  TYPE    DESCRIPTION
>> ----------------------------- ------ ------ ------- ------------------------------------------------------
>> _very_large_object_threshold  500    TRUE   number  upper threshold level of object size for direct reads
>>
>> SQL> select name, block_size, buffers from v$buffer_pool;
>>
>> NAME                 BLOCK_SIZE    BUFFERS
>> -------------------- ---------- ----------
>> DEFAULT                    8192      41118
>>
>> SQL> col vlot new_value vlot;
>> SQL> select buffers*5+1 as vlot from v$buffer_pool;
>>
>> VLOT
>> ----------
>> 205591
>>
>> SQL> create table t_direct as select 1 x from dual;
>>
>> Table created.
>>
>> SQL> select name, value from v$statname n, v$sesstat s
>>      where s.sid = userenv('sid') and s.statistic# = n.statistic#
>>      and name = 'physical reads direct';
>>
>> NAME                                                                  VALUE
>> ---------------------------------------------------------------- ----------
>> physical reads direct                                                     0
>>
>> SQL> select count(*) from t_direct;
>>
>> COUNT(*)
>> ----------
>> 1
>>
>> SQL> select name, value from v$statname n, v$sesstat s
>>      where s.sid = userenv('sid') and s.statistic# = n.statistic#
>>      and name = 'physical reads direct';
>>
>> NAME                                                                  VALUE
>> ---------------------------------------------------------------- ----------
>> physical reads direct                                                     0
>>
>> SQL> select/*+ table_stats(t_direct set rows=1000000 blocks=&vlot) */
>> count(*) from t_direct;
>>
>> COUNT(*)
>> ----------
>> 1
>>
>> SQL> select name, value from v$statname n, v$sesstat s
>>      where s.sid = userenv('sid') and s.statistic# = n.statistic#
>>      and name = 'physical reads direct';
>>
>> NAME                                                                  VALUE
>> ---------------------------------------------------------------- ----------
>> physical reads direct                                                     1
>>
>> More info:
>> http://orasql.org/2019/04/16/correct-syntax-for-the-table_stats-hint/
>>
>> http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/
>>
>>
>> Full example:
>> https://gist.github.com/xtender/3c2d608f0d8de6b63b78f814903c70ea
>>
>>
>> On Tue, May 24, 2022 at 11:13 PM Pap <oracle.developer35_at_gmail.com>
>> wrote:
>>
>>> Thank you Sayan. This query has been running fine for the last 5-6
>>> months, always using cell smart scan, as I can see from the 'offload
>>> returned bytes' column value >0 in dba_hist_sqlstat for this query; thus
>>> the elapsed time was always <1 second and we never encountered such
>>> behaviour. Do you mean to say that on DB node-1 the object was fully
>>> cached whereas on nodes 2, 3 and 4 it wasn't, and thus when the query
>>> was executed from node-2/3/4 it was going for smart scan whereas when it
>>> was executed from node-1 it was going for buffered read? And if we then
>>> flush it from the buffer cache, it can happen again, so how should we
>>> handle this odd behaviour?
>>>
>>> On Wed, May 25, 2022 at 1:28 AM Sayan Malakshinov <xt.and.r_at_gmail.com>
>>> wrote:
>>>
>>>> Hi Pap,
>>>>
>>>> ntcache is the number of blocks of that object that are in the buffer
>>>> cache. Since almost all of the blocks are in the cache, Oracle decides
>>>> not to use direct reads.
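>>>>
>>>> (From the trace below: ntcache = 756116 of nblks = 757301, i.e. ~99.8%
>>>> of the object's blocks were already cached on the node doing buffered
>>>> reads, versus ntcache = 10, effectively 0%, on the node doing the smart
>>>> scan.)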
>>>>
>>>>
>>>> Best regards,
>>>> Sayan Malakshinov
>>>> Oracle performance tuning expert
>>>> Oracle Database Developer Choice Award winner
>>>> Oracle ACE
>>>> http://orasql.org
>>>>
>>>> On Tue, 24 May 2022 at 22:29, Pap <oracle.developer35_at_gmail.com> wrote:
>>>>
>>>>> Hello Listers, we have one legacy system on version 11.2.0.4 that is
>>>>> planned to be migrated to 19c. It's a four-node database. A select
>>>>> query suddenly stopped doing smart scan from one node, and its response
>>>>> time went from <1 second to >5 seconds. The surprising thing is that
>>>>> when we trigger the sql manually, it does a smart scan from the other
>>>>> nodes, but from that one node only it does not, and the response time
>>>>> goes high on that specific node. It seems only this one query shows the
>>>>> behaviour; the others run fine. Ideally, my understanding was that this
>>>>> should not vary by DB node, because the storage nodes are common across
>>>>> all four DB nodes. So I'm wondering why it is happening.
>>>>>
>>>>> I collected the 10358 trace, shown below. It shows a difference in the
>>>>> values of "ntcache" and "ntdist" between the two nodes, which is
>>>>> possibly driving the direct read/smart scan decision. So I wanted to
>>>>> understand what those values are, why they differ between the DB nodes,
>>>>> and how to fix this odd behaviour.
>>>>>
>>>>> query:-
>>>>> SELECT c1, c8, c2, c3, c4, c5, c6, c7 from TAB1 where c1 = :1
>>>>>
>>>>> Plan hash value: 2108848690
>>>>>
>>>>> ----------------------------------------------------------------------------------
>>>>> | Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
>>>>> ----------------------------------------------------------------------------------
>>>>> |   0 | SELECT STATEMENT          |      |       |       |  1583 (100)|          |
>>>>> |   1 | TABLE ACCESS STORAGE FULL | TAB1 |     9 |   711 |   1583  (1)| 00:00:19 |
>>>>> ----------------------------------------------------------------------------------
>>>>>
>>>>> Query Block Name / Object Alias (identified by operation id):
>>>>> -------------------------------------------------------------
>>>>>    1 - SEL$1 / TAB1@SEL$1
>>>>>
>>>>>
>>>>> Trace from DB node which is doing smartscan:-
>>>>>
>>>>> NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
>>>>> NSMTIO: kcbdpc:DirectRead: tsn: 29, objd: 1607047, objn: 1607047
>>>>> ckpt: 1, nblks: 757301, ntcache: 10, ntdist:30
>>>>> Direct Path for tsn 29 objd 1607047 objn 1607047
>>>>> Direct Path 1 ckpt 1, nblks 757301 ntcache 10 ntdist 30
>>>>> Direct Path mndb 0 tdiob 12 txiob 0 tciob 66
>>>>> Direct path diomrc 128 dios 2
>>>>> NSMTIO: Additional Info: VLOT=65089150
>>>>> Object# = 1607047, Object_Size = 757301 blocks
>>>>> SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0
>>>>> KCBO: prepared ksr msg for cid=131228, tsn=29, obj=1607047, 0
>>>>>
>>>>>
>>>>> Trace from DB node which is not doing smartscan:-
>>>>>
>>>>> NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
>>>>> NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 29, objd: 1607047, objn: 1607047
>>>>> ckpt: 0, nblks: 757301, ntcache: 756116, ntdist:1185
>>>>> Direct Path for tsn 29 objd 1607047 objn 1607047
>>>>> Direct Path 0 ckpt 0, nblks 757301 ntcache 756116 ntdist 1185
>>>>> Direct Path mndb 18 tdiob 10 txiob 0 tciob 70
>>>>> Direct path diomrc 128 dios 2
>>>>> NSMTIO: Additional Info: VLOT=65089150
>>>>> Object# = 1607047, Object_Size = 757301 blocks
>>>>> SqlId = 64d0vkmtjgq2x, plan_hash_value = 2108848690, Partition# = 0
>>>>>
>>>>
>>
>> --
>> Best regards,
>> Sayan Malakshinov
>> Oracle performance tuning engineer
>> Oracle ACE
>> http://orasql.org
>>
>

--
http://www.freelists.org/webpage/oracle-l