RE: Simple Insert - very slow

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 23 May 2019 11:29:50 -0400
Message-ID: <049601d5117c$5cc6d890$165489b0$_at_rsiz.com>


specifically, this condition has been seen with subpartitioning hash on unique_id (16) used to create multiple concurrent insert points, followed by massive deletes on monotonically increasing unique_id so that the deleted hashed widely to all 16 storage threads.

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mark W. Farnham
Sent: Thursday, May 23, 2019 11:12 AM
To: 'Jonathan Lewis'; 'Oracle-L Freelists' Subject: RE: Simple Insert - very slow

nods.
ergo the words massive and uncommitted.

definitely an edge case.
-----Original Message-----
From: Jonathan Lewis [mailto:jonathan_at_jlcomp.demon.co.uk] Sent: Thursday, May 23, 2019 10:46 AM
To: Mark W. Farnham; 'Oracle-L Freelists' Subject: Re: Simple Insert - very slow

Mark,

SInce it's 12.2 I think we'd see a relatively large value for "ASSM gsp:Optimized reject DB" unless a theoretical very large uncommitted delete made space empty in a very large number of blocks that were extremely widely scattered so that no two data blocks with deleted rows appeared in the same L1 bitmap block. (i.e. something like 128 blocks between every pair of blocks with not yet committed free space).

Unless, possibly, the reject cache has been disabled..

Regards
Jonathan Lewis



From: Mark W. Farnham <mwf_at_rsiz.com>
Sent: 23 May 2019 13:27:51
To: Jonathan Lewis; 'Oracle-L Freelists' Subject: RE: Simple Insert - very slow

Perchance do you have any massive uncommitted large deletes in the same tablespace?

IF so, to rule out a particular ASSM bug (and nearly everyone is using ASSM to manage free space lists now, right?) you could try creating a freelist managed tablespace, slapping the table over there (or a clone), and see how it goes there.

That is a "shot in the dark." Don't let that disrupt following JL's advice on a stepwise solution. But if you have resources, you can clone the table and mirror the processes operating on the copy without much human work and without interrupting your search for a root cause.

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis
Sent: Thursday, May 23, 2019 5:55 AM
To: 'Oracle-L Freelists'
Subject: Re: Simple Insert - very slow

Sanjay,

I keep forgetting that the "cell physical block read" (etc.) events don't tell you about database block addresses.

It would be sensible to find out something about which blocks were being read - one option would be to use event 10200 for a brief period for that SQL_ID, but since you've got lost write protection enabled we could take advantage of that and dump some redo.

The command is of the form:

alter system dump logfile '/u01/app/oracle/oradata/orcl12c/redo03.log'

        layer 23 opcode 2
        rba min 2781    1
        rba max 2781    1000

;

Layer 23 opcode 2 is the block read record so that is fixed The file name is any online or archived redo log name you fancy the rba min/max has to specify the correct sequence number for the file name (v$log.sequence# / v$log_history.sequence#) the 1, 1000 that I've shown is to limit the dump to the first 1,000 blocks of the log file - since you have 6,000 BRRs per second you might want to limit the dump to a similar number of redo blocks.

A BRR (block read record) looks like:

REDO RECORD - Thread:1 RBA: 0x000add.00000019.01b0 LEN: 0x004c VLD: 0x10 CON_UID: 2846920952

SCN: 0x00000000025a7c13 SUBSCN:  1 05/23/2019 10:42:51 CHANGE #1 CON_ID:3
TYP:2 CLS:6 AFN:9 DBA:0x00407930 OBJ:40 SCN:0x00000000001a1e2a SEQ:2 OP:23.2
ENC:0 RBL:0 FLG:0x0000  Block Read - afn: 9 rdba: 0x00407930
BFT:(1024,4225328) non-BFT:(1,31024)
              scn: 0x00000000001a1e2a seq: 0x02
              flags: 0x00000006 ( dlog ckval )
              where: qeilwh05: qeilbk

The interesting information is likely to be

a)  the TYP and CLS
b)  the rdba - translated to the file/block (BFT is "big file tablespace")
c)  where

What we're interested in is whether Oracle keeps re-reading the same blocks time , and if so what type they are and from which function the read is requested. (And Stefan might be able to tell you the signficance of the function if any one stands out).

Regards
Jonathan Lewis



From: Sanjay Mishra <smishra_97_at_yahoo.com> Sent: 22 May 2019 23:22
To: Jonathan Lewis; 'Oracle-L Freelists'; Mark W. Farnham Subject: Re: Simple Insert - very slow

Mark/Jonathan

When I created a new table with the same compression, Issue is not appearing and not sure if this will come back again after heavily DML operation. As it was fine for several days after imported earlier and suddenly started behaving badly. I will continue to do the test with the original table as suggested above by Stefan/Jonathan so as to make sure RCA is done and the same is not going to pop up again

Tx
Sanjay
On Monday, May 20, 2019, 9:29:15 AM EDT, Mark W. Farnham <mwf_at_rsiz.com> wrote:

Taken all together, I would advocate copying this single table to a tablespace brand new, all its own and re-running the test.

That will quickly factor out accumulated problems from one time run problems. Whether you do this as a clone (not the real table) or for real is an interesting question. A test clone would avoid other activities updating in competition. A move "for real" might make your problem go away without fully learning the root cause.

This would help focus on the nature of your actual problem, together with making it easier to see if the MOS note referenced in the thread is relevant.

The difference between a Millsap trace on the current object versus the trace of the same updates to a clone of the table should be very illuminating.

-----Original Message-----
From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [mailto:oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>] On Behalf Of Jonathan Lewis
Sent: Monday, May 20, 2019 7:00 AM
To: Oracle-L Freelists
Subject: Re: Simple Insert - very slow

Sanjay,

I don't use snapper often enough to be sure of exactly how it works, but your figures for a "1 second single snapshot" suggest that it took nearly 2 seconds to do a one second snapshot - which could indicate that the machine is overloaded and work outside the work you are doing is impacting in your attempt to load data.

The things I note about the figures are:

  1. you read 6,111 blocks - and decrypt 6,110 blocks (so presumably the table is either encrypted, or in an enctryped tablespace)
  2. you generate 6,120 redo entries of which 6,110 are for lost write detection (so I think you've got "db_lost_write_protect = full" (or your table is in the SYSTEM tablespace !)
  3. you report only 11 db block changes and 750 bytes of "other" redo - so you haven't managed to insert many rows, and
  4. you've got only 3 "gc current block received" but waited 629 times on "gc current block busy" - which is odd, and also suggests that something else is hammering away at the table you're trying to insert to.
  5. you've examined 2,478 level 1 bitmap blocks leading to 12,388 data block rejections - so there's a massive conflict between the space the bitmap says is available and the space that seems to be available when the session checks the targetted blocks. (This is probably why Stefan is suggesting stack tracing) to find out why there's such a discrepancy between the bitmaps and the actual blocks. I note from an earlier post, though that the table is compressed and has a column txn_upd_dt. If anything is updating the rows in this table you're likely to end up with lots of migrated rows, and there used to be a complete foul-up with migrated rows, ITLs and ASSM that resulted in a major collision and waste of resources.

You've got some trace data for the insert - and identified the blocks being read as from the table. Do you get file and block ids from the trace ? If so then (a) do you see the same blocks being re-read many times over the course of a couple of minutes, and (b) can you dump a few of the blocks that are read (especially if they are in the middle of a stream of consecute block ids) so that we can see if they are the target of a lot of migration and have large numbers of ITL entries (check for "itc" and the number following it).

Regards
Jonathan Lewis



From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> <oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>> on behalf of Stefan Koehler <contact_at_soocs.de<mailto:contact_at_soocs.de>> Sent: 20 May 2019 10:34
To: Sanjay Mishra
Cc: Oracle-L Freelists
Subject: Re: Simple Insert - very slow

Hello Sanjay,
it seems like this is really an ASSM issue based on the stats of "ASSM gsp*" and "ASSM cbk*".

Now it would be time to gather stack traces and check in more detail with the already mentioned trace events 10320 and/or 10612.

You can also try to disable the ASSM rejection cache (_enable_rejection_cache) or set event 10019 to level 1 but these would be just some shot in the dark.

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher Website: http://www.soocs.de
Twitter: _at_OracleSK

> Sanjay Mishra <smishra_97_at_yahoo.com<mailto:smishra_97_at_yahoo.com>> hat
> am 18. Mai 2019 um 19:08

geschrieben:
>
>
> Stefan
>
> Thanks for good document created by you to troubleshoot. Need to check
> as
what is best way to interpret snapper data but pasting the details and if someone can provide any better guidance. Will also check more on Tanel site for any demo to interpret the snapper report. As per Pages 20 and Explain plan on Page 21 is exactly the same scenario.
>
>
> Sampling SID 1070 with interval 1 seconds, taking 1 snapshots...
>
>
>





> SID _at_INST, USERNAME , TYPE, STATISTIC
, DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
>




> 1070 _at_2, SYSADMIN, STAT, session logical reads
, 13865, 7.73k, , , , , 14.1k total buffer visits
> 1070 _at_2, SYSADMIN, STAT, user I/O wait time
, 141, 78.65, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, non-idle wait time
, 141, 78.65, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, non-idle wait count
, 12230, 6.82k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical read total IO requests
, 6112, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical read requests optimized
, 6111, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical read total bytes optimized
, 50069504, 27.93M, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical read total bytes
, 50069504, 27.93M, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, cell physical IO interconnect bytes
, 50069504, 27.93M, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, gcs messages sent
, 3, 1.67, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, db block gets
, 13860, 7.73k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, db block gets from cache
, 13860, 7.73k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, db block gets from cache (fastpath)
, 7737, 4.32k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, consistent gets
, 1, .56, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, consistent gets from cache
, 1, .56, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, consistent gets pin
, 1, .56, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, consistent gets pin (fastpath)
, 1, .56, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, logical read bytes from cache
, 113541120, 63.34M, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical reads
, 6111, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical reads cache
, 6111, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, physical read IO requests
, 6112, 3.41k, , , , , 8.19k bytes per request
> 1070 _at_2, SYSADMIN, STAT, physical read bytes
, 50069504, 27.93M, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, db block changes
, 11, 6.14, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, free buffer requested
, 6112, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, hot buffers moved to head of LRU
, 958, 534.39, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, free buffer inspected
, 6144, 3.43k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, shared hash latch upgrades - no wait
, 7, 3.9, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, blocks decrypted
, 6110, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, redo entries
, 6120, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, redo size
, 465504, 259.67k, , , , , ~ bytes per user commit
> 1070 _at_2, SYSADMIN, STAT, redo entries for lost write detection
, 6110, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, redo size for lost write detection
, 464756, 259.25k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, redo subscn max counts
, 7, 3.9, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, file io wait time
, 1408659, 785.78k, , , , , 230.47us bad guess of IO wait time per IO request
> 1070 _at_2, SYSADMIN, STAT, gc current blocks received
, 3, 1.67, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, gc local grants
, 6116, 3.41k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, ASSM cbk:blocks examined
, 12366, 6.9k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, ASSM gsp:L1 bitmaps examined
, 2478, 1.38k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, ASSM gsp:L2 bitmaps examined
, 1, .56, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, ASSM gsp:reject db
, 12388, 6.91k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, STAT, buffer is pinned count
, 230, 128.3, , , , , 1.63 % buffer gets avoided thanks to buffer pin caching
> 1070 _at_2, SYSADMIN, STAT, cell flash cache read hits
, 6723, 3.75k, , , , , ~ per execution
> 1070 _at_2, SYSADMIN, TIME, background cpu time
, 365192, 203.71ms, 20.4%, [## ], , ,
> 1070 _at_2, SYSADMIN, TIME, background elapsed time
, 1273623, 710.45ms, 71.0%, [######## ], , , 28.95 % unaccounted time
> 1070 _at_2, SYSADMIN, WAIT, gc current block busy
, 629, 350.87us, .0%, [ ], 3, 1.67, 209.67us average wait
> 1070 _at_2, SYSADMIN, WAIT, cell single block physical read
, 1557638, 868.88ms, 86.9%, [WWWWWWWWW ], 6746, 3.76k, 230.9us average wait
>
> -- End of Stats snap 1, end=2019-05-18 12:58:58, seconds=1.8
--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l




--
http://www.freelists.org/webpage/oracle-l

--
http://www.freelists.org/webpage/oracle-l






--
http://www.freelists.org/webpage/oracle-l




--
http://www.freelists.org/webpage/oracle-l
Received on Thu May 23 2019 - 17:29:50 CEST

Original text of this message