Re: High consistent Gets and response time in prod

From: Ram Raman <veeeraman_at_gmail.com>
Date: Wed, 5 May 2021 15:05:37 -0500
Message-ID: <CAHSa0M0Zt9g9uV909a7h=PU8PR3pC=XvDN2BDJ9VTvooHfryhA_at_mail.gmail.com>



Thank you all. After reading the responses about checking the wait events, I felt a bit embarrassed that I hadn't thought of that before. I did a quick check yesterday, among other things, and found the session waiting on db file scattered read, fetching block after block - for a "rownum <= 1" query. I was confused, but I will do more digging and post an update.
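
In case it helps anyone following along, this is roughly the check I ran (a
sketch from memory; the SID is a hypothetical placeholder for the session
running the query):

select sid, event, state, p1, p2, p3
  from v$session
 where sid = 1234;  -- hypothetical SID of the session running the SELECT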

Ram

On Sun, May 2, 2021 at 8:17 AM Mark W. Farnham <mwf_at_rsiz.com> wrote:

> “In which case, immediately running the query a second time should run
> much faster, after the first query cleans up the mess?”
>
>
>
> IF the blocks come into the SGA as opposed to the PGA only.
>
>
>
> Unless I missed some recent fix-up patch that tracks cleaned-out blocks
> into the PGA and then flushes them through the SGA for permanence, or they
> inflicted share consistency on the PGA (I sure hope not).
>
>
>
> I still think the odds-on favorite is an empty front accumulated in
> production, which “magically” (not) disappears when test is refreshed. A
> huge empty front is exacerbated by delayed cleanout, but does not require
> delayed cleanout to be a problem.
>
>
>
> Either snapper or reload could confirm that. Reloading before a snapper
> run would likely change the symptoms, so what you do next depends on
> whether you want the exact current problem completely diagnosed or just
> want it to go away. A reload rules out empty front and delayed cleanout if
> it is still slow, but does not prove either was the problem if the problem
> goes away.
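>
> By "reload" I mean something along these lines (a sketch only; it assumes
> an outage window, since MOVE leaves indexes UNUSABLE until rebuilt, and
> the index name below is a placeholder):
>
> alter table c.pd move;      -- rewrites the table, compacting the empty front
> alter index pd_pk rebuild;  -- hypothetical index name; rebuild every index MOVE invalidated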
>
>
>
> mwf
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> *On Behalf Of* Tim Gorman
> *Sent:* Saturday, May 01, 2021 10:45 PM
> *To:* xt.and.r_at_gmail.com; jpiwowar_at_gmail.com
> *Cc:* Ram Raman; ORACLE-L
> *Subject:* Re: High consistent Gets and response time in prod
>
>
>
> Please take Sayan's advice and use snapper, paying attention to statistics
> to verify that you're reading through millions of UNDO records to build a
> consistent read image of each table block. This would indicate that there
> are dozens or hundreds of in-flight (uncommitted) or recently-committed
> transactions on that table.
>
> If it is true that there are dozens or hundreds of sessions generating
> UNDO on that table, then you ought to prove it. Start in V$TRANSACTION,
> then join to V$SESSION and from there to V$LOCK to find the 'TM' type
> enqueue locks on the table in question. Or perhaps start by scanning V$SQL
> for DML statements against the table, and then tie all those SQL_IDs back
> to V$SESSION. There are several ways to do this, and I'm not sure if
> either of these is optimal.
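>
> For instance, a sketch of the V$TRANSACTION route (joins from memory, so
> verify before relying on it):
>
> select s.sid, s.serial#, s.username, t.start_time, t.used_urec
>   from v$transaction t
>   join v$session     s on s.taddr = t.addr
>   join v$lock        l on l.sid = s.sid and l.type = 'TM'
>  where l.id1 = (select object_id
>                   from dba_objects
>                  where owner = 'C' and object_name = 'PD');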
>
> At any rate, if you cannot prove that there are dozens or hundreds of
> uncommitted or recently-committed transactions on the table, then it might
> be something like delayed-block cleanout as Mark suggested. In which case,
> immediately running the query a second time should run much faster, after
> the first query cleans up the mess?
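>
> One way to check for that is to compare the session's cleanout statistics
> before and after each run; a sketch (statistic names vary by version):
>
> select n.name, m.value
>   from v$mystat   m
>   join v$statname n on n.statistic# = m.statistic#
>  where m.value > 0
>    and (n.name like '%cleanout%' or n.name like '%undo records applied%');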
>
>
> On 4/30/2021 5:07 PM, Sayan Malakshinov wrote:
>
> Oh, sorry, I've just re-read it carefully (it's 3am here and it looks like
> my brain is too tired :)
>
> 1. rownum disables adaptive serial direct path reads, so you don't need to
> care about them
>
> 2. index access may not help you in case of lots of uncommitted changes in
> those long-running transactions (depends on what exactly was changed and
> how many times)
>
>
>
> So it would be better to provide session snapper output and check those
> long-running sessions and v$transaction
>
>
>
> On Sat, May 1, 2021 at 2:55 AM Sayan Malakshinov <xt.and.r_at_gmail.com>
> wrote:
>
> Hi Ram,
>
>
>
> 11.5M blocks
>
>
>
> Most likely there were massive deletes and you have a lot of empty blocks.
>
> Anyway I wouldn't guess, and suggest you just start your query and run
> Tanel Poder's session snapper on your session for 5-10 seconds to get real
> info on what's going on:
>
> https://tanelpoder.com/snapper/
>
> And just post its output here.
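>
> For example (an illustrative invocation; check the script header for the
> exact syntax of your snapper version):
>
> SQL> @snapper all 10 1 <sid>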
>
> It requires neither the Diagnostics nor the Tuning pack.
>
> Anyway, for such a strange query I would suggest forcing an index full
> scan (IFS) via the primary key index, to avoid the "adaptive serial direct
> path reads" which lead to an object checkpoint - for example using a SQL
> Patch, which also doesn't require those packs.
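>
> Something like this, for example (a sketch against the
> DBMS_SQLDIAG.CREATE_SQL_PATCH function; the hint text, query block and
> index name are assumptions to adapt to the real plan):
>
> declare
>   l_patch varchar2(128);
> begin
>   l_patch := dbms_sqldiag.create_sql_patch(
>                sql_text  => q'[select * from c.pd where rownum <=1]',
>                hint_text => 'INDEX(@"SEL$1" "PD"@"SEL$1" "PD_PK")',  -- hypothetical PK index name
>                name      => 'pd_rownum_ifs_patch');
> end;
> /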
>
>
>
> On Sat, May 1, 2021 at 2:36 AM John Piwowar <jpiwowar_at_gmail.com> wrote:
>
> You'll probably get better info from dbms_monitor and tkprof than from
> autotrace, but I think your main culprit/clue is the combination of high
> consistent gets and "parallel queries running against the prod table for
> 30+ hours." Depending on what those PQs are doing, you are likely to be
> hitting UNDO pretty hard to get a read-consistent view of even one row.
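>
> For example (a sketch; the SID and serial# here are placeholders):
>
> exec dbms_monitor.session_trace_enable(session_id => 1234, serial_num => 56789, waits => true, binds => false);
> -- run the slow SELECT in that session, then:
> exec dbms_monitor.session_trace_disable(session_id => 1234, serial_num => 56789);
> -- on the db server: tkprof <tracefile>.trc pd_report.txt sys=no sort=exeela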
>
>
>
> On Fri, Apr 30, 2021 at 3:39 PM Ram Raman <veeeraman_at_gmail.com> wrote:
>
> Hi,
>
>
>
> Fri evening and I have a problem. grrr..
>
>
>
> We have a simple SELECT statement like this:
>
>
>
> select * from c.pd where rownum <=1
>
>
>
> This executes instantly in test, but takes over 2 mins in prod. I did some
> autotrace and here are some stats:
>
>
>
> TEST:
>
> =====
>
> ...
>
> Elapsed: 00:00:00.03
>
>
>
> Execution Plan
>
> ----------------------------------------------------------
>
> Plan hash value: 1087134000
>
>
>
>
> ---------------------------------------------------------------------------
> | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
> ---------------------------------------------------------------------------
> |   0 | SELECT STATEMENT   |      |     1 |   983 |     2   (0)| 00:00:01 |
> |*  1 |  COUNT STOPKEY     |      |       |       |            |          |
> |   2 |   TABLE ACCESS FULL| PD   |     1 |   983 |     2   (0)| 00:00:01 |
> ---------------------------------------------------------------------------
>
>
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>
>
> 1 - filter(ROWNUM<=1)
>
>
>
>
>
> Statistics
>
> ----------------------------------------------------------
>
> 0 recursive calls
>
> 0 db block gets
>
> 3 consistent gets
>
> 0 physical reads
>
> 0 redo size
>
> 8556 bytes sent via SQL*Net to client
>
> 408 bytes received via SQL*Net from client
>
> 2 SQL*Net roundtrips to/from client
>
> 0 sorts (memory)
>
> 0 sorts (disk)
>
> 1 rows processed
>
>
>
>
>
> PROD:
>
> =====
>
>
>
> ...
>
> Elapsed: 00:02:13.61
>
>
>
> Execution Plan
>
> ----------------------------------------------------------
>
> Plan hash value: 1087134000
>
>
>
>
> ---------------------------------------------------------------------------
> | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
> ---------------------------------------------------------------------------
> |   0 | SELECT STATEMENT   |      |     1 |   993 |     4   (0)| 00:00:01 |
> |*  1 |  COUNT STOPKEY     |      |       |       |            |          |
> |   2 |   TABLE ACCESS FULL| PD   |     1 |   993 |     4   (0)| 00:00:01 |
> ---------------------------------------------------------------------------
>
>
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>
>
> 1 - filter(ROWNUM<=1)
>
>
>
>
>
> Statistics
>
> ----------------------------------------------------------
>
> 0 recursive calls
>
> 0 db block gets
>
> 11290619 consistent gets
>
> 11289575 physical reads
>
> 8024 redo size
>
> 8510 bytes sent via SQL*Net to client
>
> 408 bytes received via SQL*Net from client
>
> 2 SQL*Net roundtrips to/from client
>
> 0 sorts (memory)
>
> 0 sorts (disk)
>
> 1 rows processed
>
>
>
>
>
> First thing that jumps out is the number of consistent gets, which stays
> in the same range in prod for repeated executions. In test the consistent
> gets are likewise stable across runs - just a couple of dozen.
>
>
>
> I checked the stats and here they are:
>
>
>
> Test:
>
> =====
>
>
>
> TABLE_NAME  NUM_ROWS   LAST_ANAL  BLOCKS      AVG_ROW_LEN  PCT_FREE  PCT_USED  EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT  PAR  MON  COMPRESS
> ----------  ---------  ---------  ----------  -----------  --------  --------  ------------  ---------  ---------  ---  ---  --------
> PD          989,633    25-FEB-21  143,844     983          10                  0             0          0          NO   YES  DISABLED
>
>
>
>
>
>
>
> PROD:
>
> =====
>
>
>
> TABLE_NAME  NUM_ROWS   LAST_ANAL  BLOCKS      AVG_ROW_LEN  PCT_FREE  PCT_USED  EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT  PAR  MON  COMPRESS
> ----------  ---------  ---------  ----------  -----------  --------  --------  ------------  ---------  ---------  ---  ---  --------
> PD          1,420,080  29-APR-21  11,537,288  993          10                  0             0          0          NO   YES  DISABLED
>
>
>
> What is confusing is the number of blocks the table occupies in prod. I
> did some calculations and expected it to take around 172,000+ blocks,
> given our 8K block size, but I am unable to explain 11.5M blocks.
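>
> For reference, the back-of-the-envelope math (it ignores per-block
> overhead, so the true minimum is somewhat higher):
>
> select round(1420080 * 993 / 8192)         as blocks_if_packed,  -- ~172,000
>        round(1420080 * 993 / (8192 * 0.9)) as with_pctfree_10    -- ~191,000
>   from dual;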
>
>
>
> 19c, if that helps. Table definitions are identical in prod and test.
> Regathering the table stats in prod at 50%, which I did a few minutes ago,
> has not helped the query. The stats remain almost the same - just the
> number of rows has increased by a few hundred. I see some parallel queries
> running against the prod table for 30+ hours.
>
>
>
> Thanks a lot,
>
> Ram.
>
>
>
>
>
>
>
>
> --
>
> Best regards,
> Sayan Malakshinov
>
> Oracle performance tuning engineer
>
> Oracle ACE Associate
> http://orasql.org
>
>
>
>
> --
>
> Best regards,
> Sayan Malakshinov
>
> Oracle performance tuning engineer
>
> Oracle ACE Associate
> http://orasql.org
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 05 2021 - 22:05:37 CEST