Re: ITL waits

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Fri, 5 Apr 2024 10:40:14 +0100
Message-ID: <CAGtsp8m4WBX-Yp_NYFfaFN3-HWjxBHV8XDG4fXhnZ5PP5kT2Bw_at_mail.gmail.com>



You need to describe your problems with much greater precision. I've ignored this thread so far because there were too many questions that would need answers (or too many random guesses) before it would be possible to respond sensibly.

Even this last post leaves gaps that blur the issue:

You listed a table with 4 named columns and said these were 4 queries - but is that really 4 QUERIES, or 2 updates and 2 queries, or 4 updates? You've mentioned 3rd April and 30 minutes. So is the table a summary of execution counts from the AWR, showing the number of executions in each 30 minutes (for 15 hours) of 3rd April? Why not show the snapshot times and the SQL used to produce the results - then we don't have to make any guesses.

You've shown two lines, with the first one reporting "enq: TX - allocate ITL entry" and a count of 372,570 (I assume the 2nd line has its event column suppressed as a duplicate and is reporting a further 300,000+ count of the same wait). You don't say where these numbers came from; again, the SQL to reproduce them would be sensible. You indicated earlier on that you were seeing ca. 4 hours of waits in 30 minutes: but if this is v$active_session_history it's (statistically) 680,000 seconds, which is closer to 190 hours. Contradictory numbers without explanation do not help.

In the earlier post you said something about concurrency between 3K and 10K - again without explaining how you sourced that. Is it sessions connected, or active sessions reported in v$active_session_history? If your 30 minutes started at 3K and ran into a small contention problem, your application server(s) might have spawned a lot of extra processes very rapidly, turning a small problem into a big one.

Looking at the latest table of executions: for one 30-minute interval you execute the 4 statements about 40 times as often as at any other time - it's not very surprising that, if there's a little baseline ITL contention on the target objects, the contention should escalate through a positive feedback loop when the number of executions starts to go up. (I wonder if one of your tables is a session log.) You said something about the queries being new - was that "new to the AWR reports because they hadn't caused problems before", or definitely new code added to the system in the very recent past?

Was this catastrophe a one-off, or does it happen at around the same hour each day, or randomly but only for brief periods, possibly more than once per day?

You mentioned "Top Object" - was that from the "Segments by ITL Waits" section of a 30-minute AWR report, or something else?

You haven't mentioned anything about the WHERE clause for the updates - without giving away any column names you might be able to explain why many processes could collide so frequently on the same blocks.

Does your lack of any comment on recent object maintenance (e.g. index rebuilds, table shrink, massive delete/archive etc.) mean that you hadn't considered the possibility, or that you actively know no such activity took place?



If you know that nothing has happened recently that could have resulted in any large-scale change to the tables, then I suggest you query dba_hist_active_sess_history for the snapshot interval in some detail to discover how the problem developed and progressed.

(1) select sample_time, count(*) ... group by sample_time order by sample_time;
-- can you see any pattern? Did the count suddenly jump dramatically at some point and then decrease very slowly? Maybe something happened at that point that triggered the catastrophe without actually being related to the 4 statements.

(2) Repeat (1), but restrict to just rows with the ITL wait event

(3) Repeat (1), but restrict to just rows on CPU - maybe a brief spike in CPU usage triggered the catastrophe.
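The first three queries might be sketched as follows (a hedged sketch: the :begin_snap and :end_snap bind variables are placeholders for your snapshot range, and on a multi-database or RAC repository you would also filter on dbid and instance_number):

```sql
-- (1) samples per time slice: look for a sudden jump followed by a slow decline
select sample_time, count(*)
from   dba_hist_active_sess_history
where  snap_id between :begin_snap and :end_snap
group by sample_time
order by sample_time;

-- (2) as (1), restricted to the ITL wait
select sample_time, count(*)
from   dba_hist_active_sess_history
where  snap_id between :begin_snap and :end_snap
and    event = 'enq: TX - allocate ITL entry'
group by sample_time
order by sample_time;

-- (3) as (1), restricted to sessions on CPU
select sample_time, count(*)
from   dba_hist_active_sess_history
where  snap_id between :begin_snap and :end_snap
and    session_state = 'ON CPU'
group by sample_time
order by sample_time;
```

Remember that dba_hist_active_sess_history keeps only one in ten of the ASH samples, so each row represents roughly 10 seconds of activity.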

When a session is waiting there are many wait events for which Oracle will report the current obj#, file#, block# and row#. I don't know if this is the case with "enq: TX - allocate ITL entry", so treat the results with caution; but if the values are set to the block where the ITL entry is being requested (and not to the undo segment header), then for each SQL_ID of your update statements, and for the 30-minute interval, you could query something like:

(4) select sample_time, current_obj#, current_file#, current_block#, count(*) ... group by ... order by {all 4 listed}
-- if this says anything useful it might tell you whether the accesses and collisions were completely random, or fixed in a small number of locations, or moving along the table as time passed; the pattern might give you some clue about why the catastrophe could happen.
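A hedged sketch of query (4) - again :begin_snap, :end_snap and :sql_id are placeholder bind variables, and you would repeat it for each of the update statements' SQL_IDs:

```sql
-- (4) where, and when, were the ITL collisions concentrated?
select sample_time, current_obj#, current_file#, current_block#, count(*)
from   dba_hist_active_sess_history
where  snap_id between :begin_snap and :end_snap
and    sql_id = :sql_id
and    event  = 'enq: TX - allocate ITL entry'
group by sample_time, current_obj#, current_file#, current_block#
order by sample_time, current_obj#, current_file#, current_block#;
```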

Regards
Jonathan Lewis

On Fri, 5 Apr 2024 at 08:06, Krishnaprasad Yadav <chrishna0007_at_gmail.com> wrote:

> Adding to the mail: Q represents a query, so Q1 = query 1, Q2 = query 2, Q3 = query 3, ...
>
>
>
> On Fri, 5 Apr 2024 at 11:39, Krishnaprasad Yadav <chrishna0007_at_gmail.com>
> wrote:
>
>> Hi Mark ,
>>
>> Thanks for the reply!
>>
>> EVENT                                  COUNT(*)    PERCENT
>> ----------------------------------- ---------- ----------
>> enq: TX - allocate ITL entry            372570        46% ==> update
>> statement updates the status column; avg. row len: 138, blocks: 419217, table
>> size: 3.2GB
>>                                         302698        37% ==> update
>> statement updates the status and one more column; avg. row len: 207, table
>> size: 92 GB
>>
>> 1. Not sure how the insertion happens, but it updates the status column in both
>> tables
>> 2. Block size is 8K; I am not sure whether the rows are often multi-block
>> rows or not
>> 3. The rows are not migrated rows
>>
>> The update statements have no older history of execution: from the last
>> 2 months of AWR data we can see they were only executed on 3-Apr. Below are the
>> execution counts per 30-minute snap.
>>
>> Q1 Q2 Q3 Q4
>> 9672 2585 2259 2602
>> 803 1112 1070 1111
>> 4245 3839 3316 3840
>> 1707 1280 1358 1279
>> 2827 1659 1968 1659
>> 874 1592 1146 1590
>> 1715 2417 2479 2419
>> 11487 8151 18507 8145
>> 1610 1676 1757 1675
>> 386 693 575 694
>> 1772 690 785 691
>> 7596 5227 5559 5227
>> 258 474 283 475
>> 784 964 511 970
>> 1810 2076 1315 2073
>> 3930 2922 3154 2918
>> 3235 3172 2628 3171
>> 3971 3350 3261 3350
>> 2735 2749 1707 2750
>> 3094 4233 4409 4235
>> 5391 5839 6012 5842
>> 7309 6428 6899 6417
>> 2335 2017 1839 2016
>> 10455 9438 10096 9438
>> 4702 2731 2417 2734
>> 3316 9801 10653 9804
>> *188891* *187068* *189944* *187061*
>> 1707 1027 1066 1031
>> 11642 4481 8448 4490
>> 8852 9287 4333 9318
>> TOTAL 309111 288978 299754 289025
>>
>>
>> Regards,
>> Krishna
>>
>> On Fri, 5 Apr 2024 at 03:31, Mark W. Farnham <mwf_at_rsiz.com> wrote:
>>
>>> So you have a particular table.
>>>
>>>
>>>
>>> Let’s talk about that table.
>>>
>>>
>>>
>>> How many rows are there per block for that table? (min, max, average).
>>>
>>>
>>>
>>> How many insert, update, and delete operations can reasonably be
>>> simultaneously in play for that table?
>>>
>>>
>>>
>>> I’m restraining myself from asking about indexes because you wrote that
>>> you’re sure that is not it.
>>>
>>>
>>>
>>> Are rows inserted into this table born “full length” or is a skeleton
>>> key insert made with lots of null columns which are then updated to actual
>>> values?
>>>
>>>
>>>
>>> Are the rows often multi-block rows?
>>>
>>>
>>>
>>> Are there many migrated rows?
>>>
>>>
>>>
>>> After a row is inserted and perhaps updated a few times very soon after
>>> the insert, does it tend to become quiet and rarely if ever be updated
>>> again?
>>>
>>>
>>>
>>> Your answers to these questions will tend to reduce the possible
>>> improvement suggestions from hundreds to a dozen or fewer.
>>>
>>>
>>>
>>> Good luck,
>>>
>>>
>>>
>>> mwf
>>>
>>>
>>>
>>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Krishnaprasad Yadav
>>> *Sent:* Thursday, April 04, 2024 1:02 PM
>>> *To:* Pierre Labrousse
>>> *Cc:* Oracle L
>>> *Subject:* Re: ITL waits
>>>
>>>
>>>
>>> Adding some more info: the queries' LIO is around 15 to 17 only
>>>
>>>
>>>
>>> On Thu, 4 Apr, 2024, 22:31 Krishnaprasad Yadav, <chrishna0007_at_gmail.com>
>>> wrote:
>>>
>>> Dear Pierre,
>>>
>>>
>>>
>>> Thanks for your reply
>>>
>>>
>>>
>>> I have seen from the reports that the top object is the table, so I am 100% sure of it
>>>
>>>
>>>
>>> I know INITRANS would help, but I feel (I am suspecting) that it might be a
>>> cascaded problem because, as I mentioned:
>>>
>>> Concurrency is low and the table size is low
>>>
>>> I am not sure about a connection storm, which I need to work on
>>>
>>>
>>>
>>> Regards,
>>>
>>> Krish
>>>
>>>
>>>
>>>
>>>
>>> On Thu, 4 Apr, 2024, 21:43 Pierre Labrousse, <
>>> Pierre.Labrousse_at_digora.com> wrote:
>>>
>>> Hello Krishna,
>>>
>>>
>>>
>>> Are you sure that the waits are on the table segment and not on an index
>>> segment? Generally these waits appear more on indexes than on tables.
>>>
>>>
>>>
>>> Indeed, INITRANS should be raised (by default it's 1 for a table and
>>> 2 for an index), but it will be taken into account only for new blocks (unless
>>> you move the table or rebuild the index). You could also raise PCTFREE to have
>>> fewer rows in the same block, or create a new tablespace with a smaller data
>>> block size to minimize the number of rows per block and move your
>>> tables/indexes into this tablespace.
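>>> A hedged sketch of that INITRANS change (my_table, my_table_pk and the
>>> value 16 are placeholders, not recommendations):
>>>
>>> ```sql
>>> -- raise INITRANS: this affects only blocks formatted from now on
>>> alter table my_table initrans 16;
>>> -- rewrite the existing blocks so the new INITRANS applies everywhere
>>> alter table my_table move;
>>> -- the move leaves the table's indexes unusable, so rebuild them
>>> alter index my_table_pk rebuild;
>>> ```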
>>>
>>> Partitioning would be a good solution but it is "just" extra cost 😉
>>>
>>>
>>>
>>> Best regards.
>>>
>>> Pierre
>>>
>>>
>>>
>>>
>>>
>>> *Pierre LABROUSSE*
>>> *Consultant DBA ORACLE (OCM 10g/11g/12c)*
>>>
>>> Mobile +33 (0)7 56 05 27 38
>>>
>>> pierre.labrousse_at_digora.com
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
>>> behalf of Krishnaprasad Yadav <chrishna0007_at_gmail.com>
>>> *Sent:* Thursday, 4 April 2024 17:05
>>> *To:* Oracle L <oracle-l_at_freelists.org>
>>> *Subject:* ITL waits
>>>
>>>
>>>
>>> Dear Gurus ,
>>>
>>>
>>>
>>> I have observed ITL waits in the DB, and they lasted more than
>>> 4 hours
>>>
>>> I see that certain new DML was newly introduced into the database, and at the
>>> same time these ITL wait events appeared.
>>>
>>>
>>>
>>> I validated the top objects and saw 3 tables, as the DML statements are using
>>> these tables; the concurrency was around 3K to 10K in a 30-minute snap.
>>>
>>> Also, the table is non-partitioned and it was only 3GB in size.
>>>
>>>
>>>
>>> The total execution count for the day was around 400K for all 3 DML statements;
>>> it is a bit surprising that with such low concurrency it landed in this issue.
>>>
>>> Also, the table extent management is AUTO.
>>>
>>>
>>>
>>> I know increasing INITRANS will (probably?) help, but I suspect the cause is
>>> something coming from outside the DB. I checked the OS Watcher details and see
>>> some potential bottlenecks, but the stats I suspect as bottlenecks appear at
>>> good times as well. I checked CPU utilization, which was high at the start of
>>> the issue, but after 10 minutes it came down to 60% while the spike remained
>>> in the DB; the IO busyness trend is similar to the CPU trend.
>>>
>>>
>>>
>>> It would be helpful if you could share any experience that I can relate to
>>> and use to take my case forward.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Krishna
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Apr 05 2024 - 11:40:14 CEST
