Re: INSERT statement restarts internally

From: William Robertson <william_at_williamrobertson.net>
Date: Thu, 6 Aug 2020 16:20:17 +0100
Message-Id: <6FFAD405-067F-428A-91EC-FD22AE220A0A_at_williamrobertson.net>



I can confirm that any DDL on any of the tables involved, regardless of partition, triggers a hard invalidation of the cursor (v$sql.object_status becomes ‘INVALID_UNAUTH') and causes it to roll back and restart (sql_exec_start and sql_exec_id are reset). Each rollback can easily take an hour, depending on how long it ran before it was invalidated, as it inserts tens of millions of rows. As it happens there is an exercise at the moment to reload some data for previous dates, and this INSERT doesn’t have a chance.

The 12.2 "deferred invalidations” clause and the corresponding cursor_invalidation parameter made no difference.

Possibly we can add some mutex rules to our batch job config, but working out exactly which rules to add could be a challenge and I suspect there are too many suspects to be practical.

Another approach might be to lock all 10 tables in row share mode at the start of the procedure, and have our partition maintenance procedures take out an exclusive lock on the table before making any change, which might give us a mechanism for making partition operations wait for critical DML.

I’m wondering why I’ve never come across this before, apart from an issue recently with a stats job passing no_invalidate = false. I suppose it must happen all the time, but it takes a long-running, high-volume insert and an unusually busy batch system before you really notice it.

William

On 5 Aug 2020, at 10:19, William Robertson <william_at_williamrobertson.net> wrote:

Well, that’s the kind of smoking gun I am looking for. The statement involved half a dozen tables, mostly partitioned, some subpartitioned, so I was hoping for a quick way to track it down, but I suppose the timestamps in dba_objects and dba_tab_stats_history might be a start.

No automatic/interval partitioning, no triggers, I’d be surprised if there were force-invalidate stats operations (six of them) but you never know, and this insert was serial and conventional.

Thanks for the links - I’d come across the one from Randolf Geist one before, and it describes the situation exactly. I see he wasn’t sure at the time of writing whether fine-grained cursor invalidation might prevent it in 12.2. It doesn’t. I’ll have to read the Kun Sun post a couple more times to get my head around the scenario with undo. Does it seem likely in an insert?

We also get regular ‘Object no longer exists’ errors that are similarly hard to track down. I wonder if that is a related issue.

William

On 5 Aug 2020, at 00:42, Sayan Malakshinov <xt.and.r_at_gmail.com <mailto:xt.and.r_at_gmail.com>> wrote:

Hi William,

Were there any DDL operations on that table? Drop/add? Automatic interval partitioning? Row-level 'before insert' triggers? Was it parallel insert? Undo space allocation? Some notes:
 https://oracle-randolf.blogspot.com/2016/01/dml-operations-on-partitioned-tables.html <https://oracle-randolf.blogspot.com/2016/01/dml-operations-on-partitioned-tables.html> http://ksun-oracle.blogspot.com/2011/05/update-restart-and-new-undo-extent.html <http://ksun-oracle.blogspot.com/2011/05/update-restart-and-new-undo-extent.html>

On Wed, Aug 5, 2020 at 1:29 AM <dimensional.dba_at_comcast.net <mailto:dimensional.dba_at_comcast.net>> wrote: While it is running, what is it waiting on?

-----Original Message-----

From: oracle-l-bounce_at_freelists.org <mailto:oracle-l-bounce_at_freelists.org> <oracle-l-bounce_at_freelists.org <mailto:oracle-l-bounce_at_freelists.org>> On Behalf Of William Robertson Sent: Tuesday, August 4, 2020 3:19 PM
To: oracle-l <oracle-l_at_freelists.org <mailto:oracle-l_at_freelists.org>> Subject: INSERT statement restarts internally

Hi all,

I’m trying to get to the bottom of an issue where a SQL statement (in this case an insert) that normally runs in 15 minutes suddenly takes two hours. Checking the session, it’s not blocked, the plan hasn’t changed and seems fine, all row estimates are good, there is no more data than usual, backups aren’t running (though I can’t rule out other background resource hogs, as I have somewhat limited access). Then I notice that the start time shown in SQL Monitor has changed, and going back to the session I see there have been six sql_exec_ids and sql_exec_starts for the sid/serial#/sql_id, even though the procedure only calls it once. Some of them are a couple of minutes apart, some are after 40 minutes. (With hindsight I should have looked at v$sql.object_status and also checked whether the session was rolling back, which I’m guessing it was.)

The last time we saw this, it coincided with a job that gathered stats on all partitions marked as stale, passing no_invalidate = false. We fixed that by changing it to true and the issue seemed to go away. This time though, this stats job wasn’t running and neither was anything else I could see that would have affected the same tables and partitions - all our large tables are list-partitioned by business date and many are subpartitioned by business line etc, partly to isolate batch processes. Looking at dba_active_sess_history I can see it’s happened before but sporadically.

I’m a bit stumped about what to do about it. Would this be logged anywhere (similar to a deadlock report)? I couldn’t see anything likely in v$diag_trace_file_contents but maybe I’m looking for the wrong thing. Is it the case that another session must have hard-invalidated the cursor, and I just have to find the smoking gun, or is there some other scenario like an internal failure? This is 12.2.0.1.

Thanks,

William

--

http://www.freelists.org/webpage/oracle-l Received on Thu Aug 06 2020 - 17:20:17 CEST

Original text of this message