RE: MERGE statement with parallel DML enabled deadlocks itself

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Wed, 6 May 2015 18:50:12 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D92828BC04_at_EXMBX01.thus.corp>


FYI - I've just run up a small test with a couple of (hash) partitioned tables running a merge with update only. with a join on the select where all three tables are partitioned in the same way. My default parallelism is 4.

If I create the tables with 4 partitions I get partitionwise joins, if I create the tables with 3 or 5 partitions I don't. The relationship between partition count and degree can make a difference and (given it's hash partitions) my 3 / 5 partition test also have uneven data distributions.

You could try forcing partitionwise joins with pq_distribute(xxx none none) hints. I can't work out, though, why your outline shows three use_hash() hints with three pq_distribute() hint when the plan shows only two hash joins.

I think you're doing an aggregation in the select clause to get the uniqueness in you need for the merge to work - and I suspect Sayan is on the right trace with his comment about group by placement. If one of your hash joins is on more than just the partition key, then a hash distribution could push rows which should aggregate to the same partition to different parallel slaves so you end up with multiple rows for the same primary key coming into the final merge from different slaves, giving you the data deadlock. Possibly if you can force a partitionwise join this problem would become impossible. (I'm hand-waving and guessing wildly here, so don't spend too much time trying to prove me right or wrong.)

The reason why the fail regularly and you don't might be luck (i.e. the pattern of the data); it might be a side effect of a different choice of PQ distribution.

My final thought is to check the deadlock graphs to confirm that the rows in the collision are coming from the partitions and not from some data dictionary object. Be cautious because sometimes the data in the "rowid" report is actually just a hangover from an earlier wait that hasn't been cleared properly.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of Jonathan Lewis [jonathan_at_jlcomp.demon.co.uk] Sent: 06 May 2015 12:58
To: oracle-l_at_freelists.org
Subject: RE: MERGE statement with parallel DML enabled deadlocks itself

Possibly a side effect of the degree of parallelism. I don't know how likely it is but with 33 partitions it's possible that 3, 11, and 33 are the only degrees where pw joins might occur.

Possibly it relates to the skew in the size of the partitions - especially if some don't exist to start with.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of Sayan Sergeevich Malakshinov [malakshinovss_at_psbank.ru] Sent: 06 May 2015 12:53
To: Thomas Kellerer
Cc: oracle-l_at_freelists.org
Subject: Re: MERGE statement with parallel DML enabled deadlocks itself

Unfortunately i can't understand why parallel partition-wise join was not used in your case.

--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 06 2015 - 20:50:12 CEST

Original text of this message