Re: Parallel Query Performance Issues

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Mon, 13 Jun 2022 19:46:27 +0100
Message-ID: <CAGtsp8m=xJjfdHqE-p8u31Cx5YgNxO2TiyOzx3qw4dQj9554vQ_at_mail.gmail.com>



If you look closely at the plan you will see that:
a) the plan has changed completely - the optimizer is now using a star transformation;
b) the critical PQ distribution is still using a hash/hash distribution;
c) the hash distribution is still massively skewed towards one PX server (p013).

Apart from changing a hidden parameter, did you do anything else to get the change in plan? (Like testing with the Outline Information from the original 2019 plan but with the relevant pq_distribute() hash/hash hint changed to broadcast/none; or testing with the Outline Information from the original 2018 plan against the 2019 query.)
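A minimal sketch of that second kind of test (all object names, query-block names and aliases below are placeholders, not taken from the real Outline Information):

```sql
-- Placeholder names throughout ("SEL$1", fact_t, dim_t, d); in practice
-- you would copy the full Outline Information from
-- DBMS_XPLAN.DISPLAY_CURSOR(format => 'OUTLINE') into the SQL and edit
-- only the relevant hint, e.g. changing
--   pq_distribute(@"SEL$1" "D"@"SEL$1" hash hash)
-- to broadcast/none:
select /*+
          parallel(8)
          pq_distribute(@"SEL$1" "D"@"SEL$1" broadcast none)
       */
       d.dim_name, count(*)
from   fact_t f
join   dim_t  d on d.dim_id = f.dim_id
group  by d.dim_name;
```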

I mentioned the idea of rewriting the query to eliminate duplicates earlier. Reviewing the query, I see that you have an inline "select distinct" view and an outer "select distinct" - but the plan has only one "HASH UNIQUE" operation. Have you tried simply adding a no_merge hint to the inline view?
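An illustrative sketch of that change (table and column names are invented, not from the real query):

```sql
-- no_merge(v) stops the optimizer from merging the inline distinct view
-- into the outer query block, so both distincts survive as two separate
-- (smaller) aggregation steps instead of one big HASH UNIQUE over the
-- full joined row source.
select /*+ no_merge(v) */
       distinct v.cust_id, v.year_number
from   (select distinct cust_id, year_number
        from   order_lines) v
join   customers c
on     c.cust_id = v.cust_id;
```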

In passing, I see the inline view mixes "traditional" Oracle syntax with ANSI syntax. This is a very bad idea and may be contributing to the optimizer doing something very silly with its join ordering and arithmetic - whatever else I did to find a faster execution path, I'd change the code to be either traditional or ANSI, but not a mixture.
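For example (invented tables), a mixed-style join like the first statement below is the kind of thing to rewrite as the second:

```sql
-- Mixed style: an ANSI join followed by an old-style comma join whose
-- join predicate lives in the WHERE clause - legal, but error-prone and
-- harder for the optimizer (and humans) to reason about:
select o.order_id, c.cust_name, p.prod_name
from   orders o
join   customers c on c.cust_id = o.cust_id,
       products  p
where  p.prod_id = o.prod_id;

-- The same query written consistently in ANSI syntax:
select o.order_id, c.cust_name, p.prod_name
from   orders o
join   customers c on c.cust_id = o.cust_id
join   products  p on p.prod_id = o.prod_id;
```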

Regards
Jonathan Lewis

On Mon, 13 Jun 2022 at 11:02, Goti <aryan.goti_at_gmail.com> wrote:

> Thanks Andy and Jonathan.
>
> I did change _parallel_broadcast_enabled to TRUE to get "PX BROADCAST" in
> the plan, but it still doesn't improve the response time of the SQL. Can
> you please help me identify why step 38 shows 495M actual rows
> whereas Oracle estimates it to be 1 row? Below are the gist details.
>
>
> https://gist.github.com/aryangoti/ec2804a7b832a7fe606ec0bf6a0681b7
>
> Thanks,
>
> Goti
>
>
> On Thu, Jun 9, 2022 at 8:15 PM Andy Sayer <andysayer_at_gmail.com> wrote:
>
>> Just quick thoughts - replace the distincts with group by, this might
>> allow group by placement to happen for you.
>>
>> The inner distinct doesn’t seem to be executed as a distinct; there might
>> be clues in the outline if the optimizer has decided that it only needs to do a
>> sort.
>>
>> I’ll have a closer look when I can
>>
>> Thanks,
>> Andy
>>
>> On Thu, 9 Jun 2022 at 15:39, Goti <aryan.goti_at_gmail.com> wrote:
>>
>>> Thanks Jonathan for the quick response!
>>>
>>> I tried the first 2 workarounds and they didn't work as expected. I
>>> will work on the 3rd and 4th action plans and update here.
>>>
>>> Thanks,
>>>
>>> Goti
>>>
>>>
>>> On Thu, Jun 9, 2022 at 5:41 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>>
>>>> The two queries may return the same size result, but the 2019 report
>>>> generates and aggregates roughly 12 times as much data as the 2018 report.
>>>> Check the "Actual Rows" figures - the 2018 report hits 3M rows (and 3M
>>>> execs of the subsequent table probes) while the 2019 report hits 39M
>>>> rows/execs - and that's where a lot of the CPU time goes.
>>>>
>>>> Strangely, (almost) all the data is passed to one PX server (at
>>>> operation 13/14, I think), which blows it up through segment NL joins to get
>>>> most of the 39M rows that have to be "buffer sorted" (i.e. buffered, but
>>>> not actually sorted) - which is where the temp space and I/O time go.
>>>>
>>>> Possible workarounds:
>>>> - MAYBE if you tried parallel 7 rather than 8 the hash distribution at
>>>> that operation MIGHT be better balanced;
>>>> - MAYBE if you set "_gby_hash_aggregation_enabled" to false and got a
>>>> SORT UNIQUE instead of a HASH UNIQUE the distribution would work better;
>>>> - if you get the Outline Information for the plan you should be able to
>>>> find the pq_distribute hint that controls the distribution at operation 14 and
>>>> change it from a hash distribution to a round-robin - this will probably
>>>> introduce a 2nd layer of aggregation/uniqueness, but two small, shared
>>>> stages may well do better than one very large operation;
>>>> - can you rewrite the query to eliminate duplication earlier? This may
>>>> require you to include inline non-mergeable views: ideally you want to
>>>> avoid generating 39M rows at any point and then executing 39M join steps, as
>>>> that will still account for a lot of your time.
>>>>
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, 9 Jun 2022 at 12:15, Goti <aryan.goti_at_gmail.com> wrote:
>>>>
>>>>> Environment : 11.2.0.4 database running on Linux.
>>>>>
>>>>> Need help understanding a parallel query performance issue. Below are
>>>>> the query details and the associated plans. The 2018_query executes in
>>>>> 24 seconds and returns about 2.5K rows. The 2019_query is expected to
>>>>> process almost the same number of rows, however it consumes a lot of TEMP
>>>>> space and finally fails. The 2019_query without parallel completes in 45
>>>>> minutes (just by removing the parallel hint). The only difference between
>>>>> the two queries is the predicate "opclf.year_number =
>>>>> to_number('YYYY')". The stats are up to date for the tables and partitions.
>>>>>
>>>>>
>>>>> 2019_query:
>>>>> https://gist.github.com/aryangoti/a7704a8075f118f7d942e49acee1900d
>>>>>
>>>>> 2018_query:
>>>>> https://gist.github.com/aryangoti/a7704a8075f118f7d942e49acee1900d
>>>>>
>>>>> Stats and other details:
>>>>> https://gist.github.com/aryangoti/a3797424ce0cb4fd87e194c05ad099b6
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Goti
>>>>>
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jun 13 2022 - 20:46:27 CEST
