Re: Parallel Query Performance Issues

From: Goti <aryan.goti_at_gmail.com>
Date: Thu, 9 Jun 2022 20:09:17 +0530
Message-ID: <CAOzfMurjS8F5iGhBrNK6ONoDNuGrHA=E=LqXzEJqosca0=HKLQ_at_mail.gmail.com>



Thanks Jonathan for the quick response!

I tried for the first 2 workarounds and that didn't work as expected. I will work on the 3rd and 4th action plan and update here.

Thanks,

Goti

On Thu, Jun 9, 2022 at 5:41 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
> The two queries may return the same size result, but the 2019 report
> generates and aggregates roughly 12 times as much data as the 2018 report.
> Check the "Actual Rows" figures - the 2018 report hits 3M rows (and 3M
> execs of the subsequent table probes) while the 2919 report hits 39M
> rows/execs - and that's where a lot of time goes on CPU.
>
> Strangely (almost) all the data is passed to one PX server (at operation
> 13/14, I think) that blows it up through segement NL joins to get most of
> the 39M rows that have to be "buffer sorted" (i.e. buffered, but not
> actually sorted) which is where the temp space and I/O time goes.
>
> Possible workarounds
> - MAYBE if you tried parallel 7 rather than 8 the hash disrtibution at
> operation MIGHT be better balanced;
> - MAYBE if you set "_gby_hash_aggregation_enabled" to false and got a SORT
> UNIQUE instead of a hash unique the distribution would work better.
> - if you get the outline information for the plan you should be able to
> find the pq_distribute hint controls the distribution at operation 14 and
> change it from a hash distribution to a round-robin - this will probably
> introduce a 2nd layer of aggregation/uniqueness, but two small, shared
> stages may well do better than one very large operation.
> - can you rewrite the query to eliminate duplication earlier. This may
> require you to include inline non-mergeable views: ideally you want to
> avoid generating 39M rows at any point and then executing 39M join steps as
> that will still account for a lot of your time.
>
>
> Regards
> Jonathan Lewis
>
>
>
>
> On Thu, 9 Jun 2022 at 12:15, Goti <aryan.goti_at_gmail.com> wrote:
>
>> Environment : 11.2.0.4 database running on Linux.
>>
>> Need help to understand parallel query performance issues. Below are the
>> query details and its associated plans. The 2018_query does execute in 24
>> seconds and returns about 2.5K rows. The 2019_query is also expected to
>> process almost the same number of rows however it consumes a lot of TEMP
>> space and finally fails. The 2019_query without parallel completes in 45
>> minutes (Just by removing the parallel hint). The only difference between
>> both the queries is related to the predicate "opclf.year_number =
>> to_number('YYYY')". The stats are up to date for the tables are partitions.
>>
>>
>> 2019_query:
>> https://gist.github.com/aryangoti/a7704a8075f118f7d942e49acee1900d
>>
>> 2018_query:
>> https://gist.github.com/aryangoti/a7704a8075f118f7d942e49acee1900d
>>
>> Stats and other details:
>> https://gist.github.com/aryangoti/a3797424ce0cb4fd87e194c05ad099b6
>>
>> Thanks,
>>
>> Goti
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jun 09 2022 - 16:39:17 CEST

Original text of this message