Re: Estimations are as good as it can be, what else can be done here?

From: Andy Sayer <andysayer_at_gmail.com>
Date: Wed, 15 Nov 2023 00:01:33 -0800
Message-ID: <CACj1VR7Sf_-B5CEMMQE_Eq9SBs7WKzGqHyfvF=7WN2AK26oVEQ_at_mail.gmail.com>



I think you need to take a step back and understand what you’re trying to achieve with the query. That usually helps you figure out what you really need to be doing. If this is part of a process you expect to be hit by all your users frequently, with fast responses, then you probably want to rewrite it.

The trick is going to be to avoid reading and sorting millions of rows.

Is the distinct really necessary? Is ordering by 15 columns really necessary?
What is the driving filter for the query? What is the data being pulled out?

The lack of descriptive table names and the removal of many of the columns involved make it very difficult to approach this from the outside. Are you more familiar with the data model, or is this something that has been thrown over to you with a request to make it faster?

Thanks,
Andy

On Tue, 14 Nov 2023 at 23:34, yudhi s <learnerdatabase99_at_gmail.com> wrote:

> Thank You So much Jonathan, Andy and Pap.
>
> Pap stole my thoughts; I really had exactly the same question in my mind.
> In a general use case like this, how should one go about designing the
> system? Or, given that we are asking it to read hundreds of millions or
> billions of rows and fetch just the top few, can we say it isn't a SQL (or
> say OLTP-type) use case at all, but rather a NoSQL (or OLAP) one?
>
>
> On Wed, Nov 15, 2023 at 10:48 AM Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Jonathan/Andy,
>>
>> The OP's query here is really doing a lot of work: it passes ~2 million
>> rows all the way across 10+ table joins and sorts all of them before
>> publishing the first_n_rows to the client. But a common case could be a
>> scenario in which a UI user has to see the latest ~100 transaction details
>> of a certain type for a user within a date range (say one week or month).
>> In cases where the daily transaction volume can reach hundreds of millions
>> of rows, even a two-table join with an order by clause (to find the latest
>> transaction details based on the date column) can take a lot of time. So
>> how should one handle the design of such a requirement?
>>
>> One can have a de-normalized/flattened table or materialized view to
>> persist the two-table join data beforehand, but the order is not
>> guaranteed when we fetch the data, so we still have to add the "order by
>> date_column" clause to the query on top of the materialized view scan
>> (which might cover hundreds of millions of rows), and that can well become
>> the bottleneck. How should one design to cater for such user needs? Can
>> parallel processing help here?
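>>
>> For example (a rough sketch only - all names here are made up), if the
>> flattened table had an index like:
>>
>>   create index txn_flat_ix on txn_flat (user_id, txn_type, txn_date);
>>
>> and the query were written as a top-N fetch:
>>
>>   select txn_id, txn_date, amount
>>   from   txn_flat
>>   where  user_id  = :user_id
>>   and    txn_type = :txn_type
>>   and    txn_date >= :from_date
>>   order by txn_date desc
>>   fetch first 100 rows only;
>>
>> would a descending range scan of that index be enough to avoid sorting the
>> whole date range, or would the sort still be the bottleneck?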
>>
>> It's also true that UI users don't understand the detailed work that has
>> to happen before that handful of rows is published on the UI screen, but
>> they still want the result within a few seconds.
>>
>> On Fri, Nov 10, 2023 at 6:27 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>>
>>> It's not totally trivial working out the timing of activity in a
>>> parallel plan, although a good starting point is to follow the ordering
>>> given by the TQ details for each individual DFO tree. And the "Start
>>> Active / Time Active" is an interesting problem - presumably it's likely to
>>> cover the first slave start to the last slave ending time, which in
>>> unbalanced cases could lead to some confusion.
>>>
>>> Looking at your plan here, though, the tablescan at operation 86 starts
>>> at +9 and takes 4 seconds to complete. I'm having a little difficulty
>>> working out exactly how this tablescan passed through layers of slaves to
>>> get to the sort group by at operation 6 that starts at +12 and runs for 5
>>> seconds - though that 5 seconds is probably about receiving data from the
>>> tablescan and join, and not really about sorting.
>>>
>>> Given that it's a parallel query, it's not surprising that there are two
>>> sort group by operations - this is Oracle attempting to reduce the data
>>> size each PX slave has picked up from its hash join before distributing by
>>> range for a final sort group by. The sort group by at operation 3 starts
>>> at +16 and appears to take 24 seconds, but again I think very little time
>>> is spent in the actual sort, and most of the active time is related to
>>> passing the data to the client; the "PX SEND QC" is probably the indicator
>>> that the sort completed at +17.
>>>
>>> Given that you seem to have 2M rows moving up the pipeline it's possible
>>> that the two-pass sorting could be eliminated. If you check the Outline
>>> Data you probably had a gby_pushdown(@qbname) hint in it. If you change this to
>>> no_gby_pushdown(@qbname) you should find that the preliminary sort group by
>>> disappears - this might make a couple of seconds difference. (I've had a
>>> draft note on that hint for 9 years but haven't got around to finishing it
>>> - there is a mention of it in
>>> https://jonathanlewis.wordpress.com/2022/04/13/adaptive-joins/ )
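>>>
>>> As a rough illustration (the query block name below is invented - the real
>>> one has to come from the Outline Data of your plan), the hint goes into the
>>> main query block along these lines:
>>>
>>>   select /*+ no_gby_pushdown(@sel$1) */
>>>          t1.owner, count(*)
>>>   from   t1
>>>   join   t2 on t2.id = t1.id
>>>   group by t1.owner;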
>>>
>>> As far as a small amount of data making a big difference - that's always
>>> a possibility in any query. Given the complete cascade of hash joins,
>>> though, I don't think you're likely to see that happen. I have to say,
>>> though, that you do seem to have a skew in your data that might be adding a
>>> couple of seconds to your query. In the PX details you can see that slave
>>> p005 uses far more CPU time than the rest in slave set 1, and slave p00a
>>> does far more I/O to temp than the rest in slave set 2. Snapshots of PX
>>> session stats may give you some clues about those two anomalies.
>>>
>>> Your general question about timing for 10 table joins with 2M rows -
>>> you've got more joins than that, and I can see 4, 5, and 8m rows reported,
>>> so plenty of work to do. 40 seconds doesn't seem unreasonable, until you
>>> say you only want to look at the first few. Possibly you could find a way
>>> to create a materialized view (perhaps of only a subset of the tables) that
>>> reduces that enough to make it look a lot better.
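>>>
>>> A very rough sketch of that idea (table and column names invented): a
>>> materialized view that pre-joins, and perhaps pre-aggregates, the largest
>>> tables, with query rewrite enabled in case the optimizer can then rewrite
>>> the original query to use it:
>>>
>>>   create materialized view big_join_mv
>>>   build immediate
>>>   refresh complete on demand
>>>   enable query rewrite
>>>   as
>>>   select t1.id, t1.txn_date, t2.status, sum(t2.amount) total_amount
>>>   from   big_table1 t1
>>>   join   big_table2 t2 on t2.t1_id = t1.id
>>>   group by t1.id, t1.txn_date, t2.status;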
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>> On Thu, 9 Nov 2023 at 10:51, yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> I ran the query manually with parallel(6), and below is the SQL Monitor
>>>> report. It finishes in ~40 seconds overall. But I still need to execute it
>>>> from the UI, by changing the view definition to add the parallel hint, to
>>>> see to what extent it helps the real production run, which uses a
>>>> first_rows kind of setup.
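>>>>
>>>> (Roughly along these lines - the view, table and column names below are
>>>> just placeholders for the real definition:)
>>>>
>>>>   create or replace view report_vw as
>>>>   select /*+ parallel(6) */
>>>>          t1.col1, t2.col2
>>>>   from   tab1 t1
>>>>   join   tab2 t2 on t2.id = t1.id;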
>>>>
>>>>
>>>> https://gist.github.com/databasetech0073/62b51587b3f590914e9a92218609672a
>>>>
>>>> But again, if the volume increases a bit, the response time may again go
>>>> beyond ~10 seconds, which is not a good experience for any UI user. So my
>>>> doubt is: should we really not expect these types of queries (e.g. with 10
>>>> table joins, ~2 million matching rows and sorting on top) to finish in
>>>> under 10 seconds, and should some different design approach be followed
>>>> instead to cater for such a requirement - like persisting the
>>>> transformed/aggregated results beforehand in a materialized view, or
>>>> anything else?
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Nov 15 2023 - 09:01:33 CET
