Re: Estimations are as good as it can be, what else can be done here?

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Thu, 9 Nov 2023 16:20:53 +0530
Message-ID: <CAEzWdqdmOrFyWXn_cxc7EGCHrxqgm6=wuNXm-25Zj9Lc-xGsOA_at_mail.gmail.com>



I ran the query manually with parallel(6), and the SQL Monitor report is below. It does finish in ~40 seconds overall. But yes, I still need to execute it from the UI, after adding the parallel hint to the view definition, to see to what extent it helps the real production run, which operates with a first_rows kind of setup.

https://gist.github.com/databasetech0073/62b51587b3f590914e9a92218609672a
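
For illustration, a minimal sketch of where the hint would go (the filter and column names here are placeholders, not the real definition):

    select /*+ parallel(6) */ *
    from   vw_pptg
    where  some_filter_col = :b1
    order  by col1, col2;

or embedded in the view itself, so every caller picks it up:

    create or replace view vw_pptg as
      select /*+ parallel(6) */ t.*   -- hint inside the view's top-level select
      from   tab_pp_td t;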

But again, if the volume increases a bit, the response time may once more go beyond ~10 seconds, which is a poor experience for any UI user. So my question is: should we simply not expect this type of query (e.g. ~10 table joins + ~2 million matching rows + sorting on top) to finish in <10 seconds, and instead follow a different design approach to cater to such a requirement, such as persisting the transformed/aggregated results beforehand in a materialized view, or something else?
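
To make the materialized-view idea concrete, a rough sketch (the column list and grouping here are hypothetical; the real view query would replace them):

    create materialized view mv_pptg_agg
      build immediate
      refresh complete on demand   -- e.g. refreshed on a schedule via dbms_scheduler
      enable query rewrite
    as
    select b.b_id, b.b_name, count(*) as row_cnt
    from   tab_pbo b
    join   tab_pp_td t on t.b_id = b.b_id
    group  by b.b_id, b.b_name;

The UI query would then read the pre-built summary (directly, or via query rewrite) instead of re-joining and re-sorting ~2 million rows on every request, at the cost of some staleness between refreshes.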

On Thu, 9 Nov, 2023, 2:21 am yudhi s, <learnerdatabase99_at_gmail.com> wrote:

> Thank You Jonathan.
>
> If I look at the total "read bytes", then out of ~4GB the majority
> (~2GB) is contributed by the bottom six lines in the execution path. So
> should we look to tune this part of the query plan? Though the
> "Activity %" column in the SQL Monitor does not show a major overall
> contribution from this section, so I'm a bit confused when trying to
> match the stats. Within that section, the index storage fast full scan
> on IDX_PBO (for table TAB_PBO) appears three times, because the table
> is used three times in the view definition as self outer joins,
> followed by the full scan on table TAB_PP_TD which produces 2 million
> satisfied rows. All the joins on table TAB_PBO happen on column B_ID,
> but I think we also fetch the column B_NAME from this table in the view
> definition, so it uses the composite index created on (B_NAME, B_ID)
> and goes for a "fast full scan". Basically I am trying to see whether
> creating/modifying some index will help us, even if we have to read
> these 2 million matching rows across all the tables/joins.
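>
> Just to make that concrete, the kind of change I have in mind is
> something like this (hypothetical index name, and assuming B_ID should
> lead because all the joins are on it):
>
>     create index idx_pbo_id_name on tab_pbo (b_id, b_name);
>
> i.e. still a covering index for the two columns used, but with the join
> column leading.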
>
> I understand the points you shared in the last post are based on the
> "Time Active" and "Start Active" columns, and that the sums of those
> come to ~34 seconds and ~37 seconds in the production run and the
> manual run respectively. Still, I'm wondering how you interpreted all
> the "HASH JOIN RIGHT OUTER" operations to be the bottleneck here?
> Also, in the production run the "Start Active" column for the "sort
> unique" shows ~+22, so doesn't that mean the sorting started at the
> 22nd second and continued for the next ~12 seconds (i.e. the "Time
> Active"), so that the sort lasted ~12 seconds? Is my understanding
> wrong here?
>
> I am yet to try the parallel(6) option on the query, but I am hoping
> that adding parallel(6) either within the view definition or in the
> main query will give a similar result. Correct me if I'm wrong.
>
>
> ============================================================================================================================================================================================
> | Id | Operation                    | Name        |    Rows |  Cost |      Time |  Start | Execs |     Rows | Read |  Read |   Mem | Activity | Activity Detail                                  |
> |    |                              |             | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |      (%) | (# samples)                                      |
> ============================================================================================================================================================================================
> | 23 | TABLE ACCESS STORAGE FULL    | TAB_PR_COMP |      1M | 19146 |         3 |     +4 |     1 |       1M |  892 | 858MB |  14MB |     3.03 | cell smart table scan (1)                        |
> | 24 | INDEX STORAGE FAST FULL SCAN | IDX_PBO     |      8M | 11692 |         7 |     +6 |     1 |       8M | 2542 | 539MB |     . |    12.12 | Cpu (3)                                          |
> |    |                              |             |         |       |           |        |       |          |      |       |       |          | cell multiblock physical read (1)                |
> | 25 | TABLE ACCESS STORAGE FULL    | TAB_PR_SITE |      1M |  9771 |         3 |    +12 |     1 |       1M |  457 | 439MB |  14MB |          |                                                  |
> | 26 | INDEX STORAGE FAST FULL SCAN | IDX_PBO     |      8M | 11692 |         5 |    +14 |     1 |       8M | 2192 | 147MB |     . |     9.09 | cell multiblock physical read (2)                |
> |    |                              |             |         |       |           |        |       |          |      |       |       |          | cell single block physical read: flash cache (1) |
> | 27 | INDEX STORAGE FAST FULL SCAN | IDX_PBO     |      8M | 11692 |         5 |    +18 |     1 |       8M |      |       |     . |          |                                                  |
> | 28 | TABLE ACCESS STORAGE FULL    | TAB_PP_TD   |      2M | 29829 |        12 |    +22 |     1 |       2M | 1664 |   1GB |  14MB |          |                                                  |
> ============================================================================================================================================================================================
>
> On Thu, Nov 9, 2023 at 1:21 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> I've learned something new today - I hadn't realised that SQL Monitor
>> would report "DONE (FIRST N ROWS)" if you interrupted a query after the
>> first fetch; I would have guessed that a ctrl-C from SQL*Plus would have
>> resulted in "DONE (ERROR)". So the 'first N rows' really could be the
>> first N rows fetched out of 2M. Given that it's a sort unique (due to
>> the distinct combined with the "order by" and the optimizer getting
>> clever), the preceding hash join and the entire 2M-row sort must have
>> completed before the first row was sent to the user.
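>>
>> (As a toy illustration of that combination, using a dictionary view
>> rather than your tables:
>>
>>     select distinct owner, object_type
>>     from   all_objects
>>     order  by owner, object_type;
>>
>> the plan shows a single SORT UNIQUE satisfying both the distinct and
>> the order by - that's the "getting clever" - but the whole sort still
>> has to complete before the first row can be returned.)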
>>
>> Looking at your manual run: it's a bit of an approximation, but if you
>> check the start time and active time of the three hash join right
>> outers (operations 2, 9 and 16) and the start and active time of the
>> last "probe" table of the hash join at operation 16 (i.e. operation
>> 32), you can see they all sum to 37 seconds. Your elapsed time
>> according to the Global Stats is 42 seconds, but the clock time is 58
>> seconds. So it's taking roughly 5 seconds (42 - 37) to do the sorting,
>> and 16 seconds (58 - 42) to move the data.
>>
>> In the production run it's taking virtually no time to transfer the 329
>> rows to the front end, so the target database time to beat is the 33
>> seconds (or 34 seconds, summing the equivalent 4 operations), of which
>> about 5 seconds is in the sorting. So somewhere in the joins you have
>> to eliminate another 18 seconds if you want to come in under 10
>> seconds.
>>
>> Apart from the suggestion to use parallel query (around degree 6 to 8,
>> probably), the thing you need to do is examine the logic of the query
>> and see if you can find some way to eliminate data early, aggregate
>> early, or join late (after aggregating).
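>>
>> As a schematic example of "aggregate early" (the column names here are
>> invented, just to show the shape):
>>
>>     select b.b_name, t.td_count
>>     from   tab_pbo b
>>     join  (select b_id, count(*) as td_count
>>            from   tab_pp_td
>>            group  by b_id) t               -- collapse 2M rows before the join
>>          on t.b_id = b.b_id;
>>
>> i.e. reduce the big table to one row per B_ID before joining, instead
>> of carrying the full volume through every join and sorting at the end.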
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> P.S. My comment about the scalar subquery transformation was wrong - I
>> misread the SQL and thought you had two inline views in the Select list,
>> but they're in the From clause.
>>
>> P.P.S. The "hint" to use 11.2.0.4 optimizer features is a comment, not a
>> hint. Which is it in the view definition.
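>>
>> (To show what I mean - the only difference is the "+" immediately after
>> the comment opener:
>>
>>     select /*  optimizer_features_enable('11.2.0.4') */ count(*) from dual;  -- comment: ignored
>>     select /*+ optimizer_features_enable('11.2.0.4') */ count(*) from dual;  -- hint: honoured
>>
>> so it's worth checking which form the view actually contains.)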
>>
>>
>>
>>
>> On Wed, 8 Nov 2023 at 18:43, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>>
>>> Thank you Jonathan.
>>>
>>> The first SQL Monitor report is from the actual query, which is
>>> triggered from the application's UI screen; that is why the
>>> FIRST_N_ROWS optimization mode shows up for the query, I believe. That
>>> query is on a view (VW_PPTG) with one filter predicate and an "order
>>> by" on ~12 columns.
>>>
>>> The second SQL Monitor report I posted is from executing the query
>>> manually, with the VW_PPTG view definition expanded and used directly
>>> in the query (which is why the second query looks bigger, joining ~10
>>> tables). I did it this way because I was expecting some difference,
>>> but I don't see much change in the overall response time.
>>>
>>> So you are correct about the 329 rows in the first query's SQL Monitor
>>> report: it stops early because of the first_rows optimization, since
>>> in that case the query is executed from the UI.
>>>
>>> The "optimizer_features_enable('11.2.0.4')" hint is in the view
>>> definition and I have put it exactly the same way, when I ran it manually.
>>> but i also saw if i remove it , then also i don't see much of the
>>> difference in the execution time. This hint is there in the view
>>> definition, so i believe the first query execution also must have executed
>>> with the same hints inside the view definition.
>>>
>>> So I believe there is no duplicate-data issue.
>>>
>>> Considering it really has to read those 2 million rows and sort them,
>>> is it possible at all to get the response time for those first N rows
>>> under 10 seconds?
>>>
>>> I have yet to fully understand the blog post below that you
>>> highlighted. I will see if we can use something similar to tweak the
>>> query in my case and get some benefit.
>>>
>>> https://jonathanlewis.wordpress.com/2010/05/18/double-trouble/
>>>
>>> On Wed, Nov 8, 2023 at 5:34 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>> Looking through the SQL Monitor output, it seems that you really do
>>>> want 2M rows generated and sorted, and the optimizer has done a couple of
>>>> clever things to make that happen as efficiently as possible.
>>>>
>>>> A couple of discrepancies, though -
>>>> 1 The first plan reports (FIRST N ROWS), the second reports (ALL ROWS)
>>>> in the Status.
>>>> 2 The first plan reports 329 rows returned, the second reports 2M. Is
>>>> this because of "stopping early", or because of the choice of data?
>>>> 3 If the latter, why would there be so many duplicates when the query
>>>> was originally run? ***
>>>> 4 The second query shows a comment (was it once a hint?)
>>>> optimizer_features_enable('11.2.0.4'), but the plan shows scalar subquery
>>>> unnesting, which I think is a version 12 feature. Do you have some other
>>>> optimizer_features_enable set for the production query that might be
>>>> blocking a helpful transformation?
>>>>
>>>>
>>>> Given point 3, it's worth asking if there's any scope for finding out
>>>> which table introduces the duplicates and replacing the reference to
>>>> that table name with an inline view of the form (select distinct
>>>> column_list from tableX where ...). The optimizer calls this "distinct
>>>> placement" and can do it automatically, but possibly it doesn't see
>>>> the opportunity here, or maybe it's just not possible.
>>>>
>>>> If the return of 329 rows (after reduction from 2M) is the normal
>>>> case, while the return of 2M rows is a very special (possibly unique)
>>>> one, then you may be able to find a way of using a "two-pass" approach
>>>> to the problem. You call for a distinct on 56 columns, and then order
>>>> by 12 of them. If most of those 44 "extra" columns come from a single
>>>> table, you may be able to create an inline view that does a select
>>>> distinct on a much smaller number of columns (including the rowid from
>>>> the one "large column list" table) and then does a second join back to
>>>> that table with a much reduced set of rows.
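>>>>
>>>> Schematically, the two-pass shape looks something like this (all
>>>> names here are invented):
>>>>
>>>>     select w.*                                -- the "extra" columns, fetched late
>>>>     from  (select distinct w.rowid as rid,
>>>>                   w.sort_col1, w.sort_col2    -- only the distinct/sort columns
>>>>            from   wide_table w
>>>>            join   other_tab  o on o.b_id = w.b_id
>>>>           ) v
>>>>     join   wide_table w on w.rowid = v.rid    -- second visit, much smaller row set
>>>>     order  by v.sort_col1, v.sort_col2;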
>>>>
>>>> There's a trivial example of the type of thing at:
>>>> https://jonathanlewis.wordpress.com/2010/05/18/double-trouble/
>>>> Your distinct, though, means it might not be possible to use this
>>>> pattern effectively.
>>>>
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>> On Wed, 8 Nov 2023 at 04:15, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Listers,
>>>>>
>>>>> It's version 19c of Oracle.
>>>>> We have a search query which is submitted from the UI, and the
>>>>> expected response time is <10 seconds to display the first ~200-odd
>>>>> rows on the screen, but it currently takes ~50 seconds. The SQL
>>>>> Monitor report shows the cardinality estimates are as good as they
>>>>> can be. The query has ~10 tables in its join conditions. I wanted to
>>>>> understand what else can be done (query modification, design change,
>>>>> etc.) to make this query finish with a <10 second response time.
>>>>>
>>>>> The time seems evenly distributed across the plan, except that
>>>>> sorting the final result set before publishing consumes ~20% of the
>>>>> execution time; but we need the data sorted in that order before it
>>>>> shows up on the screen. So I'm not sure what else can be done here to
>>>>> improve this.
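>>>>>
>>>>> (One thought, assuming the UI only ever needs the first page: the
>>>>> 12c row-limiting clause, e.g.
>>>>>
>>>>>     select *
>>>>>     from   vw_pptg
>>>>>     where  some_filter_col = :b1
>>>>>     order  by col1
>>>>>     fetch  first 200 rows only;
>>>>>
>>>>> with placeholder names, would let the optimizer keep just a top-200
>>>>> sort buffer instead of sorting the full result, though the distinct
>>>>> inside the view may limit how much that helps.)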
>>>>>
>>>>> The main query is built on top of a view, but I also executed it
>>>>> manually by exposing the internal query used to create the view. I'm
>>>>> not seeing much of a difference.
>>>>>
>>>>> Below I have published the query and its SQL Monitor report.
>>>>>
>>>>>
>>>>> https://gist.github.com/databasetech0073/046b198e977d200c70778a973c223ed9
>>>>>
>>>>> Regards
>>>>> Yudhi
>>>>>
>>>>>

--
http://www.freelists.org/webpage/oracle-l