Re: Estimations are as good as it can be, what else can be done here?

From: Mohamed Houri <mohamed.houri_at_gmail.com>
Date: Fri, 10 Nov 2023 09:39:24 +0100
Message-ID: <CAJu8R6gXiS3mSBgRXSYOmmR1+3EX-FrAaSAhkVLvRs6DgYBcLQ_at_mail.gmail.com>



Jonathan,
*I've learned something new today - I hadn't realised that the SQL Monitor would report "DONE (FIRST N ROWS)" if you interrupted a query after the first fetch, I would have guessed that a ctrl-C from SQL*Plus would have resulted in "DONE (ERROR)". So the 'first N rows' really could be the first N rows fetched out of 2M. Given that it's a sort unique (due to the distinct combined with the "order by" and the optimizer getting clever) the preceding hash join and entire 2M row sort must have completed before the first row was sent to the user.*

You’re right to say that ctrl-C from SQL*Plus would have resulted in "DONE (ERROR)”. The DONE (FIRST N ROWS) status is different. It appears not, when the query itself is abruptly stopped, but when the client that activated the query terminates the query before all rows are fetched. The Oracle definition is crystal clear

· DONE (FIRST N ROWS) - Execution terminated by the application before all rows were fetched

Simply put, if I launch a query from SQLPLUS, wait until a couple of lines are fetched, and then:

· Issue a Ctrl-C --> I will get DONE (ERROR)

· Close the SQL*PLus window --> I will get DONE (FIRST N ROWS)

SQL Text



select /*+ monitor */ object_id, sleep(1) from t1 where rownum<1000

Error: ORA-1013


ORA-01013: User requested cancel of current operation.
ORA-06512: at "SYS.DBMS_LOCK", line 215
ORA-06512: at "C##MHOURI.SLEEP", line 3


Global Information


 Status              :  DONE (ERROR)
 Instance ID         :  1
 Session             :  C##MHOURI (301:41126)
 SQL ID              :  0gudjbb2mguw9
 SQL Execution ID    :  16777217
 Execution Started   :  11/10/2023 07:42:49
 First Refresh Time  :  11/10/2023 07:42:49
 Last Refresh Time   :  11/10/2023 07:43:12
 Duration            :  23s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus_at_localhost.localdomain (TNS V1-V3)
Fetch Calls         :  3


SQL Text



select /*+ monitor */ object_id, sleep(1) from t1 where rownum<1000

Global Information


 Status              :  DONE (FIRST N ROWS)
 Instance ID         :  1
 Session             :  C##MHOURI (41:19542)
 SQL ID              :  0gudjbb2mguw9
 SQL Execution ID    :  16777216
 Execution Started   :  11/10/2023 07:17:01
 First Refresh Time  :  11/10/2023 07:17:01
 Last Refresh Time   :  11/10/2023 07:17:31
 Duration            :  30s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus_at_localhost.localdomain (TNS V1-V3)
 Fetch Calls         :  15


SELECT
    program,
    to_char(sql_exec_start, 'dd/mm/yyyy hh24:mi:ss') start_date,     status
FROM
    gv$sql_monitor
WHERE
    sql_id = '0gudjbb2mguw9';

PROGRAM                                                       START_DATE
             STATUS
-------------------------------------------
-----------------------     -------------------
SQL Developer                                                10/11/2023
08:16:33     DONE (FIRST N ROWS)
sqlplus_at_localhost.localdomain (TNS V1-V3)    10/11/2023 07:42:49     DONE
(ERROR)
sqlplus_at_localhost.localdomain (TNS V1-V3)    10/11/2023 07:17:01      DONE
(FIRST N ROWS)
SQL Developer                                               10/11/2023
08:12:37     DONE (ERROR)
SQL Developer                                               10/11/2023
08:13:59     DONE (ERROR)

 In passing, FIRST N ROWS in DONE (FIRST N ROWS) has absolutely nothing to do with the optimizer_mode parameter.

select * from table(dbms_xplan.display_cursor('0gudjbb2mguw9',1, format
=>'+outline'));

Outline Data


  /*+

      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('23.1.0')
      DB_VERSION('23.1.0')
      ALL_ROWS
      OUTLINE_LEAF(_at_"SEL$1")
      FULL(_at_"SEL$1" "T1"_at_"SEL$1")
      END_OUTLINE_DATA

  */

 Best regards

Mohamed Houri

Le ven. 10 nov. 2023 à 00:54, Jonathan Lewis <jlewisoracle_at_gmail.com> a écrit :

> 1) initially I'd be guided by time, not by bytes read.  For smart
> tablescans the time taken to read 2 GB could vary enormously depending on
> how much data it being extracted from the 2GB and sent back by the cell
> servers.
>
> 2) I wouldn't use the term "Bottleneck" I was just showing a breakdown of
> where the most significant fraction of time went. If you collapse the plan
> to hide some of the messy details the last "data access action" is the
> tablescan of tab_pp_td (operation 32) which starts after 25 seconds.  A row
> from that tablescan passes up to operation 20 which is a hash join that
> passes it up to operation 18, which passes it to operation 16 which passes
> it to operation 9 which passes it to operation 2 which passes it to the
> Sort Unique at operation 1.  (There's a note at Quiz Night | Oracle
> Scratchpad (wordpress.com)
> <https://jonathanlewis.wordpress.com/2010/12/10/quiz-night-10/> which
> explains the pattern and workload of a series of hash joins that you might
> find useful background reading).
>
> So each from the the tablescan of tab_pp_td arrives at the sort operation
> very quickly; and we can see that the tablescan takes only 12 seconds to
> complete. But the sort operation also starts after 25, but runs for 34
> seconds.   At this point I want to* change what I said about the sort
> completing*.  A Sort Unique uses the V1 "insertion sort", which means the
> sort operation finishes as the last row arrives from the hash join - the
> data is in unique sorted order at 25 + 12 = 3 seconds.
> We now compare the 37 seconds  when the sort completed with the 42 seconds
> reported as "elapsed time" (which should really be "DB time") and the 58
> seconds that would have been seen by the client and is also visible as 25 +
> 34 in the sort operation.
>
> Because it's a serial query we can say that the 58 seconds seen as the
> completion time for the query consists of 42 seconds of DB time seen by one
> Oracle process plus 16 seconds spent in (SQL*Net message from client); and
> the 42 seconds we get from Start Active + Time Active is the 37 seconds to
> the point where the sort completes plus 5 seconds of Oracle (CPU) time
> packing the result set into SQL*Net packets.
>
> Looking at the production run we see that the tablescan ends after 22 + 12
> seconds; which means the data is sorted and ready to send after 23 + 12
> seconds (as reported at the Sort operation) - after which it takes
> virtually no time for Oracle to pack a few rows and start them on their way
> to the client.
>
>
> Regards
> Jonathan Lewis
>
>
> On Wed, 8 Nov 2023 at 20:51, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
>> Thank You Jonathan.
>>
>> If I see the total "read bytes" , then out of ~4GB, the majority of the
>> "read bytes"(~2GB) were contributed by the bottom 6 lines in the execution
>> path. So should we look to tune this part of the query plan? Though the
>> "activity %" column in the sql monitor is not showing a major overall
>> contribution from this section. So I'm a bit confused here, when trying to
>> match the stats. And there too one of the index storage full access
>> paths(IDX_PBO) for table TAB_PBO three times as it's used three times in
>> the view definition as self outer joins, followed by the full scan on table
>>  TAB_PP_TD which results in 2million satisfied rows. All the joins on
>> table TAB_PBO happen on column B_ID, but I think we also fetch the column
>> B_NAME from this table in the view definition, thus it's using the
>> composite index which is created on (B_name,B_ID ) and goes for "fast full
>> scan". Basically I am trying to see if creating/modifying some index will
>> help us, even if we read these 2million matching rows across all the
>> tables/joins.
>>
>> I understand the points you shared in the last post are based on the
>> "time active" and "start active" column and that shows the sum of the "Time
>> Active" and "Start Active" coming as ~34 seconds and ~37 seconds  in
>> production run and manual run respectively. Still wondering, how did you
>> interpret all the "HASH JOIN RIGHT OUTER" to be the bottleneck here ?
>> But if we take the production run, the start active column for "sort
>> unique" shows as ~22, so doesn't it mean that the sorting started at 22nd
>> second and continued till next ~12 seconds(i.e. the "Times active"). That
>> way the sorting lasted for ~12 seconds. Is my understanding wrong here?
>>
>>
>>>>>>

-- 

Houri Mohamed

Oracle DBA-Developer-Performance & Tuning

Visit My         - Blog <http://www.hourim.wordpress.com/>

Let's Connect -
<http://fr.linkedin.com/pub/mohamed-houri/11/329/857/>*Linkedin
Profile <http://fr.linkedin.com/pub/mohamed-houri/11/329/857/>*

My Twitter <https://twitter.com/MohamedHouri>      - MohamedHouri
<https://twitter.com/MohamedHouri>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Nov 10 2023 - 09:39:24 CET

Original text of this message