Re: Is hash join slower in 12.2 compared to 11.2.0.3 or am i hitting a bug?

From: kunwar singh <krishsingh.111_at_gmail.com>
Date: Tue, 28 Jan 2020 13:28:35 -0500
Message-ID: <CAJSrDUpULARjJLC8PPEQJ89w79r_Pr4xFXQ8cAHPoWjriOTsaw_at_mail.gmail.com>



Hi Jonathan,Tanel,Stefan,Noveljic,
Sorry for the delay as i was away from my laptop the whole day yday.

Thanks for all the great pointers. I am reviewing them all. So it is pointing to something at the OS end which causing the problem as i tested the same query on another OS which had both version databases and it ran fine. So i thought of comparing this "FAST" system with the "SLOW" one which hosts the 12c test environment.

Here are the checks so far :

  1. All are virtual machines .
  2. LSCPU output :

Parameter |SLOW SYSTEM |FAST SYSTEM

----------------------|--------------------|------------
Architecture:          x86_64               x86_64
CPU op-mode(s):        32-bit, 64-bit       32-bit, 64-bit
Byte Order:            Little Endian        Little Endian
CPU(s):                2                    24
      << <<<<<Even though it is 2 vs 24 CPU , should it really make that
much difference to the hash join , no one else is using these VMs than me.
On-line CPU(s) list:   0,1                  0-23
Thread(s) per core:    1                    1
Core(s) per socket:    2                    24
Socket(s):             1                    1
NUMA node(s):          1                    1
Vendor ID:             GenuineIntel         GenuineIntel
CPU family:            6                    6
Model:                 79                   85
Stepping:              1                    4
CPU MHz:               2194.171             2095.855
BogoMIPS:              4388.34              4191.71
L1d cache:             32K                  32K
L1d cache:             32K                  32K
L2d cache:             6144K                6144K
NUMA node0 CPU(s):     0,1                  0-23

3. Few other details.

Parameter             |SLOW SYSTEM                  |FAST SYSTEM
----------------------|-----------------------------|------------
uname -r              |4.1.12-124.16.4.el6uek.x86_64|2.6.32-431.el6.x86_64
/sys/devices/system/clocksource/clocksource0/current_clocksource |acpi_pm |acpi_pm

4. _rowsource_statistics_sampfreq is set to default 128

On Mon, Jan 27, 2020 at 4:53 AM Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk> wrote:

>
> Since Tanel has join in with a couple of points, this seems to be a good
> moment to suggest that you try running the query and capturing the instance
> activity stats (v$sesstat) and detailed wait events (v$session_wait) to
> see if the session stats (in particular) tell you about some activity that
> we're not expecting to see, or suggest a source for the excess CPU time.
> And what better way to capture the information that Tanel's Snapper.
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: kunwar singh <krishsingh.111_at_gmail.com>
> Sent: 26 January 2020 14:59
> To: Jonathan Lewis
> Cc: ORACLE-L
> Subject: Re: Is hash join slower in 12.2 compared to 11.2.0.3 or am i
> hitting a bug?
>
> Thanks Jonathan.
>
> yes i do see good performance with px_join_filter as well.
> SQL Monitoring Report
>
> SQL Text
> ------------------------------
> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
> full(a) full(b) px_join_filter(b) */ *from TMP_emp a, tmp_dba_objects b
> where a.empno=b.object_id
>
> Global Information
> ------------------------------
> Status : DONE (ALL ROWS)
> Instance ID : 1
> Session : KUNWAR (32:40079)
> SQL ID : f88k408dc9qn4
> SQL Execution ID : 16777216
> Execution Started : 01/26/2020 15:37:23
> First Refresh Time : 01/26/2020 15:37:23
> Last Refresh Time : 01/26/2020 15:37:23
> Duration : .31806s
> Module/Action : SQL*Plus/-
> Service : SYS$USERS
> Fetch Calls : 4
>
> Global Stats
> ===========================================================================
> | Elapsed | Cpu | IO | Other | Fetch | Buffer | Read | Read |
> | Time(s) | Time(s) | Waits(s) | Waits(s) | Calls | Gets | Reqs | Bytes |
> ===========================================================================
> | 0.32 | 0.16 | 0.14 | 0.02 | 4 | 8735 | 92 | 68MB |
> ===========================================================================
>
> SQL Plan Monitoring Details (Plan Hash Value=1496359809)
>
> =============================================================================================================================================================
> | Id | Operation | Name | Rows | Cost | Time
> | Start | Execs | Rows | Read | Read | Mem | Activity | Activity
> Detail |
> | | | | (Estim) | |
> Active(s) | Active | | (Actual) | Reqs | Bytes | (Max) | (%) |
> (# samples) |
>
> =============================================================================================================================================================
> | 0 | SELECT STATEMENT | | | |
> 1 | +0 | 1 | 32 | | | . | |
> |
> | 1 | HASH JOIN | | 32 | 2482 |
> 1 | +0 | 1 | 32 | | | 297KB | |
> |
> | 2 | JOIN FILTER CREATE | :BF0000 | 1 | 3 |
> 1 | +0 | 1 | 1 | | | . | |
> |
> | 3 | TABLE ACCESS FULL | TMP_EMP | 1 | 3 |
> 1 | +0 | 1 | 1 | 2 | 16384 | . | |
> |
> | 4 | JOIN FILTER USE | :BF0000 | 640K | 2477 |
> 1 | +0 | 1 | 32 | | | . | |
> |
> | 5 | TABLE ACCESS FULL | TMP_DBA_OBJECTS | 640K | 2477 |
> 1 | +0 | 1 | 32 | 81 | 68MB | . | |
> |
>
> =============================================================================================================================================================
>
> I will check into the OS & h/w difference. Is there a reference blog or
> step by step checks that you can suggest?
> Yes removing gather_plan_statistics didnt make a difference to the
> performance.
>
>
> On Sun, Jan 26, 2020 at 5:03 AM Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:
>
> I've run up your test on 11.2.0.4 and 12.2.0.1 - there is basically no
> difference between the two (though 12.2.0.1 is slightly faster, but the
> variation by each version is similar to the difference between versions, so
> not significant). CPU time is ca. 0.32 - 0.35 seconds.
>
> Tested with and without direct path reads, with and without
> gather_plan_statistics.
> I would have expected the impact of gathering rowsource execution
> statistics to be minimal in all your test cases because I believe the
> counters and timers are checked at the start and end of each call to a
> child operation - and a hash join calls it's child operations just once
> each.
>
> I suspect any problem may relate to O/S or hardware differences, that's
> resulting in a difference with memory access as rows are moved up from
> child to parent. (This might explain why the parallel query with Bloom
> filter is faster - very few rows get moved from child to parent - you could
> test this by adding the hint px_join_filter(b) to the SQL to force the
> Bloom filter to appear in the (12.2) serial query.)
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>
> <oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>> on
> behalf of kunwar singh <krishsingh.111_at_gmail.com<mailto:
> krishsingh.111_at_gmail.com>>
> Sent: 26 January 2020 02:24
> To: ORACLE-L
> Subject: Is hash join slower in 12.2 compared to 11.2.0.3 or am i hitting
> a bug?
>
> Hi Listers,
>
> I am doing a testing of migrating app for a customer from 11.2.0.3 to 12.2
> and find the queries doing hash join are taking upto 4x more time. Hash
> join step is the one taking the time , more cpu and more elapsed time.
> Is this a known issue or any bug? I searched in MOS , no hits or maybe i
> am not using correct keywords to match.
>
> Note that i am doing testing on base releases of 11.2.0.3 & 12.2 for
> simplifying my testcase.
>
> 1. Testcase
> create table tmp_dba_objects AS SELECT *from dba_objects where rownum
> <20001;
> insert into tmp_dba_objects select *from tmp_dba_objects;
> insert into tmp_dba_objects select *from tmp_dba_objects;
> insert into tmp_dba_objects select *from tmp_dba_objects;
> insert into tmp_dba_objects select *from tmp_dba_objects;
> insert into tmp_dba_objects select *from tmp_dba_objects;
>
> create table TMP_EMP ( EMPNO NUMBER);
> insert into TMP_EMP select min(object_id) from tmp_dba_objects;
> commit;
>
> alter system flush shared_pool;
> alter system flush buffer_cache;
>
> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
> full(a) full(b) */ *from TMP_emp a, tmp_dba_objects b where
> a.empno=b.object_id;
> set lin 300 pages 9999
> select *from table(dbms_xplan.display_cursor(NULL,NULL,'IOSTATS LAST'));
>
>
> 2. SQL monitor report outputs
>
> SQL Monitoring Report -12.2.0.1
>
> SQL Text
> ------------------------------
> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
> full(a) full(b) */ *from TMP_emp a, tmp_dba_objects b where
> a.empno=b.object_id
>
> Global Information
> ------------------------------
> Status : DONE (ALL ROWS)
> Instance ID : 1
> Session : KUNWAR (40:7639)
> SQL ID : 0257vnkax6zak
> SQL Execution ID : 16777216
> Execution Started : 01/26/2020 02:52:50
> First Refresh Time : 01/26/2020 02:52:50
> Last Refresh Time : 01/26/2020 02:52:58
> Duration : 8s <<<<<<< 8
> seconds
> Module/Action : SQL*Plus/-
> Service : SYS$USERS
> Program : sqlplus_at_devserver1 (TNS V1-V3)
> Fetch Calls : 4
>
> Global Stats
>
> =========================================================================================
> | Elapsed | Cpu | IO | Application | Other | Fetch | Buffer |
> Read | Read |
> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Calls | Gets |
> Reqs | Bytes |
>
> =========================================================================================
> | 7.83 | 5.18 | 2.46 | 0.01 | 0.18 | 4 | 8742
> | 91 | 68MB |
>
> =========================================================================================
>
> SQL Plan Monitoring Details (Plan Hash Value=1398591261)
>
> ============================================================================================================================================================
> | Id | Operation | Name | Rows | Cost | Time
> | Start | Execs | Rows | Read | Read | Mem | Activity | Activity
> Detail |
> | | | | (Estim) | | Active(s)
> | Active | | (Actual) | Reqs | Bytes | (Max) | (%) | (#
> samples) |
>
> ============================================================================================================================================================
> | 0 | SELECT STATEMENT | | | | 8
> | +1 | 1 | 32 | | | . | |
> |
> | 1 | HASH JOIN | | 32 | 2482 | 8
> | +1 | 1 | 32 | | | 708KB | |
> | -- Just as a side note that ASH is not getting
> populated, due to some other reason
> | 2 | TABLE ACCESS FULL | TMP_EMP | 1 | 3 | 1
> | +1 | 1 | 1 | 2 | 16384 | . | |
> |
> | 3 | TABLE ACCESS FULL | TMP_DBA_OBJECTS | 640K | 2477 | 8
> | +1 | 1 | 640K | 80 | 68MB | . | |
> |
>
> ============================================================================================================================================================
>
>
> ---
> SQL Monitoring Report -11.2.0.3
> SQL Text
> ------------------------------
> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
> full(a) full(b) */ *from TMP_emp a, tmp_dba_objects b where
> a.empno=b.object_id
>
> Global Information
> ------------------------------
> Status : DONE (ALL ROWS)
> Instance ID : 1
> Session : KUNWAR (147:18275)
> SQL ID : 0257vnkax6zak
> SQL Execution ID : 16777216
> Execution Started : 01/26/2020 03:12:22
> First Refresh Time : 01/26/2020 03:12:22
> Last Refresh Time : 01/26/2020 03:12:24
> Duration : 2s <<<<<<<
> 2 seconds
> Module/Action : SQL*Plus/-
> Service : SYS$USERS
> Program : sqlplus_at_devserver2.us.oracle.com<mailto:
> sqlplus_at_devserver2.us.oracle.com><mailto:sqlplus_at_devserver2.us.oracle.com
> <mailto:sqlplus_at_devserver2.us.oracle.com>> (TNS V1-V3)
> Fetch Calls : 4
>
> Global Stats
> ================================================================
> | Elapsed | Cpu | IO | Fetch | Buffer | Read | Read |
> | Time(s) | Time(s) | Waits(s) | Calls | Gets | Reqs | Bytes |
> ================================================================
> | 1.96 | 0.53 | 1.43 | 4 | 8844 | 83 | 69MB |
> ================================================================
>
> SQL Plan Monitoring Details (Plan Hash Value=1398591261)
>
> =================================================================================================================================================================
> | Id | Operation | Name | Rows | Cost | Time
> | Start | Execs | Rows | Read | Read | Mem | Activity | Activity
> Detail |
> | | | | (Estim) | | Active(s)
> | Active | | (Actual) | Reqs | Bytes | (Max) | (%) | (#
> samples) |
>
> =================================================================================================================================================================
> | 0 | SELECT STATEMENT | | | | 3
> | +0 | 1 | 32 | | | | |
> |
> | 1 | HASH JOIN | | 32 | 2494 | 3
> | +0 | 1 | 32 | | | 734K | |
> |
> | 2 | TABLE ACCESS FULL | TMP_EMP | 1 | 3 | 1
> | +0 | 1 | 1 | 2 | 49152 | | |
> |
> | 3 | TABLE ACCESS FULL | TMP_DBA_OBJECTS | 640K | 2487 | 3
> | +0 | 1 | 640K | 81 | 69MB | | 50.00 | direct path
> read (1) |
>
> =================================================================================================================================================================
>
> Not sure what diagnostic to dig deeper, strace/perf? i am not much
> familiar with these.
> oradebug oradebug short_stack in 12.2 shows lots of rows like below :
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000392A8918<-6D70726B00796669
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kpusattr()+58060<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kzftfpc()+108<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-dmsglLRPersist()+1480<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000016646A58<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000001CD11420<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002305FE48<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002928CA8A<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002F6031FE<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000359BB229<-6B6D70726B007966
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000003FAF84<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kfclUnlinkBh()+1586<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kdstf111010100001000km()+14<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000012BCB837<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000018D8C1B8<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000001EFA8E9F<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000252DCF2E<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002B574A8C<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000317E01AF<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000037C374DF<-736B6D70726B0079
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kdiss_fetch()+4710<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-qmxxtiAdjMKXTIArgs()+384<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-qesgvslice_0_MIN_MO_DA_F()+6004<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000014D30516<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000001B002C0A<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002125D372<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000276006C7<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002D93D64B<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000033C3E57D<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000039EC64C6<-76736B6D70726B00
>
> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-xsspfc_Comp504()+575<-5F76736B6D70726B
>
> I dont see these in 11.2
>
> While researching i am saw some twitter conversation of Frank Pachot and
> thought this link could help , but i am not familar with tmux/gdb , so
> still reading :) :
> https://github.com/FranckPachot/demo/blob/master/TVD-JoinMethods.sql
>
> Maybe i will have to go to one of conferences like POUG for such advanced
> techniques :D
>
> --
> Cheers,
> Kunwar
>
>
> --
> Cheers,
> Kunwar
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

-- 
Cheers,
Kunwar

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 28 2020 - 19:28:35 CET

Original text of this message