Re: Is hash join slower in 12.2 compared to 11.2.0.3 or am i hitting a bug?

From: kunwar singh <krishsingh.111_at_gmail.com>
Date: Tue, 28 Jan 2020 13:31:16 -0500
Message-ID: <CAJSrDUp8c5D9qt9g2L8pKUZ4YKfjDfagWx7gHnVcmJn0DoPgUg_at_mail.gmail.com>



*correction , not another OS , but another machine on Linux OS only.

On Tue, Jan 28, 2020 at 1:28 PM kunwar singh <krishsingh.111_at_gmail.com> wrote:

> Hi Jonathan,Tanel,Stefan,Noveljic,
> Sorry for the delay as i was away from my laptop the whole day yday.
>
> Thanks for all the great pointers. I am reviewing them all. So it is
> pointing to something at the OS end which causing the problem as i tested
> the same query on another OS which had both version databases and it ran
> fine. So i thought of comparing this "FAST" system with the "SLOW" one
> which hosts the 12c test environment.
>
> Here are the checks so far :
>
> 1. All are virtual machines .
>
> 2. LSCPU output :
>
> Parameter |SLOW SYSTEM |FAST SYSTEM
> ----------------------|--------------------|------------
> Architecture: x86_64 x86_64
> CPU op-mode(s): 32-bit, 64-bit 32-bit, 64-bit
> Byte Order: Little Endian Little Endian
> CPU(s): 2 24
> << <<<<<Even though it is 2 vs 24 CPU , should it really make that
> much difference to the hash join , no one else is using these VMs than me.
> On-line CPU(s) list: 0,1 0-23
> Thread(s) per core: 1 1
> Core(s) per socket: 2 24
> Socket(s): 1 1
> NUMA node(s): 1 1
> Vendor ID: GenuineIntel GenuineIntel
> CPU family: 6 6
> Model: 79 85
> Stepping: 1 4
> CPU MHz: 2194.171 2095.855
> BogoMIPS: 4388.34 4191.71
> L1d cache: 32K 32K
> L1d cache: 32K 32K
> L2d cache: 6144K 6144K
> NUMA node0 CPU(s): 0,1 0-23
>
> 3. Few other details.
>
> Parameter |SLOW SYSTEM |FAST SYSTEM
> ----------------------|-----------------------------|------------
> uname -r |4.1.12-124.16.4.el6uek.x86_64|2.6.32-431.el6.x86_64
> /sys/devices/system/clocksource/clocksource0/current_clocksource |acpi_pm
> |acpi_pm
>
>
> 4. _rowsource_statistics_sampfreq is set to default 128
>
> On Mon, Jan 27, 2020 at 4:53 AM Jonathan Lewis <
> jonathan_at_jlcomp.demon.co.uk> wrote:
>
>>
>> Since Tanel has join in with a couple of points, this seems to be a good
>> moment to suggest that you try running the query and capturing the instance
>> activity stats (v$sesstat) and detailed wait events (v$session_wait) to
>> see if the session stats (in particular) tell you about some activity that
>> we're not expecting to see, or suggest a source for the excess CPU time.
>> And what better way to capture the information that Tanel's Snapper.
>>
>> Regards
>> Jonathan Lewis
>>
>> ________________________________________
>> From: kunwar singh <krishsingh.111_at_gmail.com>
>> Sent: 26 January 2020 14:59
>> To: Jonathan Lewis
>> Cc: ORACLE-L
>> Subject: Re: Is hash join slower in 12.2 compared to 11.2.0.3 or am i
>> hitting a bug?
>>
>> Thanks Jonathan.
>>
>> yes i do see good performance with px_join_filter as well.
>> SQL Monitoring Report
>>
>> SQL Text
>> ------------------------------
>> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
>> full(a) full(b) px_join_filter(b) */ *from TMP_emp a, tmp_dba_objects b
>> where a.empno=b.object_id
>>
>> Global Information
>> ------------------------------
>> Status : DONE (ALL ROWS)
>> Instance ID : 1
>> Session : KUNWAR (32:40079)
>> SQL ID : f88k408dc9qn4
>> SQL Execution ID : 16777216
>> Execution Started : 01/26/2020 15:37:23
>> First Refresh Time : 01/26/2020 15:37:23
>> Last Refresh Time : 01/26/2020 15:37:23
>> Duration : .31806s
>> Module/Action : SQL*Plus/-
>> Service : SYS$USERS
>> Fetch Calls : 4
>>
>> Global Stats
>>
>> ===========================================================================
>> | Elapsed | Cpu | IO | Other | Fetch | Buffer | Read | Read
>> |
>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Calls | Gets | Reqs | Bytes
>> |
>>
>> ===========================================================================
>> | 0.32 | 0.16 | 0.14 | 0.02 | 4 | 8735 | 92 | 68MB
>> |
>>
>> ===========================================================================
>>
>> SQL Plan Monitoring Details (Plan Hash Value=1496359809)
>>
>> =============================================================================================================================================================
>> | Id | Operation | Name | Rows | Cost | Time
>> | Start | Execs | Rows | Read | Read | Mem | Activity | Activity
>> Detail |
>> | | | | (Estim) | |
>> Active(s) | Active | | (Actual) | Reqs | Bytes | (Max) | (%) |
>> (# samples) |
>>
>> =============================================================================================================================================================
>> | 0 | SELECT STATEMENT | | | |
>> 1 | +0 | 1 | 32 | | | . | |
>> |
>> | 1 | HASH JOIN | | 32 | 2482 |
>> 1 | +0 | 1 | 32 | | | 297KB | |
>> |
>> | 2 | JOIN FILTER CREATE | :BF0000 | 1 | 3 |
>> 1 | +0 | 1 | 1 | | | . | |
>> |
>> | 3 | TABLE ACCESS FULL | TMP_EMP | 1 | 3 |
>> 1 | +0 | 1 | 1 | 2 | 16384 | . | |
>> |
>> | 4 | JOIN FILTER USE | :BF0000 | 640K | 2477 |
>> 1 | +0 | 1 | 32 | | | . | |
>> |
>> | 5 | TABLE ACCESS FULL | TMP_DBA_OBJECTS | 640K | 2477 |
>> 1 | +0 | 1 | 32 | 81 | 68MB | . | |
>> |
>>
>> =============================================================================================================================================================
>>
>> I will check into the OS & h/w difference. Is there a reference blog or
>> step by step checks that you can suggest?
>> Yes removing gather_plan_statistics didnt make a difference to the
>> performance.
>>
>>
>> On Sun, Jan 26, 2020 at 5:03 AM Jonathan Lewis <
>> jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:
>>
>> I've run up your test on 11.2.0.4 and 12.2.0.1 - there is basically no
>> difference between the two (though 12.2.0.1 is slightly faster, but the
>> variation by each version is similar to the difference between versions, so
>> not significant). CPU time is ca. 0.32 - 0.35 seconds.
>>
>> Tested with and without direct path reads, with and without
>> gather_plan_statistics.
>> I would have expected the impact of gathering rowsource execution
>> statistics to be minimal in all your test cases because I believe the
>> counters and timers are checked at the start and end of each call to a
>> child operation - and a hash join calls it's child operations just once
>> each.
>>
>> I suspect any problem may relate to O/S or hardware differences, that's
>> resulting in a difference with memory access as rows are moved up from
>> child to parent. (This might explain why the parallel query with Bloom
>> filter is faster - very few rows get moved from child to parent - you could
>> test this by adding the hint px_join_filter(b) to the SQL to force the
>> Bloom filter to appear in the (12.2) serial query.)
>>
>> Regards
>> Jonathan Lewis
>>
>> ________________________________________
>> From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>
>> <oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>> on
>> behalf of kunwar singh <krishsingh.111_at_gmail.com<mailto:
>> krishsingh.111_at_gmail.com>>
>> Sent: 26 January 2020 02:24
>> To: ORACLE-L
>> Subject: Is hash join slower in 12.2 compared to 11.2.0.3 or am i hitting
>> a bug?
>>
>> Hi Listers,
>>
>> I am doing a testing of migrating app for a customer from 11.2.0.3 to
>> 12.2 and find the queries doing hash join are taking upto 4x more time.
>> Hash join step is the one taking the time , more cpu and more elapsed time.
>> Is this a known issue or any bug? I searched in MOS , no hits or maybe i
>> am not using correct keywords to match.
>>
>> Note that i am doing testing on base releases of 11.2.0.3 & 12.2 for
>> simplifying my testcase.
>>
>> 1. Testcase
>> create table tmp_dba_objects AS SELECT *from dba_objects where rownum
>> <20001;
>> insert into tmp_dba_objects select *from tmp_dba_objects;
>> insert into tmp_dba_objects select *from tmp_dba_objects;
>> insert into tmp_dba_objects select *from tmp_dba_objects;
>> insert into tmp_dba_objects select *from tmp_dba_objects;
>> insert into tmp_dba_objects select *from tmp_dba_objects;
>>
>> create table TMP_EMP ( EMPNO NUMBER);
>> insert into TMP_EMP select min(object_id) from tmp_dba_objects;
>> commit;
>>
>> alter system flush shared_pool;
>> alter system flush buffer_cache;
>>
>> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
>> full(a) full(b) */ *from TMP_emp a, tmp_dba_objects b where
>> a.empno=b.object_id;
>> set lin 300 pages 9999
>> select *from table(dbms_xplan.display_cursor(NULL,NULL,'IOSTATS LAST'));
>>
>>
>> 2. SQL monitor report outputs
>>
>> SQL Monitoring Report -12.2.0.1
>>
>> SQL Text
>> ------------------------------
>> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
>> full(a) full(b) */ *from TMP_emp a, tmp_dba_objects b where
>> a.empno=b.object_id
>>
>> Global Information
>> ------------------------------
>> Status : DONE (ALL ROWS)
>> Instance ID : 1
>> Session : KUNWAR (40:7639)
>> SQL ID : 0257vnkax6zak
>> SQL Execution ID : 16777216
>> Execution Started : 01/26/2020 02:52:50
>> First Refresh Time : 01/26/2020 02:52:50
>> Last Refresh Time : 01/26/2020 02:52:58
>> Duration : 8s <<<<<<<
>> 8 seconds
>> Module/Action : SQL*Plus/-
>> Service : SYS$USERS
>> Program : sqlplus_at_devserver1 (TNS V1-V3)
>> Fetch Calls : 4
>>
>> Global Stats
>>
>> =========================================================================================
>> | Elapsed | Cpu | IO | Application | Other | Fetch | Buffer
>> | Read | Read |
>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Calls | Gets
>> | Reqs | Bytes |
>>
>> =========================================================================================
>> | 7.83 | 5.18 | 2.46 | 0.01 | 0.18 | 4 | 8742
>> | 91 | 68MB |
>>
>> =========================================================================================
>>
>> SQL Plan Monitoring Details (Plan Hash Value=1398591261)
>>
>> ============================================================================================================================================================
>> | Id | Operation | Name | Rows | Cost | Time
>> | Start | Execs | Rows | Read | Read | Mem | Activity | Activity
>> Detail |
>> | | | | (Estim) | |
>> Active(s) | Active | | (Actual) | Reqs | Bytes | (Max) | (%) |
>> (# samples) |
>>
>> ============================================================================================================================================================
>> | 0 | SELECT STATEMENT | | | |
>> 8 | +1 | 1 | 32 | | | . | |
>> |
>> | 1 | HASH JOIN | | 32 | 2482 |
>> 8 | +1 | 1 | 32 | | | 708KB | |
>> | -- Just as a side note that ASH is not getting
>> populated, due to some other reason
>> | 2 | TABLE ACCESS FULL | TMP_EMP | 1 | 3 |
>> 1 | +1 | 1 | 1 | 2 | 16384 | . | |
>> |
>> | 3 | TABLE ACCESS FULL | TMP_DBA_OBJECTS | 640K | 2477 |
>> 8 | +1 | 1 | 640K | 80 | 68MB | . | |
>> |
>>
>> ============================================================================================================================================================
>>
>>
>> ---
>> SQL Monitoring Report -11.2.0.3
>> SQL Text
>> ------------------------------
>> select /*+ gather_plan_statistics monitor leading(a,b) use_hash(a,b)
>> full(a) full(b) */ *from TMP_emp a, tmp_dba_objects b where
>> a.empno=b.object_id
>>
>> Global Information
>> ------------------------------
>> Status : DONE (ALL ROWS)
>> Instance ID : 1
>> Session : KUNWAR (147:18275)
>> SQL ID : 0257vnkax6zak
>> SQL Execution ID : 16777216
>> Execution Started : 01/26/2020 03:12:22
>> First Refresh Time : 01/26/2020 03:12:22
>> Last Refresh Time : 01/26/2020 03:12:24
>> Duration : 2s <<<<<<<
>> 2 seconds
>> Module/Action : SQL*Plus/-
>> Service : SYS$USERS
>> Program : sqlplus_at_devserver2.us.oracle.com<mailto:
>> sqlplus_at_devserver2.us.oracle.com><mailto:sqlplus_at_devserver2.us.oracle.com
>> <mailto:sqlplus_at_devserver2.us.oracle.com>> (TNS V1-V3)
>> Fetch Calls : 4
>>
>> Global Stats
>> ================================================================
>> | Elapsed | Cpu | IO | Fetch | Buffer | Read | Read |
>> | Time(s) | Time(s) | Waits(s) | Calls | Gets | Reqs | Bytes |
>> ================================================================
>> | 1.96 | 0.53 | 1.43 | 4 | 8844 | 83 | 69MB |
>> ================================================================
>>
>> SQL Plan Monitoring Details (Plan Hash Value=1398591261)
>>
>> =================================================================================================================================================================
>> | Id | Operation | Name | Rows | Cost | Time
>> | Start | Execs | Rows | Read | Read | Mem | Activity |
>> Activity Detail |
>> | | | | (Estim) | |
>> Active(s) | Active | | (Actual) | Reqs | Bytes | (Max) | (%) |
>> (# samples) |
>>
>> =================================================================================================================================================================
>> | 0 | SELECT STATEMENT | | | |
>> 3 | +0 | 1 | 32 | | | | |
>> |
>> | 1 | HASH JOIN | | 32 | 2494 |
>> 3 | +0 | 1 | 32 | | | 734K | |
>> |
>> | 2 | TABLE ACCESS FULL | TMP_EMP | 1 | 3 |
>> 1 | +0 | 1 | 1 | 2 | 49152 | | |
>> |
>> | 3 | TABLE ACCESS FULL | TMP_DBA_OBJECTS | 640K | 2487 |
>> 3 | +0 | 1 | 640K | 81 | 69MB | | 50.00 | direct
>> path read (1) |
>>
>> =================================================================================================================================================================
>>
>> Not sure what diagnostic to dig deeper, strace/perf? i am not much
>> familiar with these.
>> oradebug oradebug short_stack in 12.2 shows lots of rows like below :
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000392A8918<-6D70726B00796669
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kpusattr()+58060<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kzftfpc()+108<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-dmsglLRPersist()+1480<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000016646A58<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000001CD11420<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002305FE48<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002928CA8A<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002F6031FE<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000359BB229<-6B6D70726B007966
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000003FAF84<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kfclUnlinkBh()+1586<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kdstf111010100001000km()+14<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000012BCB837<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000018D8C1B8<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000001EFA8E9F<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000252DCF2E<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002B574A8C<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000317E01AF<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000037C374DF<-736B6D70726B0079
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-kdiss_fetch()+4710<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-qmxxtiAdjMKXTIArgs()+384<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-qesgvslice_0_MIN_MO_DA_F()+6004<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000014D30516<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000001B002C0A<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002125D372<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-00000000276006C7<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-000000002D93D64B<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000033C3E57D<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-0000000039EC64C6<-76736B6D70726B00
>>
>> ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-clock_gettime()+331<-clock_gettime()+54<-xsspfc_Comp504()+575<-5F76736B6D70726B
>>
>> I dont see these in 11.2
>>
>> While researching i am saw some twitter conversation of Frank Pachot and
>> thought this link could help , but i am not familar with tmux/gdb , so
>> still reading :) :
>> https://github.com/FranckPachot/demo/blob/master/TVD-JoinMethods.sql
>>
>> Maybe i will have to go to one of conferences like POUG for such advanced
>> techniques :D
>>
>> --
>> Cheers,
>> Kunwar
>>
>>
>> --
>> Cheers,
>> Kunwar
>> --
>> http://www.freelists.org/webpage/oracle-l
>>
>>
>>
>
> --
> Cheers,
> Kunwar
>

-- 
Cheers,
Kunwar

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 28 2020 - 19:31:16 CET

Original text of this message