Randolf Geist

Subscribe to Randolf Geist feed
Updated: 8 min 55 sec ago

DOAG 2016 - "Oracle Database Cloud Performance" presentation material

Sat, 2016-11-19 15:23
Thanks to all attendees that came to my presentation "Oracle Database Cloud Performance" at the DOAG conference 2016. You can check the presentation material here at Slideshare.

Oracle Database Cloud (DBaaS) Performance Consistency - Part 7

Sun, 2016-11-06 17:13
This part of the series is supposed to cover the results of I/O related tests performed on the Amazon RDS Oracle cloud instance.
As mentioned in the previous part of this series I've only used the "General Purpose SSD" storage type since the "Provisioned IOPS" storage was simply to expensive to me and it wasn't possible to get a trial license for that storage type.

Now one key aspect of the "General Purpose SSD" storage offered by Amazon RDS is that it is limited to a peak IOPS performance of 3.000 IOPS, and this peak performance will be delivered only for a limited amount of time if you happen to have storage allocated less than 1.000GB - more details can be found on the Amazon RDS documentation.
So in my case since I did a comparison to the Oracle Cloud setup with just 100GB of storage allocated this configuration was only capable of sustaining the peak performance of 3.000 IOPS for a maximum of 2.000 seconds, which is a little more than 30 minutes.
After that the IOPS rate gets limited to the base rate, which in my case here was just 300 IOPS.
Effectively this meant with that configuration I could simply not perform the same I/O heavy tests reasonably - each loop iteration would have taken hours, and then again I didn't plan to run the tests for a very long time since it would simply become too expensive.
To give an idea how running the same test setup would have performed on this Amazon RDS "General Purpose SSD" storage, here is the individual thread performance for the "read-only" I/O test (for details see this blog post, at the end of that post you can also find the median runtimes for the Oracle Cloud DBaaS offering):
So after using up the "I/O credits" for the peak IOPS performance, which is still 10 times slower than the IOPS rate of the Oracle Clould DBaaS offering (approximately 640 seconds per iteration vs. 64 seconds, 3.000 vs. 30.000 IOPS), the rate drops to 300+ IOPS, meaning loop iteration runtimes of 4.500+ seconds - this more than one hour! to run a single iteration of the test that takes slightly more than a minute on the Oracle Cloud DBaaS offering.
I stopped the I/O tests there since I didn't see a point in continuing this further - the two offerings simply can't be compared on that basis. It would have been different when using the "Provisioned IOPS" storage configuration, but that was simply too expensive for such a small test.

"Cost Based Optimizer: Grundlagen – mit Update für Oracle 12c" Artikel (German)

Tue, 2016-11-01 18:31
Seit gestern steht auf der "Informatik Aktuell"-Seite mein Artikel "Cost Based Optimizer: Grundlagen – mit Update für Oracle 12c" zur Verfügung.

Er stimmt auch inhaltlich auf meinen Vortrag bei den IT-Tagen 2016 am 13. Dezember ein.

Sollte Sie das Thema interessieren, lade ich Sie hiermit also herzlich zu meinem Vortrag ein, bei dem ich das Thema auch mit Live-Demonstrationen vertiefen werde.

Oracle Database Cloud (DBaaS) Performance Consistency - Part 6

Sun, 2016-10-23 16:42
In the next parts of this series I'll have a look at the results of similar performance consistency tests performed on a comparable Amazon RDS Oracle cloud database instance.

Let's start with the configuration used, which was an Amazon RDS "db.m4.2xlarge" instance - the latest RDS generation available at time of testing. This is an 8 VCPUs resp. Hyper Threads / 4 cores configuration with 32GB of RAM.
As storage I've used the "General Purpose SSD" setup. Although there is a "Provisioned IOPS" configuration available, too, I haven't used for testing because according to the price calculator available it would have costed me several hundred dollar even for a very short period of testing because some part of the configuration gets charged per month, no matter how long I actually use it. I'll get back to that point when dealing with the I/O related tests and the raw performance figures.
Therefore I've only run the tests on the "General Purpose SSD" configuration - more on that in later installments.
Even without the "Provisioned IOPS" storage the tests were only conducted for several hours instead of several days, therefore they can't be compared directly to the other test results.
The first test conducted was the "PL/SQL" CPU burning test I've outlined in the first part of this series.
The overall consistency result looked like this:

Again the graph is supposed to show how many of the runs deviated how much from the overall median runtime. The results - at least for the short period of the test run - are very consistent, similar to the other test results.
Looking at the results on a more granular basis is only of limited use in this case due to the short test run time, hence the resolution is changed to hours instead of days here:
There were a couple of hours that show a very similar behaviour and again another set of hours with rather a similar profile, and there is an hour with a rather unusual profile.
Finally, looking at the runtime of the individual threads:
Except for one threads all other show a very similar runtime, also the runtime is very consistent.
It should be noted however, that the actual raw performance of each thread is pretty different from that of the Oracle DBaaS offering based on 4 OCPUs, because the DBaaS offering actually provides 8 cores, and hence scales much better when running with 8 threads - see the corresponding previous test results. The Amazon RDS raw performance is more comparable to that of the 4 cores physical host used in the previous tests for comparison.

Oracle Database Cloud (DBaaS) Performance Consistency - Part 5

Sun, 2016-10-16 17:17
This is the fifth part of this installment, and before coming to comparisons to other cloud providers, in this part I show the results of the read-only test that I've already described in part three of this series, but repeated at a later point in time. The test setup was identical and can be checked in the mentioned previous part.

The reason for running the test again was the fact that I was informed during the first test run that the zone that my Oracle Cloud domain was assigned to was temporarily overloaded, which I also noticed since I wasn't able to create new services for some time.

Hence I decided to repeat the tests after it was confirmed that the issue got resolved.

So here is a comparison of the second test run results compared to the first test run - the corresponding results for the dedicated physical host can be found in part three.

Overall consistency second run:


Overall consistency first run:


Again the graph is supposed to show how many of the runs deviated how much from the overall median runtime. The second run shows a much improved overall consistency much closer to that of the dedicated physical host, although a few outliers are more extreme than in the first run,

The same data on per day basis, second run:


First run:


Except for the first two days the second run shows a pretty consistent behaviour per day, and it becomes obvious that it's the second day of the second run that is responsible for the extreme outliers and also the significant inconsistency in the area between 8 and 12 percent deviation.

Finally the individual thread performance, second run:


First run:


The individual thread performance graph confirms the much improved consistency compared to the first run.

Upcoming Public Appearances

Sun, 2016-10-09 16:54
It's that time of the year again... I'll be speaking at the following events:

- SSWUG 2016 Fall Virtual Conference: On November 15th I'll do the geeky stuff that was rejected by DOAG this year, which is "Oracle Database Parallel Execution plans deep dive - understanding the internals of Parallel Execution plans". You should only attend this session if you already have a good understanding of serial execution plans.

- DOAG Conference 2016: On November the 16th I'll present "Oracle Database Cloud Performance" where I talk about my experience regarding the performance of the Oracle DBaaS offerings. Note that some printed conference schedule material is outdated, I'll speak at 3pm, not 10am

- IT Tage 2016: On December 13th I'll present "Oracle 12c - Cost Based Optimizer Grundlagen" which describes the basics of the Cost Based Optimizer with an update how things might be different with Oracle 12c

Hope to see you at some of these events - if you're going to be there, say Hello!

Oracle Database Cloud (DBaaS) Performance Consistency - Part 4

Sun, 2016-10-02 18:07
This is the fourth part of this installment, comparing the performance consistency of the DBaaS cloud offering with a dedicated physical host. This time the previous read-only test was modified to be a 100% update read-write test. So every row read was updated in the following way:

begin
  loop
    for rec in (
          select /*+
              index(t_o)
           */
          id_fk
    from
          t_o
    ) loop
      update t_i&tabname t_i
      set n = rec.id_fk
      where id = rec.id_fk;
    end loop;
    insert into timings(testtype, thread_id, ts) values ('&testtype', &thread_id, systimestamp);
    commit;
  end loop;
end;
/

The setup of the tables T_O and T_I<n> was identical to the previous read-only test and can be checked in part 3 of this series. Again this was executed by several sessions concurrently, this time by 8 sessions on both the DBaaS database and the dedicated physical host - since this wasn't CPU bound and due to better write performance the dedicated physical host could keep up with the load at 8 sessions.
Due to the minimal buffer cache used, the way the data and test is crafted this setup puts maximum pressure of the DBWR background process, due to the constant need to read new blocks into the small buffer cache and write dirty blocks at the same time. By choosing a large block size of 16KB and at the same time minimizing the actual data volume the log writer didn't play a role in this test here.
The test showed that for this setup the DBWR for the DBaaS database was not capable of writing the data fast enough, so that significant time was spent on "free buffer waits", waiting for the DBWR to free up buffers by writing dirty blocks to disk, amplified by the faster read I/O of the DBaaS service compared to the dedicated physical host. The DBaaS storage obviously is optimized for reads - having to write encrypted blocks makes things not faster - the dedicated physical host didn't use TDE encryption. In a separate Parallel Execution test the DBaaS service performed a maximum read throughput of 640 MB per second and 120 MB per second write throughput.
A sample AWR Top profile looked like this:
DBaaS:


Physical Host:



Unfortunately this test could only be executed for a couple of hours on the dedicated physical host and had then to be stopped due to technical problems, distorting some of the following charts.
The overall consistency graph looked like this:

DBaaS:

Physical host:


Again the graph is supposed to show how many of the runs deviated how much from the overall median runtime. Again the DBaaS service shows a significantly different profile, but this time performed more consistently than in the previous read-only test - the probable reason for the difference will be explained in a later part of this series. The DBaaS service this had some extreme outliers (max. >  160 percent deviation) that I didn't include in the graph to not distort the graph too much - but it were just a few runs out of more than 25.000 that were affected.

The same data on per day (per hour for the physical host) basis:

DBaaS:


Physical host:


As mentioned above the physical host data covers only a few hours and hence can't really be compared to the DBaaS data covering more than a week of data.

Finally the individual thread performance:

DBaaS:


Physical host:


Both environments show pretty stable run times per thread over time, and the graphs confirm what is mentioned above: The physical host this time outperforms the DBaaS, however, as already mentioned, a slightly unfair comparison as the physical host didn't had to read and write encrypted blocks.

The next parts will cover comparisons to other cloud providers.

Video Tutorial: XPLAN_ASH Active Session History - Part 12

Sat, 2016-09-24 17:27
The final part of the video tutorial explaining the XPLAN_ASH Active Session History functionality continuing the actual walk-through of the script output.


This is the final part of this tutorial, but there are more tutorials coming - about configuring the script, script internals and also the Rowsource Statistics mode of the script.

Video Tutorial: XPLAN_ASH Active Session History - Part 11

Sun, 2016-09-18 15:00
The next part of the video tutorial explaining the XPLAN_ASH Active Session History functionality continuing the actual walk-through of the script output.

More parts to follow.


Posts by Alberto Dell'Era On ASH Wait Times And Materialized Views

Sun, 2016-09-11 15:23
This is just a short note that links to different articles on Alberto Dell'Era's blog. Alberto is a long time member of the OakTable and has published a number of posts I find very useful and therefore are linked here:

ASH wait times (including an ASH simulator!)

Posts on Materialized Views

Video Tutorial: XPLAN_ASH Active Session History - Part 10

Sun, 2016-09-04 16:49
The next part of the video tutorial explaining the XPLAN_ASH Active Session History functionality continuing the actual walk-through of the script output.

More parts to follow.

Oracle Database Cloud (DBaaS) Performance Consistency - Part 3

Sun, 2016-08-28 16:43
This is the third part of this installment, comparing the performance consistency of the DBaaS cloud offering with a dedicated physical host. This time instead of burning CPU using logical I/O only (see part 2) the test harness executes a SQL statement that has to perform physical I/O mostly, and to be more precise this round is a read-only test, so doesn't include any modification to data (except the logging and any other background system generated write activity, like maintaining ASH /AWR data etc.).

In order to maximize the physical I/O part the database instance was configured with a minimum sized buffer cache (16k block size in this case) and the following script was executed as many times as CPUs were available:

declare
  n number;
begin
  loop
    select /*+
              leading(t_o)
              use_nl(t_i)
              index(t_o)
              index(t_i)
          */
          sum(t_i.n)
          into n
    from
          t_o
        , t_i&tabname t_i
    where
          t_o.id_fk = t_i.id;
    insert into timings(testtype, thread_id, ts) values ('&testtype', &thread_id, systimestamp);
    commit;
  end loop;
end;
/

The outer table T_O was created like this:

create table t_o (id primary key, id_fk)
organization index
as
select
       rownum as id
     , case mod(rownum, 2) + 1
       when 1
       then mod(rownum, &tab_size / 20)
       else &tab_size / 10 - mod(rownum, &tab_size / 20) + 1
       end as id_fk
from dual

connect by level <= &tab_size;

Each thread got its dedicated inner table of the Nested Loop join, created like this:

create table t_i' || i || ' (id not null, n, filler)
pctfree 99 pctused 1
tablespace &tbs
as
select cast(rownum as integer) as id,
cast(rownum as number) as n,
cast(rpad('x', 200) as varchar2(200)) as filler
from dual

connect by level <= &tab_size / 10;

create index t_i' || i || '_idx on t_i' || i || ' (id, filler) pctfree 99 tablespace &tbs;


Due to the way the data was crafted and thanks to the Nested Loop join batching performed by Oracle this ensured that the sessions were performing "db file parallel read" I/O as much and hard as possible, so were submitting multiple I/O requests using a single call and taking advantage of asynchronous I/O where possible.

When running with 8 threads on those 8 CPUs on the DBaaS service this resulted in approx. 30.000 IOPS for those 16K blocks (approx. 480MB / sec throughput). Note that I got these 30.000 IOPS for 16K blocks also on a different instance with just 4 CPUs (2 OCPUs) and 4 threads, so it looks like this is a common 16K blocks IOPS limit for a DBaaS instance independently from the CPU count without further tweaking (I didn't attempt to optimize this in any way but used the service as provided / configured by the Oracle Cloud). It looks like this is more a throughput limitation at 480MB / sec than an IOPS limitation, because when using a 4K blocksize I was able to get 60.000 IOPS (middle value), but the IOPS rate was much more fluctuating (between 20.000 and 90.000) whereas the 30.000 IOPS using 16K blocksize was pretty much stable.

On the physical host I got approx. half of these IOPS (roughly 15.500) for those 16K blocks (approx. 250MB / sec throughput), which is very likely the throughput limit for the rather outdated hardware configuration, albeit using a rather modern Samsung SSD drive, but also meant that running at 4 threads I got a comparable number of IOPS per thread as in the DBaaS setup. Using a 4K blocksize the physical host maxed out at (very stable) 52.000 IOPS.

The overall results look like this:

DBaaS:



Physical host:




Again the graph is supposed to show how many of the runs deviated how much from the overall median runtime. Similar to the previous, logical I/O only test this time the DBaaS service shows a significantly different profile, with an even larger spread of deviation up to almost 9 percent from the median runtime, whereas the physical host only shows significant deviation up to 2.5 percent.

Again, like in the previous test, the physical host shows more extreme outliers than the DBaaS service.

The same graph on a per day basis this time shows significant differences between the days for the DBaaS service:
The physical host shows a very common pattern, except for the first day:
Looking at the individual performance of each thread the DBaaS shows a significant variation in performance per day:
The physical host shows a pretty consistent performance pattern, but interestingly the different threads show different, but consistent runtimes:
The next test round will include physical writes.

Adaptive Cursor Sharing Fail

Sun, 2016-08-21 10:34
Here is another example (besides the fact that Adaptive Cursor Sharing only gets evaluated during a PARSE call (still valid in 12c) and supports a maximum of 14 bind variables) I've recently come across at a client site where the default implementation of Adaptive Cursor Sharing fails to create a more suitable execution plan for different bind variable values.

Broken down to a bare minimum the query was sometimes executed using non-existing values for a particular bind variable, but other times these values were existing and very popular. There were two suitable candidate indexes and one of them appeared to the optimizer more attractive in case of the "non-existing" value case. Unfortunately this index was a very bad choice for the "existing and very popular" case.

The critical point of the case is that due to other, additional filters on the same table/index the final number of rows produced by the row source operation was (almost) the same for the "non-existing" and the "existing and very popular" case, but in the latter case a large proportion of the bad index had to be scanned due to the popularity of the bind value, which was the only predicate that could be used as ACCESS predicate and the additional ones could only be applied as FILTER on the index.

So although there was a suitable histogram available on the column in question and the optimizer in principle was aware of the different popularities of values and marked the cursor as "bind sensitive" it looks like when a cursor is then monitored it just is assigned to different buckets as exposed via V$SQL_CS_HISTOGRAM, based on the rows produced by the row source. Since the operation in question didn't differ much in the final number of rows produced between the two cases (either 0 rows or 1 row) the different executions were all assigned to the same bucket - although the amount of work required to produce this single row was huge, reflected in execution time and physical / logical I/Os performed per execution.

Obviously this vastly different amount of work performed to produce a similar number of rows is not used as criteria for marking a cursor as "bind aware" and evaluate different execution plans based on ranges of bind values.

There is a view V$SQL_CS_STATISTICS that includes information about CPU_TIME and BUFFER_GETS but in 11.2.0.4 it only reflects the initial execution when starting with the "non-existing" value case, but not the slow executions afterwards. In 12.1.0.2 the view is no longer populated at all, which suggests that this information is simply not used for deciding the "bind aware" status of a cursor.

Discussing the case with Mohamed Houri, who has spent significant time on investigating the "Adaptive Cursor Sharing" feature seemed to confirm this assumption.

Here is a simple test case that allows reproducing the issue:

-- FILTER1 is highly skewed (here one very popular value -1)
create table t
as
select
rownum as id
, rownum as n
, case when rownum <= 100000 then rownum else -1 end as filter1
, rownum as filter2
, rpad('x', 200) as filler1
, rownum as filter3
, rpad('x', 200) as filler2
from
dual
connect by level <= 1e6;

-- Histogram on FILTER1 should make the sample query "bind sensitive"
exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns size 254 filter1')

-- The "good" index, ACCESS on all three columns
-- But higher BLEVEL (3) due to FILLER1 only one value and wide - needs to be part of every branch block
create index t_idx1 on t (filter1, filler1, filter2) compress 2 /*advanced low*/;

-- The "bad" index, ACCESS only on first column, FILTER on subsequent columns
-- But lower BLEVEL (2) due to FILTER3 before FILLER1, leads to much more compact branch blocks
create index t_idx2 on t (filter1, filter3, filler1, filler2, filter2) compress 1 /*advanced low*/;
The critical point is that the good index has a higher BLEVEL than the bad index. So in case a non-existing value for FILTER1 gets used the optimizer will favor the index with the lower BLEVEL, which is for that case the key cost component:

set echo on

-- Non-existing value
-- T_IDX2 gets preferred
explain plan for
select sum(n) from t where filter1 = -999999 and filler1 = rpad('x', 200) and filter2 = 999999;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 216 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 216 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 216 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IDX2 | 1 | | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("FILTER1"=(-999999) AND "FILLER1"='x

' AND
"FILTER2"=999999)
filter("FILTER2"=999999 AND "FILLER1"='x

')

-- Non-existing value
-- T_IDX1 has a higher cost
explain plan for
select /*+ index(t t_idx1) */ sum(n) from t where filter1 = -999999 and filler1 = rpad('x', 200) and filter2 = 999999;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 216 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 216 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 216 | 5 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IDX1 | 1 | | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("FILTER1"=(-999999) AND "FILLER1"='x

' AND
"FILTER2"=999999)

-- Existing and very popular value
-- T_IDX1 gets preferred
explain plan for
select sum(n) from t where filter1 = -1 and filler1 = rpad('x', 200) and filter2 = 999999;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 216 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 216 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 216 | 5 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IDX1 | 1 | | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("FILTER1"=(-1) AND "FILLER1"='x

' AND
"FILTER2"=999999)


-- Existing and very popular value
-- T_IDX2 is a bad idea, correctly reflected in the cost estimate
explain plan for
select /*+ index(t t_idx2) */ sum(n) from t where filter1 = -1 and filler1 = rpad('x', 200) and filter2 = 999999;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 216 | 58269 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 216 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 216 | 58269 (1)| 00:00:03 |
|* 3 | INDEX RANGE SCAN | T_IDX2 | 1 | | 58268 (1)| 00:00:03 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("FILTER1"=(-1) AND "FILLER1"='x

' AND
"FILTER2"=999999)
filter("FILTER2"=999999 AND "FILLER1"='x

')
So when optimizing for the two different cases the optimizer does the right thing and understands that for the case "existing and very popular" T_IDX2 is a bad choice. It's also obvious from the "Predicate Information" section that the index T_IDX2 only can use FILTER1 as ACCESS predicate.

But when using bind variables the different cases are not recognized and the bad index is used for both cases when the optimization is based on the "non-existing value" case:

-- Default behaviour, although being BIND_SENSITIVE thanks to the histogram, no ACS kicks in
-- The V$SQL_CS_HISTOGRAM shows all executions being in bucket 0, so according to BIND_SENSITIVE monitoring no need for action
set echo on timing on time on

alter session set statistics_level = all;

variable n number

exec :n := -999999

declare
res number;
begin
select sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
end;
/

exec :n := -1

declare
res number;
begin
for i in 1..1000 loop
select sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
end loop;
end;
/

SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.03
SQL>
SQL> variable n number
SQL>
SQL> exec :n := -999999

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> declare
2 res number;
3 begin
4 select sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
5 end;
6 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.28
SQL>
SQL> exec :n := -1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> declare
2 res number;
3 begin
4 for i in 1..1000 loop
5 select sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

Elapsed: 00:04:48.64
SQL>

-- With BIND_AWARE a second child cursors gets generated and used on second parse, interesting
set echo on timing on time on

alter session set statistics_level = all;

variable n number

exec :n := -999999

declare
res number;
begin
select /*+ bind_aware */ sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
end;
/

exec :n := -1

declare
res number;
begin
for i in 1..1000 loop
select /*+ bind_aware */ sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
end loop;
end;
/

SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.03
SQL>
SQL> variable n number
SQL>
SQL> exec :n := -999999

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> declare
2 res number;
3 begin
4 select /*+ bind_aware */ sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
5 end;
6 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> exec :n := -1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> declare
2 res number;
3 begin
4 for i in 1..1000 loop
5 select /*+ bind_aware */ sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.20
SQL>

-- Or hinting for a fixed plan helps, too
set echo on timing on time on

alter session set statistics_level = all;

variable n number

exec :n := -999999

declare
res number;
begin
select /*+ index(t t_idx1) */ sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
end;
/

exec :n := -1

declare
res number;
begin
for i in 1..1000 loop
select /*+ index(t t_idx1) */ sum(n) into res from t where filter1 = :n and filler1 = rpad('x', 200) and filter2 = 999999;
end loop;
end;
/
Executing the unhinted, default case leads to a very inefficient execution of the thousand executions using the popular value (and the actual table / index at the client site was much larger than this example and the query got executed very frequently) - compare that to the execution time for the other two variants, either using the BIND_AWARE hint or simply requesting the good index.

So when requesting to mark the cursor immediately BIND_AWARE via the corresponding hint (only possible if the cursor qualifies to be bind sensitive) Oracle happily generates a second child cursor at the second PARSE call with a different, more efficient execution plan for the popular value, because when a cursor is marked bind aware a completely different algorithm gets used that is based on range of values as exposed via V$SQL_CS_SELECTIVITY, and since the different values are falling into different ranges of values a new plan gets generated that is different from the previous one and hence gets used from then on for those (range of) values.

Nested Loop Join Physical I/O Optimizations

Mon, 2016-08-15 07:44
Having done my mini-series on Nested Loop join logical I/O optimizations a while ago I unfortunately never managed to publish anything regarding the Nested Loop join physical I/O optimizations, which are certainly much more relevant to real-life performance.

Therefore the main purpose of this blog post is to point you to Nikolay Savvinov's (whose blog I can recommend in general) great mini-series covering various aspects of these optimizations:

Part 1
Part 2
Part 3
Summary

One point that - at least to me - isn't entirely clear when reading Nikolay's series is which specific plan shape he refers to, in particular since in 12c even more plan shapes for a Nested Loop join are possible.

Hence here is an attempt to summarize the various two table Nested Loop join plan shapes as of 12c and what kind of physical I/O optimizations one can expect from them:

1. Nested Loop Join Batching, the default in most cases since 11g and also in 12c

-----------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS | |
| 3 | NESTED LOOPS | |
| 4 | INDEX FULL SCAN | SYS_IOT_TOP_97632 |
|* 5 | INDEX RANGE SCAN | T_I6_IDX |
| 6 | TABLE ACCESS BY INDEX ROWID| T_I6 |
-----------------------------------------------------------
This is the plan shape that Nikolay calls "batch" or "batching" in his posts. It can provide batched I/O (mainly "db file parallel read" issuing multiple I/O requests in a single call, potentially asynchronous I/O) on both the INDEX RANGE SCAN as well as the TABLE ACCESS BY ROWID operation as described in Nikolay's blog post.

Please note that I deliberately write "can provide batched I/O", because, as Nikolay also points out in his posts, the runtime engine monitors the execution and can dynamically adjust the behaviour, which also means that it might decide to use conventional single block reads ("db file sequential read").

Please also note that in addition to the "db file parallel read" operation that submits multiple I/O requests in a single I/O submit call (see Frits Hoogland's blog post from some time ago about Linux internals of this I/O operation) the runtime engine might use "db file scattered read" multi-block I/Os under certain circumstances, in particular when performing "cache warmup prefetching".

Also note that as Nikolay points out when enabling SQL Trace or "rowsource statistics" the "Batched I/O" optimization for some reason gets disabled.

Also this plan shape at least in 12c seems to lead to inconsistencies in the Real-Time SQL Monitoring in several ways. First the number of "Executions" for the INDEX RANGE SCAN and TABLE ACCESS component of the inner row source is not necessarily consistent with the number of rows in the driving row source, second almost all activity and I/O volume seems to be contributed to the "INDEX RANGE SCAN" plan operation and not the "TABLE ACCESS" operation, even if it's the "TABLE ACCESS" that causes physical I/O.

2. Nested Loop Join Prefetch plan shape including BATCHED ROWID table access (only from 12c on)

------------------------------------------------------------------
| Id | Operation | Name |
------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T_I5 |
| 3 | NESTED LOOPS | |
| 4 | INDEX FULL SCAN | SYS_IOT_TOP_97716 |
|* 5 | INDEX RANGE SCAN | T_I5_IDX |
------------------------------------------------------------------
This is the Nested Loop prefetching plan shape introduced in Oracle 9i combined with the new TABLE ACCESS BY INDEX ROWID BATCHED introduced in 12c. It's the plan shape Nikolay refers to with "BATCHED ROWID" in his posts. In my (and Nikolay's) tests this provides the most aggressive batching of I/O (highest number of I/O requests submitted per call in "db file parallel read") for the TABLE ACCESS BY ROWID BATCHED, but didn't perform the same on the index ("db file sequential read", single block reads on the index segment), which doesn't mean that different test and data setups might provide that, too.
Note you should get this plan shape only with explicit hinting or disabling Nested Loop Join Batching via parameter.

In 12c, given a two table join between alias A and B, the following hints would be required to arrive at this plan shape:

leading(a b)
use_nl(b)
index(b)
no_nlj_batching(b)
--batch_table_access_by_rowid(b)
The "batch_table_access_by_rowid" isn't strictly necessary since it's the default behaviour in 12c.

Other variants are possible, like "use_nl_with_index" instead of "use_nl" and "index" separately or "opt_param('_nlj_batching_enabled', 0)" instead of "no_nlj_batching" to disable the batching (but then batching is disabled for the whole execution plan, not just for a particular join).

3. Classic Nested Loop Join Prefetch introduced in 9i

----------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_I5 |
| 3 | NESTED LOOPS | |
| 4 | INDEX FULL SCAN | SYS_IOT_TOP_97716 |
|* 5 | INDEX RANGE SCAN | T_I5_IDX |
----------------------------------------------------------
This is what you get in pre-12c when preventing Nested Loop Join Batching, in 12c the BATCHED table access needs to be disabled in addition.

This plan shape provides the less aggressive table prefetching as described in Nikolay's posts (he refers to this plan shape as "Prefetch"), maximum number of requests submitted per call in a "db file parallel read" operation seems to 39.

As mentioned above, it didn't provide index prefetching in my tests.

The session statistics don't mention "Batched I/O", so although the "db file parallel read" wait event is the same the internal implementation obviously is a different code path.

In 12c, given a two table join between alias A and B, the following hints would be required to arrive at this plan shape:

leading(a b)
use_nl(b)
index(b)
no_nlj_batching(b)
no_batch_table_access_by_rowid(b)
Other variants as above.

4. Classic Nested Loop Join plan shape with BATCHED ROWID table access (only from 12c on)

-------------------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS | |
| 3 | INDEX FULL SCAN | SYS_IOT_TOP_97716 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T_I5 |
|* 5 | INDEX RANGE SCAN | T_I5_IDX |
-------------------------------------------------------------------
This is the classic Nested Loop join plan shape (pre-9i) combined with the new TABLE ACCESS BY INDEX ROWID BATCHED introduced in 12c.

In my tests it provides only "intra-loop" table prefetching, so if the INDEX RANGE SCAN of a single loop iteration points to different table blocks the TABLE ACCESS BY INDEX ROWID BATCHED can make use of "db file parallel read" to batch these I/O requests to the table blocks, and the session statistics show "Batched I/O" counters increasing.

However, it doesn't attempt to optimize "inter-loop" / multiple loop iterations, so if each loop iteration via the INDEX RANGE SCAN only points to a single table block, only single block reads ("db file sequential read") on the TABLE ACCESS BY INDEX ROWID BATCHED can be seen.

In my tests this plan shape didn't provide index prefetching / I/O optimizations and performed single block reads ("db file sequential read") on the INDEX RANGE SCAN operation.

For a two table join it needs explicit hinting in 12c to arrive at this plan shape, but it is the default plan shape for the "inner" joins in case of nested Nested Loop joins (multiple, consecutive Nested Loop joins in a row), see below for more information.

In 12c, given a two table join between alias A and B, the following hints would be required to arrive at this plan shape:

leading(a b)
use_nl(b)
index(b)
opt_param('_nlj_batching_enabled', 0)
no_nlj_prefetch(b)
--batch_table_access_by_rowid(b)
See my older posts regarding Nested Loop join logical I/O optimizations why the combination of OPT_PARAM nd NO_NLJ_PREFETCH is required to arrive at this plan shape - in short, specifying the obvious NO_NLJ_PREFETCH plus NO_NLJ_BATCHING doesn't work. Other variants as above.

5. Classic Nested Loop Join plan shape

-----------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS | |
| 3 | INDEX FULL SCAN | SYS_IOT_TOP_97716 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_I5 |
|* 5 | INDEX RANGE SCAN | T_I5_IDX |
-----------------------------------------------------------
This is the classic Nested Loop join plan shape (pre-9i).

Interestingly in my tests at least in 12c it provides also "intra-loop" table prefetching, so if the INDEX RANGE SCAN of a single loop iteration points to different table blocks the TABLE ACCESS BY INDEX ROWID can make use of "db file parallel read" to submit multiple of these I/O requests to the table blocks in a single call, but the session statistics don't show "Batched I/O" counters increasing and the maximum number of requests seem to be 39, so it is less aggressive and looks very similar to the internal implementation used for the classic "table prefetching" plan shape above.

There is no sign of "inter-loop" optimizations and the INDEX RANGE SCAN seems to make use of single block reads only ("db file sequential read").

For pre-12c this is default plan shape used for the "inner" joins in case of nested Nested Loop joins (multiple Nested Loop joins in a row), see below for more information.

In 12c, given a two table join between alias A and B, the following hints would be required to arrive at this plan shape:

leading(a b)
use_nl(b)
index(b)
opt_param('_nlj_batching_enabled', 0)
no_nlj_prefetch(b)
no_batch_table_access_by_rowid(b)
Other variants as above.

Multiple, consecutive Nested Loop Joins
Another point that I think it is important to mention when describing these "inter-loop" prefetching and batching Nested Loop join optimization techniques is that they only apply to the "outer-most" Nested Loop join in case of multiple, consecutive Nested Loop joins, which makes their "game changing" character that Nikolay mentions in his posts less strong to me.

For example, this is the 12c default shape of a four table join using three Nested Loop joins:

----------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS | |
| 3 | NESTED LOOPS | |
| 4 | NESTED LOOPS | |
| 5 | NESTED LOOPS | |
| 6 | INDEX FULL SCAN | SYS_IOT_TOP_97632 |
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| T_I1 |
|* 8 | INDEX RANGE SCAN | T_I1_IDX |
| 9 | TABLE ACCESS BY INDEX ROWID BATCHED | T_I2 |
|* 10 | INDEX RANGE SCAN | T_I2_IDX |
|* 11 | INDEX RANGE SCAN | T_I3_IDX |
| 12 | TABLE ACCESS BY INDEX ROWID | T_I3 |
----------------------------------------------------------------------
As it can be seen only the outer-most Nested Loop joins gets the "batching" plan shape and can benefit from the optimizations described above. The inner Nested Loop joins show the classic plan shape, in case of 12c with the ROWID BATCHED option, so they can only benefit from the "intra-loop" optimizations described above, if a single loop iteration INDEX RANGE SCAN points to several table blocks.

Note that even when using explicit, additional NLJ_BATCHING hints for the inner tables joined I wasn't able to enforce any other plan shape.

If there are multiple blocks of (consecutive) Nested Loop joins (for example a HASH JOIN in between), then in each block the outer-most Nested Loop join gets the optimization, so multiple of those are possible per execution plan:

---------------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS | |
| 3 | NESTED LOOPS | |
|* 4 | HASH JOIN | |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| T_I2 |
| 6 | INDEX FULL SCAN | T_I2_IDX |
| 7 | NESTED LOOPS | |
| 8 | NESTED LOOPS | |
| 9 | INDEX FULL SCAN | SYS_IOT_TOP_97632 |
|* 10 | INDEX RANGE SCAN | T_I1_IDX |
| 11 | TABLE ACCESS BY INDEX ROWID | T_I1 |
|* 12 | INDEX RANGE SCAN | T_I3_IDX |
| 13 | TABLE ACCESS BY INDEX ROWID | T_I3 |
---------------------------------------------------------------------
As it can be seen now that T_I2 is joined via a HASH JOIN both Nested Loop joins to T_I1 and T_I3 get the "batched" optimization plan shape.

I don't know why this limitation is there that only the outer-most in a block of Nested Loop joins gets the optimized plan shape, but it certainly can reduce the possible benefit.

Since I don't think there is a costing difference on CBO level for the different Nested Loop join plan shapes (so the decision which shape to use isn't cost-driven) it might be possible to see a significant performance difference depending on which join gets the optimization - there might be room for improvement by manually influencing the join order in case of multiple, consecutive Nested Loop joins.

Oracle Database Cloud (DBaaS) Performance Consistency - Part 2

Sun, 2016-08-07 06:30
This is the second part of this installment, comparing the performance consistency of the DBaaS cloud offering with a dedicated physical host. This time instead of burning CPU using a trivial PL/SQL loop (see part 1) the test harness executes a SQL statement that performs logical I/O only, so no physical I/O involved.

In order to achieve that a variation of Jonathan Lewis' good old "kill_cpu" script got executed. In principle each thread performed the following:

define tabname = &1

define thread_id = &1;

alter session set "_old_connect_by_enabled" = true;

declare
  n number;
begin
  loop
    select
   count(*) X
    into    n
    from
   kill_cpu&tabname
    connect by
   n > prior n
    start with
   n = 1;
    insert into timings(testtype, thread_id, ts) values ('SQLLIO', &thread_id, systimestamp);
    commit;
  end loop;
end;
/

Each thread got its own, exclusive "kill_cpu<n>" table, so this setup attempted to avoid competition for buffer cache latches. The tables (in fact IOTs) were created like the following:

create table kill_cpu<n>(n primary key)
organization index
as
select
rownum n
from
all_objects
where
rownum <= 25
;

Again there were as many threads started as CPUs available - which meant 8 threads for the DBaaS environment and 4 threads for the physical host and again this was left running for several days.
The overall results look like this:
DBaaS:
Physical host:
Again the graph is supposed to show how many of the runs deviated how much from the overall median runtime. In contrast to the previous, simple PL/SQL test this time the DBaaS service shows a significantly different profile with a larger spread of deviation up to 3 percent from the median runtime, whereas the physical host only shows significant deviation up to 1.5 percent.
It's interesting to note however, that the physical host this time shows more extreme outliers than the DBaaS service, which was the other way around in the previous PL/SQL test.

The same graph on a per day basis doesn't show too significant differences between the days for either environment (here DBaaS):
Physical host:
Looking at the individual performance of each thread again the DBaaS shows a similar behaviour than last time - the different threads show a slightly different performance, and they also get slightly slower towards the end of the measurement:
And again the physical host shows a more consistent performance between threads:
The next test round will be a physical I/O bound setup.

DBMS_STATS - Gather statistics on tables with many columns - 12c update

Wed, 2016-08-03 12:33
This is just a short 12c update on my post about gathering statistics on tables with many columns from some time ago.

I'm currently investigating the "Incremental Statistics" feature in 12.1.0.2 for a client, which probably will be worth one or more other posts, but since we're still in the process of evaluating and installing various patches it's too early to write about that.

As part of the investigation I've noticed a significant change in behaviour in 12.1.0.2 compared to previous versions when it comes to gathering statistics on tables with many columns, hence this post here.

The key message of the original post was that DBMS_STATS needs potentially several passes when gathering statistics on tables with many columns, which meant a significant increase in overall work and resource consumption, exaggerated by the fact that tables with that many columns consist of multiple row pieces.

Along with other significant improvements in 12c (among others new types of histograms and the capability to derive some types of these histograms from a single pass with no need to run separate passes for each histogram) obviously now no longer multiple passes are required for such tables - Oracle can obviously now cope with up to and including 1000 columns in a single pass.

Repeating the test case from the original post for a table with 1.000 columns / 10.000 blocks and using a default METHOD_OPT setting of "FOR ALL COLUMNS SIZE AUTO" results in the following single query executed by the DBMS_STATS call:

SQL ID: 98828wcrfyn0c Plan Hash: 1962185829

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
xmlindex_sel_idx_tbl no_substrb_pad */to_char(count("COL968")),
substrb(dump(min("COL968"),16,0,64),1,240),substrb(dump(max("COL968"),16,0,
64),1,240),to_char(count("COL969")),substrb(dump(min("COL969"),16,0,64),1,
240),substrb(dump(max("COL969"),16,0,64),1,240),to_char(count("COL970")),
substrb(dump(min("COL970"),16,0,64),1,240),substrb(dump(max("COL970"),16,0,
64),1,240),to_char(count("COL971")),substrb(dump(min("COL971"),16,0,64),1,
240),substrb(dump(max("COL971"),16,0,64),1,240),to_char(count("COL972")),
.
.
.
from
"CBO_TEST"."MANY_X_COLS" t /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
.
.
.
NDV,NIL,NIL,NDV,NIL,NIL*/


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.95 1.97 0 40011 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.95 1.98 0 40011 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 111 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=40011 pr=0 pw=0 time=1969578 us)
9995 9995 9995 OPTIMIZER STATISTICS GATHERING (cr=40011 pr=0 pw=0 time=751494 us cost=2717 size=240000 card=10000)
10000 10000 10000 TABLE ACCESS FULL MANY_X_COLS (cr=40011 pr=0 pw=0 time=413062 us cost=2717 size=240000 card=10000)
As it can be seen the "APPROXIMATE NDV AGGREGATE" operation introduced in 11g for the new statistics gathering mode has been renamed in 12c to "OPTIMIZER STATISTICS GATHERING".

Apart from that this is good news as it minimizes the work required to gather statistics for such tables - it takes the minimum of logical / physical I/Os to process the data. And as already mentioned 12c is also capable of more, like generating synopses for incremental statistics and histograms from such a single pass.

Video Tutorial: XPLAN_ASH Active Session History - Part 9

Sun, 2016-07-24 09:21
The next part of the video tutorial explaining the XPLAN_ASH Active Session History functionality continuing the actual walk-through of the script output.

More parts to follow.

Video Tutorial: XPLAN_ASH Active Session History - Part 8

Sun, 2016-07-17 09:54
The next part of the video tutorial explaining the XPLAN_ASH Active Session History functionality continuing the actual walk-through of the script output.

More parts to follow.


Comparing Columns Containing NULL Values

Sun, 2016-07-10 06:32
Prompted by a (not really that) recent discussion on the OTN forum I've decided to publish this note.

Sometimes you have the task of comparing column values and handling the NULL value cases correctly makes this rather cumbersome for columns that are allowed to be NULL.

The "official" SQL way of comparing two column values and to find out whether they are equal or not - under the assumption that having NULL in both columns should be treated as equal (a point that can be argued) would read in SQL something like the following for the "unequal" case:

column1 != column2 or (column1 is null and column2 is not null) or (column1 is not null and column2 is null)
and

column1 = column2 or (column1 is null and column2 is null)
for the "equal" case.

Imagine this to be done for a lot of columns and it becomes obvious that people might be looking for more concise alternatives how to express this.

Sometimes you might have come across code that uses NVL or a similar function with some "special" value that is not supposed to be part of the column data to be compared - like -1 for (positive) integer values for example, but this has the inherent risk of producing wrong results if the "special" value is (or might become in future) not that special after all.

The not really documented SYS_OP_MAP_NONNULL function could come handy if it was official, because it covers the NULL cases automatically.

Note: I still would call it undocumented even if it shows up in some "Materialized View" related part of the official 12c documentation, because it is not part of the official SQL language manual

SYS_OP_MAP_NONNULL(column1) != SYS_OP_MAP_NONNULL(column2)
is all that is needed. More concise as above, but undocumented.

And: There is one obvious drawback to SYS_OP_MAP_NONNULL, since it adds one byte to the input value to ensure the mapped NULL value differs from any other value: It doesn't work properly with maxed out columns, for example this:

SQL> create table t
2 as
3 select rpad('x', 4000) as col1, rpad('x', 4000) as col2 from dual
4 union all
5 select rpad('x', 4000) as col1, rpad('y', 4000) as col2 from dual
6 union all
7 select null as col1, rpad('y', 4000) as col2 from dual
8 union all
9 select rpad('x', 4000) as col1, null from dual
10 union all
11 select null as col1, null as col2 from dual
12 ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't')

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t where sys_op_map_nonnull(col1) = sys_op_map_nonnull(col2);
select count(*) from t where sys_op_map_nonnull(col1) = sys_op_map_nonnull(col2)
*
ERROR at line 1:
ORA-01706: user function result value was too large
errors out with "ORA-01706: user function result value was too large".

DECODE
Decode is an exception to the rule how Oracle treats NULLs in SQL - it returns the "equal" value if both operands to compare are NULL, so it allows the following construct to achieve above comparison (see above OTN forum discussion for credits where this expression appeared first, more recently here on Stew Ashton's blog):

For checking for inequality:

decode(column1, column2, 0, 1) = 1
For equality:

decode(column1, column2, 0, 1) = 0
Very concise, and officially documented.

Interestingly, starting with 11.2.0.2 there is an implicit re-write by default enabled that rewrites above expression (and only this variation, not the one testing for inequality):

decode(column1, column2, 0, 1) = 0
into

sys_op_map_nonnull(column1) = sys_op_map_nonnull(column2)
This is controlled via fix control 8551880 ("Rewrite decode predicate to join").

Guess what happens when running the following query in 11.2.0.2 or later against above data set (the fix control is enabled by default in these versions):

SQL> select /*+ opt_param('_fix_control', '8551880:0') */ count(*) from t where decode(col1, col2, 0, 1) = 0;

COUNT(*)
----------
2

SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t where decode(col1, col2, 0, 1) = 0;
select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t where decode(col1, col2, 0, 1) = 0
*
ERROR at line 1:
ORA-01706: user function result value was too large
Finally, what about performance? For that purpose I created the following compressed 50M rows table, covering different cases:

create table t1 compress
as
select case mod(rownum, 5) + 1
when 1 then cast('C1' as varchar2(20))
when 2 then cast(null as varchar2(20))
when 3 then cast('C3' as varchar2(20))
when 4 then cast(null as varchar2(20))
when 5 then cast('C5' as varchar2(20))
end as col1,
case mod(rownum, 5) + 1
when 1 then cast('C2' as varchar2(20))
when 2 then cast(null as varchar2(20))
when 3 then cast('C3' as varchar2(20))
when 4 then cast('C4' as varchar2(20))
when 5 then cast(null as varchar2(20))
end as col2
from
(select /*+ cardinality(1e4) */ null from dual connect by level <= 1e4)
, (select /*+ cardinality(5e3) */ null from dual connect by level <= 5e3)
;
and got the following results from 11.2.0.4 (ignoring the different cardinality estimates for the moment, since I'm only interested in the elapsed time here):

SQL> select count(*) from t1;

Elapsed: 00:00:00.94

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18743 (2)| 00:03:45 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T1 | 50M| 18743 (2)| 00:03:45 |
-------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from t1 where decode(col1, col2, 0, 1) = 1;

Elapsed: 00:00:02.55

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19176 (4)| 00:03:51 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 500K| 1953K| 19176 (4)| 00:03:51 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(DECODE("COL1","COL2",0,1)=1)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from t1 where SYS_OP_MAP_NONNULL("COL1")!=SYS_OP_MAP_NONNULL("COL2");

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19515 (6)| 00:03:55 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 15M| 60M| 19515 (6)| 00:03:55 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(SYS_OP_MAP_NONNULL("COL1")<>SYS_OP_MAP_NONNULL("COL2"))


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from t1 where (col1 != col2 or (col1 is null and col2 is not null) or (col1 is not null and col2 is null));

Elapsed: 00:00:03.51

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 18876 (2)| 00:03:47 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 30M| 115M| 18876 (2)| 00:03:47 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("COL1" IS NULL AND "COL2" IS NOT NULL OR "COL2" IS NULL
AND "COL1" IS NOT NULL OR "COL1"<>"COL2")



Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:0') */ count(*) from t1 where decode(col1, col2, 0, 1) = 0;

Elapsed: 00:00:02.49

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19176 (4)| 00:03:51 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 500K| 1953K| 19176 (4)| 00:03:51 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(DECODE("COL1","COL2",0,1)=0)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t1 where decode(col1, col2, 0, 1) = 0;

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19515 (6)| 00:03:55 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 6000K| 22M| 19515 (6)| 00:03:55 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(SYS_OP_MAP_NONNULL("COL1")=SYS_OP_MAP_NONNULL("COL2"))


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from t1 where col1 = col2 or (col1 is null and col2 is null);

Elapsed: 00:00:02.30

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 18865 (2)| 00:03:47 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 13M| 49M| 18865 (2)| 00:03:47 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("COL1" IS NULL AND "COL2" IS NULL OR "COL1"="COL2")


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Surprisingly, in my tests SYS_OP_MAP_NONNULL always performed worst, and the DECODE expression pretty close to the more verbose SQL expression - depending on the data set sometimes even faster.

So using SYS_OP_MAP_NONNULL, and in particular the implicit rewrite might not be such a good idea after all. And now you know how the automatic rewrite could be disabled if desired.

New Version Of XPLAN_ASH Utility

Tue, 2016-06-28 18:23
A new version 4.23 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

This version comes only with minor changes, see the change log below.

Here are the notes from the change log:

- Finally corrected the very old and wrong description of "wait times" in the script comments, where it was talking about "in-flight" wait events but that is not correct. ASH performs a "fix-up" of the last 255 samples or so and updates them with the time waited, so these wait events are not "in-flight"

- Removed some of the clean up code added in 4.22 to the beginning of the script, because it doesn't really help much but spooled script output always contained these error messages about non-existent column definitions being cleared

- The "Concurrent I/O" sections will now also be skipped in LIMITED_ASH mode

- Some more fixes to the I/O figures in the "Activity Timeline based on ASH" - the spreading introduced in 4.22 needed some further refinement (see 4.22 change log for more details)

Pages