Jonathan Lewis

Oracle Scratchpad
Just another Oracle weblog

Quiz Night

Wed, 2014-05-07 07:18

Okay – so it’s not night time in my home time-zone, but I’m in Singapore at the moment so it’s night time.

A very simple little quiz – so I’ve disabled comments for the moment and will re-enable them tomorrow morning to allow more people to have a chance to see the question without the solution.

Explain the anomaly displayed in the following “cut-n-paste” from a session running SQL*Plus on 11.2.0.4:

SQL> create unique index t1_i1 on t1(v1 desc);
create unique index t1_i1 on t1(v1 desc)
                                *
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found

SQL> create unique index t1_i1 on t1(v1);

Index created.

Answer

Well it didn’t take long for an answer and several bits of related information to show up – as Martin pointed out, all I have to do is insert NULL into the table twice.

To create an entry in a descending index, Oracle takes the 1’s-complement of each column and appends an 0xFF byte to each column – except in the case of a null column where the null is replaced with a 0x00. So two NULLs become two identical, non-null 0x00 keys and collide in the unique descending index, whereas the ascending index simply doesn’t store entries that are entirely null. (And, as Sayan points out, funny things happen if you have a varchar2() column which has already reached the 4,000 byte limit.)

The point of the 1’s-complement is that if you walk through the stored values in ascending order you’re walking through the original values in descending order – provided you have the 0xFF on the end of each non-null entry.
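Here’s a minimal sketch for reproducing the quiz. The table definition is an assumption (the original isn’t shown), as is the extra non-unique index t1_i2, which is there only so that the hidden column exists when you query for it. The last query uses sys_op_descend(), the undocumented function that appears in the definition of descending index columns, so treat it as exploratory rather than supported.

```sql
-- Assumed table definition: the post does not show the real one.
create table t1 (v1 varchar2(10));

insert into t1 values (null);
insert into t1 values (null);
commit;

-- Fails with ORA-01452: both nulls are stored as the same non-null key.
create unique index t1_i1 on t1(v1 desc);

-- Succeeds: entirely null entries are not stored in an ascending B-tree index.
create unique index t1_i1 on t1(v1);

-- Hypothetical extra index, non-unique, just to expose the hidden column.
create index t1_i2 on t1(v1 desc);

-- The descending index is built on a hidden virtual column.
select	column_name, data_default
from	user_tab_cols
where	table_name = 'T1'
and	hidden_column = 'YES'
;

-- Exploratory: dump the value Oracle would store for each row.
select	dump(sys_op_descend(v1), 16) stored_key
from	t1
;
```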

 


Extended stats

Sun, 2014-05-04 06:24

Like the recent article on deleting histograms this is another draft that I rediscovered while searching for some notes I had written on a different topic – so I’ve finally finished it off and published it.

Here’s a quirky little detail of extended stats that came up in an OTN thread earlier on this week [ed: actually 8th Jan 2014]. When you create column group stats, Oracle uses an undocumented function sys_op_combined_hash() to create a hash value, and if you gather simple stats on the column (i.e. no histogram) you can get some idea of the range of values that Oracle generates through the hash function. For example:


create table t1 as
select  1 n1, 2 n2
from dual
connect by level<=5000
union all
select  2, 1
from dual
connect by level<=5000
;

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

break on column_name skip 1

column column_name format a32
column endpoint_number heading "EP No."
column endpoint_value  heading "Value" format 999,999,999,999,999,999,999

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                        0                            1
                                          1                            2

N2                                        0                            1
                                          1                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            0      298,332,787,864,733,000
                                          1    8,095,781,421,167,520,000

I could have selected low_value and high_value from user_tab_cols, using utl_raw.cast_to_number() to display them in numeric format, but the view user_tab_histograms displays the low and high as a two-bucket histogram if there is no actual histogram data for the column in the histogram (histgrm$) table.
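The alternative just mentioned would look something like the following sketch. The column aliases are arbitrary. Note that it has to be user_tab_cols rather than user_tab_columns, because the column group is a hidden column.

```sql
select
	column_name,
	utl_raw.cast_to_number(low_value)	low_val,
	utl_raw.cast_to_number(high_value)	high_val
from
	user_tab_cols
where
	table_name = 'T1'
order by
	column_name
;
```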

We probably don’t need to worry about what the low and high values might be because taking hash values destroys any meaning that a range might have (the optimizer can’t use column group stats in range-based predicates, only in equality predicates). However, we might collect a frequency histogram (or Top-N histogram in 12c) on the column group because there might be some data skew in the sets of values that we need to tell the optimizer about – so let’s gather a histogram with 2 buckets on our sample data set and see what we get:


begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 2'
	);
end;
/

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                     5000                            1
                                      10000                            2

N2                                     5000                            1
                                      10000                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS         5000                1,977,102,303
                                      10000                7,894,566,276

The histogram values have changed! As one of the posters on the OTN thread points out, what Oracle has actually stored in this case is mod(sys_op_combined_hash() ,9999999999).
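If you want to check this against your own data, a sketch like the following puts the full hash and the modded value side by side for each distinct combination. It calls the undocumented sys_op_combined_hash() directly, so there’s no guarantee it behaves the same way across versions. The column aliases are arbitrary.

```sql
select
	n1, n2,
	sys_op_combined_hash(n1, n2)			full_hash,
	mod(sys_op_combined_hash(n1, n2), 9999999999)	stored_value
from
	(select distinct n1, n2 from t1)
;
```

The stored_value column should match the endpoint values reported for the column group in user_tab_histograms once the histogram has been gathered.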

So far I haven’t done any exhaustive testing to work out whether the change in the strategy for storing numbers makes any difference to the typical optimizer arithmetic – but I do have at least one case (relating to “missing values” behaviour) where the presence or absence of a column group histogram does make a difference to the estimated cardinality in a way that seems inconsistent with other patterns of behaviour: I doubt if it’s actually due to the change in what’s stored, and one day I may come across a client where I actually need to work out what’s going on and how best to work with the anomaly.

Bonus thought:

In 12c Oracle collects column stats automatically as it loads data into an empty table; but not if it’s got extended stats defined on it.  (This is bug no 18425876, fixed in 12.2). Here’s some code modelling a client scenario where we truncate and reload a table every day. In the first part of the demonstration I’ve loaded the table twice to show that after truncating and reloading I get new stats on the table – the first load is 10,000 rows, the second is 20,000 rows and the stats reflect this automatically. In the second part of the code, after adding a set of column group stats, truncating and loading 5,000 rows, the stats from the previous cycle are still in place. (The code is only relevant to 12c, of course)

create table t1 (n1 number, n2 number);

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 10000
;
commit;

-- stats have appeared without a call to dbms_stats to gather them.

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 20000
;
commit;

-- Stats now show the latest data 

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

-- Add a column group to the stats 

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 5000
;
commit;

-- The stats have not been updated to reflect the new data, and the column group stats are empty

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

--------------------------------------------------------
-- Here are the three consecutive sets of results
--------------------------------------------------------

    BLOCKS   NUM_ROWS
---------- ----------
        16      10000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      10000          0      .0001 C103                       C3020C60
N2                                       2534       7429 .000394633 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS

The workaround given in the bug is “add the extended stats after loading the table” – but if you’re constantly truncating and reloading that means you have to drop and add the extended stats and do a tablescan to gather the column group stats every time you reload.
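As a rough sketch of what that reload cycle might look like, assuming the same t1 and column group as the demonstration (the choice of method_opt in the final gather is an assumption, pick whatever suits the table):

```sql
-- Drop the column group before the reload so that the load can gather stats.
begin
	dbms_stats.drop_extended_stats(user, 't1', '(n1, n2)');
end;
/

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 5000
;
commit;

-- Recreate the column group, then pay for the tablescan to populate it.
select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/
```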

Note: the limitation applies whether you create a column group, “ordinary” extended stats, a virtual column, or an implicit virtual column underlying a function-based index.

Just as a little aside – when I first wrote the demo script I forgot to put in the commit; after the insert/append – which meant I was trying to create column group stats on a table which should have given me Oracle error: “ORA-12838: cannot read/modify an object after modifying it in parallel”; instead this has been trapped by the dbms_stats package and shows up as a slightly confusing:


select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual
       *
ERROR at line 1:
ORA-20001: Error when processing extension -  resource busy and acquire with NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_STATS", line 12977
ORA-06512: at "SYS.DBMS_STATS", line 44967
ORA-06512: at "SYS.DBMS_STATS", line 44986


Costing Bug

Fri, 2014-05-02 01:53

It’s amazing how you can find little bugs (or anomalies) as soon as you start to look closely at how things work in Oracle. I started to write an article for All Things Oracle last night about execution plans with subqueries, so wrote a little script to generate some sample data, set up the first sample query, checked the execution plan, and stopped because the final cost didn’t make sense. Before going on I should point out that this probably doesn’t matter and probably wouldn’t cause a change in the execution plan if the calculation were corrected – but it is just an interesting indication of the odd things that can happen when sections of modular code are combined in an open-ended way. Here’s the query (running on 11.2.0.4) with execution plan:


update t1 set 
	n1 = (
		select	max(mod100)
		from	t2
		where	t2.id = t1.id
	),
	n2 = (
		select	max(trunc100)
		from	t3
		where	t3.id = t1.id
	)
where
	id between 101 and 200
;

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |       |   101 |  1212 |   812  (25)| 00:00:05 |
|   1 |  UPDATE                       | T1    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN            | T1_I1 |   101 |  1212 |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   4 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN (MIN/MAX)| T2_I1 |     1 |     7 |     3   (0)| 00:00:01 |
|   6 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   7 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN (MIN/MAX)| T3_I1 |     1 |     7 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=101 AND "ID"<=200)
   5 - access("T2"."ID"=:B1)
   8 - access("T3"."ID"=:B1)

So the cost of running each of the subqueries is 3 – there are two of them, and we expect to run each of them 101 times, for a total cost of 606. So how do we get to 812 as the total cost of the query?

Further testing:

  • the cost of the plan for updating the two columns with constants is just 4.
  • rebuild the indexes with different values for pctfree to see how the cost changes
  • vary the number of columns updated by subquery
  • check the 10053 trace – for issues of presentation vs. rounding, particularly
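Two of those tests can be sketched as follows. The pctfree value is the one mentioned in the footnote to this post. The trace is generated by forcing a hard parse of the statement through explain plan, which is one of several ways of doing it.

```sql
-- Change the index sizes (and hence the subquery costs) and re-check the plan.
alter index t2_i1 rebuild pctfree 99;
alter index t3_i1 rebuild pctfree 99;

-- Dump the optimizer's arithmetic to the session trace file.
alter session set events '10053 trace name context forever, level 1';

explain plan for
update t1 set
	n1 = (select max(mod100)   from t2 where t2.id = t1.id),
	n2 = (select max(trunc100) from t3 where t3.id = t1.id)
where
	id between 101 and 200
;

alter session set events '10053 trace name context off';
```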

Ultimately I decided that for each column updated by subquery the optimizer added 1 to the cost of accessing the table for each row; or, to view it another way, the optimizer used “sum(subquery costs + 1) * number of rows to be updated” so (4 + 4) * 101 + a little bit for the driving table access =  812. This doesn’t seem entirely reasonable – given that a cost is essentially equivalent to assuming that a single block visit is a disk read when we know that when we update multiple columns of the same row we need only read the block into memory at most once. As I said at the start, though, this anomaly in costing probably doesn’t matter – there are no further steps to be taken after the update so there’s nothing the optimizer might do differently if the cost of the update had been calculated as 612 rather than 812.
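The arithmetic can be laid out as a quick query. Treating the “little bit” for the driving access as 4 is an assumption, borrowed from the cost of the constant-only update in the list of tests. The column aliases are arbitrary.

```sql
select
	2 * 3 * 101		subqueries_only,	-- 606: two subqueries, cost 3, 101 rows
	(3 + 1) * 2 * 101	with_extra_unit,	-- 808: one extra unit per subquery per row
	(3 + 1) * 2 * 101 + 4	reported_cost		-- 812: plus the assumed driving cost of 4
from
	dual
;
```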

Footnote:

If you want to play about with this query, here’s the code to create the tables – with one proviso, the plan above happens to be one I produced after rebuilding the indexes on t2 and t3 with pctfree 99


create table t1
as
with generator as (
	select  --+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum				id,	
	mod(rownum-1,100)		mod100,
	trunc((rownum - 1)/100)		trunc100,
	rownum				n1,
	rownum				n2,
	lpad(rownum,6,'0')		vc1,
	rpad('x',100)			padding
from
	generator
where
	rownum <= 10000
;

create table t2 as select * from t1;
create table t3 as select * from t1;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id,mod100);
create index t3_i1 on t3(id,trunc100);

begin
	dbms_stats.gather_table_stats(user,'t1');
	dbms_stats.gather_table_stats(user,'t2');
	dbms_stats.gather_table_stats(user,'t3');
end;
/


Delete Histogram

Thu, 2014-05-01 06:27

Here’s a note which I drafted in November 2010, and then didn’t publish. I found it earlier on this morning while looking for another note I’d written about histograms so, even though it may not be something that people need so much these days, I thought: better late than never.

I’ve pointed out in the past that I’m not keen on seeing lots of histograms on a system and tend to delete them if I think they are not needed. Here’s an example of the type of code I use to delete a histogram.

declare

	srec			dbms_stats.statrec;
	m_distcnt		number;
	m_density		number;
	m_nullcnt		number;
	m_avgclen		number;

	n_array                 dbms_stats.numarray;

begin

	dbms_stats.get_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'n1',
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

	srec.bkvals := null;
	srec.novals :=	dbms_stats.numarray(
				utl_raw.cast_to_number(srec.minval),
				utl_raw.cast_to_number(srec.maxval)
			);
	srec.epc := 2;
	dbms_stats.prepare_column_values(srec, srec.novals);

	m_density := 1/m_distcnt;

	dbms_stats.set_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'n1',
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

exception
	when others then
		raise;		-- should handle div/0

end;
/

The code basically reads the column stats, resets the histogram figures to just the low and high values for the column, setting the endpoint-count to two, then adjusts the density to the standard for a column with no histogram. This specific example is for a numeric column.
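To check the effect, a couple of queries along the following lines should be enough. After running the block I would expect the histogram column to report NONE and user_tab_histograms to show just the two endpoints for the low and high values, but verify that on your own version.

```sql
select
	column_name, num_distinct, density, num_buckets, histogram
from
	user_tab_cols
where
	table_name  = 'T1'
and	column_name = 'N1'
;

select
	endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name  = 'T1'
and	column_name = 'N1'
order by
	endpoint_number
;
```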

Footnote: my preferred method of collecting statistics is to use method_opt => ‘for all columns size 1’ (i.e. no histograms) and then run scripts to create the histograms I want. This means that after any stats collection I need to run code that checks to see which tables have new stats, and then re-run any histogram code that I’ve written for that table.

To move from Oracle’s default histogram collection to this strategy, you could start by switching to method_opt => ‘for all columns size repeat’ (i.e. recreate existing histograms, don’t create new ones), then simply delete histograms as you find that you don’t need them, and introduce scripts to recreate the histograms that you do need. When you’ve finally got to the point where every histogram is scripted you can then switch to method_opt => ‘for all columns size 1’.
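One way to make that switch on a table-by-table basis, assuming 11g or later where table preferences are available, is sketched below. The table name is just the one used elsewhere in this note.

```sql
begin
	dbms_stats.set_table_prefs(
		ownname		=> user,
		tabname		=> 'T1',
		pname		=> 'METHOD_OPT',
		pvalue		=> 'for all columns size repeat'
	);
end;
/

-- Confirm the preference that a default gather will now use for this table.
select dbms_stats.get_prefs('METHOD_OPT', user, 'T1') method_opt from dual;
```

Once every histogram on the table is scripted, the same call with ‘for all columns size 1’ completes the move.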

Footnote 2: Since 2010 when I drafted this note Oracle 12c has launched, and the changes it has introduced for frequency and Top-N histograms mean that I’m far less stringent in my demand that if a histogram is worth having it’s better to write code to create it. There’s a series of three articles about 12c histograms in particular at this link.


Bigfiles

Tue, 2014-04-29 07:12

It’s always the combinations that catch you out.

Bigfile tablespaces have their uses – especially in big systems

Materialized views have their uses – especially in big systems

There’s absolutely no reason why the two technologies should interfere with each other … until you find a bug !

Running an example, stripped to the bare minimum, and doing a couple of things that I personally don’t like doing, on 11.2.0.4:

drop materialized view t1_mv;
drop table t1;

create table t1 (
	id	number constraint t1_pk primary key,
	n1	number,
	n2	number
)
tablespace bigfile_ts
;

create materialized view log on t1
with
	rowid		-- ugh!
;

create materialized view t1_mv
refresh fast on demand
with rowid		-- ugh!
enable query rewrite
as
select
	id, n1
from
	t1
where
	n2 is not null
;

-- Another ugh coming up !
insert into t1 values(1,null,null);
update t1 set n1 = 2, n2 = 1 where id = 1;
commit;

begin
        dbms_mview.refresh(
                list           => 'T1_MV',
                method         => 'F',
                atomic_refresh  => true
        );
end;
/

Things I don’t like:

  • Doing anything that is strongly dependent on rowid – I’d rather do my materialized view stuff by primary key … but, as Mick Jagger once told us: “You can’t always get what you want”.
  • Processes that insert an “empty” row and then update it – it’s very inefficient, generates excessive undo and redo, and often leads to row migration

Most significantly I don’t like operations that have worked for ages suddenly crashing when someone adds a new piece of code to the system.

If you run the fragment above, the refresh will fail with the following string of errors:


ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (TEST_USER.I_SNAP$_T1_MV) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2809
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 3025
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2994
ORA-06512: at line 2

When you try to find out what the unique constraint is, it’s based on a hidden column (M_ROW$$) of type rowid that has been added to the materialized view to emulate a primary key; and, for some reason, if your base table is in a bigfile tablespace and you insert a row and then update it before you’ve run a refresh then you can’t do a fast refresh again until you clean up the mess (e.g. with a complete refresh).
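A sketch of both steps, using the object names from the example: the first query shows which column the failing index is built on, and the second statement is the complete refresh that clears the problem.

```sql
-- Which column is the unique index on the materialized view built on?
select
	column_name
from
	user_ind_columns
where
	index_name = 'I_SNAP$_T1_MV'
;

-- Clean up with a complete refresh.
begin
	dbms_mview.refresh(
		list	=> 'T1_MV',
		method	=> 'C'
	);
end;
/
```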

Note – if you complete all your inserts, then refresh, you can update a row as many times as you like without causing subsequent refresh problems – it looks like it’s just “insert it, update it, refresh” (or “insert it, delete it, refresh”) that has a problem because the only way to hit the issue is to get a correct rowid (from an update or delete) and an incorrect rowid (from an insert) for the same row in the log at the same time.

Workaround: if you move the base table to a smallfile tablespace you might just find you get what you need.

This shows up on MoS as bug 17603987, fixed in 12.2.

If you want to dig in a little bit more, you can start by comparing the contents of t1 and mlog$_t1 (the materialized view log table). Critically:

SQL> select rowid from t1;

ROWID
------------------
AAAXN2AAAAAAAJEAAA

1 row selected.

SQL> select m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AQAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

SQL> select rowidtochar(m_row$$) m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AAAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

If you compare the rowid from t1 with the m_row$$ column (which should hold the rowids of rows from t1) from mlog$_t1 you’ll notice that the insert and the update have been given different values for the source row. Somehow the logging code for the insert has generated the wrong rowid value; however, when you apply a rowidtochar() to the wrong value the return value is the right value – and the merge statement that transfers modified data from the source table into the materialized view uses the rowidtochar() function to join the materialized view log back to the source table, with the result that the same rowid can be inserted twice – except the attempt results in a “duplicate key” error.
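If you suspect you’ve hit this problem, a query like the following sketch might pick out the suspect log entries before you attempt a refresh. It relies on the same rowidtochar() behaviour shown in the output above, so treat it as a diagnostic to experiment with rather than a guaranteed test. The column aliases are arbitrary.

```sql
select
	m_row$$			logged_rowid,
	rowidtochar(m_row$$)	converted_rowid,
	dmltype$$
from
	mlog$_t1
where
	m_row$$ != rowidtochar(m_row$$)
;
```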

 


Modify PK – 2

Fri, 2014-04-25 10:48

In an earlier posting I described how we can play games with primary key indexes in 12c because you can create multiple indexes on a table for the same (ordered) column list provided they have some differences in attributes and only one of them is visible at a time. But how, if you’re not on 12c, can you change a primary key index from unique to non-unique (or vice versa, as this question on OTN wants) without any down-time?

Of course you can’t “change” the uniqueness of an index – that attribute is embedded in the way that Oracle stores rowids for the index – but you can create a new index for the constraint; and you can’t avoid the little bit of “not quite down”-time that it takes to start and finish an online rebuild. But how do you get around the limitation of Oracle error “ORA-01408: such column list already indexed”?

Easy – just work through an intermediate step using an index with an extra column. Here’s an example where I start with a non-unique index supporting the PK and end up with a unique index.


create table t1 as select * from all_objects where rownum <= 10000;

create index t1_pk on t1(object_id);

alter table t1 add constraint t1_pk primary key(object_id);

create index t1_i1 on t1(object_id, 0) online;

alter table t1 modify primary key using index t1_i1;

drop index t1_pk;

create unique index t1_pk on t1(object_id) online;

alter table t1 modify primary key using index t1_pk;

drop index t1_i1;
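
At each stage you can check which index the constraint is using, and whether that index is unique, with a couple of simple dictionary queries such as these:

```sql
select
	constraint_name, constraint_type, index_name
from
	user_constraints
where
	table_name      = 'T1'
and	constraint_type = 'P'
;

select
	index_name, uniqueness
from
	user_indexes
where
	table_name = 'T1'
;
```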



NL History

Wed, 2014-04-23 11:43

Even the simplest things change – here’s a brief history of nested loop joins, starting from 8i, based on the following query (with some hints):

select
	t2.n1, t1.n2
from
	t2,t1
where
	t2.n2 = 45
and	t2.n1 = t1.n1
;

There’s an index to support the join from t2 to t1, and I’ve forced an (unsuitable) index scan for the predicate on t2.

Basic plan for 8i (8.1.7.4)

As reported by $ORACLE_HOME/rdbms/admin/utlxpls.sql.
Note the absence of a Predicate Information section.

Plan Table
--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |   225 |    3K|   3038 |       |       |
|  NESTED LOOPS             |          |   225 |    3K|   3038 |       |       |
|   TABLE ACCESS BY INDEX RO|T2        |    15 |  120 |   3008 |       |       |
|    INDEX FULL SCAN        |T2_I1     |    15 |      |      8 |       |       |
|   TABLE ACCESS BY INDEX RO|T1        |     3K|   23K|      2 |       |       |
|    INDEX RANGE SCAN       |T1_I1     |     3K|      |      1 |       |       |
--------------------------------------------------------------------------------

Basic plan for 9i (9.2.0.8)

As reported by a call to a home-grown version of dbms_xplan.display_cursor() with statistics_level set to all.

Note the “prefetch” shape of the body of the plan; but the numbers reported for Rows, Bytes, and Cost are inconsistent with that shape and seem to be the “traditional” 8i values transposed to match the new arrangement of the operations. There’s also a little oddity in the A-rows column in line 2 which looks as if it is the sum of its children plus 1 when the size of the rowsource is (presumably) the 225 rowids used to access the table.

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  | Starts  | A-Rows  | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |   225 |  3600 |  3038 |         |         |         |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1          |    15 |   120 |     2 |     1   |    225  |   3061  |
|   2 |   NESTED LOOPS                |             |   225 |  3600 |  3038 |     1   |    241  |   3051  |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2          |    15 |   120 |  3008 |     1   |     15  |   3017  |
|   4 |     INDEX FULL SCAN           | T2_I1       |  3000 |       |     8 |     1   |   3000  |     17  |
|*  5 |    INDEX RANGE SCAN           | T1_I1       |    15 |       |     1 |    15   |    225  |     34  |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Basic plan for 10g (10.2.0.5)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all.

No change from 9i.

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.03 |    3061 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.03 |    3051 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.03 |    3017 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      34 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")
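
For reference, the usual way to get this type of output from 10g onwards is sketched below. In SQL*Plus you need serveroutput switched off, otherwise the last statement executed is the call to dbms_output rather than your query.

```sql
set serveroutput off

alter session set statistics_level = all;

-- Run the query of interest here, then report its plan with rowsource statistics.

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
```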

Basic plan for 11g (11.2.0.4)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all

Note how the nested loop has now turned into two NESTED LOOP operations – potentially opening the way for a complete decoupling of index access and table access. This has an interesting effect on the number of starts of the table access by rowid for t1, of course. The number of buffer gets for this operation looks surprisingly low (given that it started 225 times) but can be explained by the pattern of the data distribution – and cross-checked by looking at the “buffer is pinned count” statistic which accounts for most of the table visits.


-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    225 |00:00:00.01 |    3048 |
|   1 |  NESTED LOOPS                 |       |      1 |    225 |    225 |00:00:00.01 |    3048 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    225 |00:00:00.01 |    3038 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |    3013 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      13 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T1    |    225 |     15 |    225 |00:00:00.01 |      10 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

There is, however, a second possible plan for 11g. The one above is the “NLJ Batching” plan, but I could have hinted the “NLJ prefetch” strategy, which takes us back to the 9i execution plan (with a very small variation in buffer visits).

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")
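
The hints used for the test aren’t shown here, so the following is only a sketch of how you might ask for the prefetch plan in 11g: blocking NLJ batching for the second table is the usual approach, but you would probably need to add hints for the join order and the indexes to reproduce the plan above.

```sql
select
	/*+ no_nlj_batching(t1) */
	t2.n1, t1.n2
from
	t2,t1
where
	t2.n2 = 45
and	t2.n1 = t1.n1
;
```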

Basic plan for 12c (12.1.0.1)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all.
Note that the table access to t2 in line 3 is described as “batched” (a feature that can be disabled by the /*+ no_batch_table_access_by_rowid(alias) */ hint); otherwise the plan matches the 11g plan.


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |    225 |00:00:00.01 |    3052 |
|   1 |  NESTED LOOPS                         |       |      1 |        |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                        |       |      1 |    225 |    225 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN                   | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | T1    |    225 |     15 |    225 |00:00:00.01 |      10 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Of course 12c also has the “prefetch” version of the plan available; and again “batched” access appears – for both tables in this case – and again the feature can be disabled individually by hints addressed at the tables:


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1    |      1 |     15 |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                        |       |      1 |    225 |    225 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN                   | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

In these examples the difference in work done by the different variations and versions is negligible, but there may be cases where the pattern of data distribution changes the pattern of logical I/Os and buffer pins – which may affect the physical I/O. In this light it’s interesting to note the hint /*+ cluster_by_rowid(alias) */ that was introduced in 11.2.0.4 but disappeared by 12c; it changes the 11g plan as follows:


----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.01 |     134 |       |       |          |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.01 |     124 |       |       |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |      99 |       |       |          |
|   4 |     SORT CLUSTER BY ROWID     |       |      1 |   3000 |   3000 |00:00:00.01 |       8 |   142K|   142K|  126K (0)|
|   5 |      INDEX FULL SCAN          | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |       8 |       |       |          |
|*  6 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   6 - access("T2"."N1"="T1"."N1")

Note the effect appearing at line 4 – and the extraordinary effect this has on the buffer visits (so significant that I did a follow-up check on v$mystat to see if the figures were consistent). This type of rowid sorting is, of course, an important fix for an Exadata issue I described some time ago, and I had assumed that the “batched” concept in the 12c plan was in some way enabling it – although the 12c rowsource execution stats don’t seem to bear that idea out.
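
For anyone wanting to repeat the test, here is a sketch of the hinted statement and the v$mystat cross-check. The query shape and select list are assumptions reconstructed from the predicate section; the hint is addressed at t2 because that is where the SORT CLUSTER BY ROWID operation appears in the plan. The session needs select privileges on v$mystat and v$statname.

```sql
-- Hypothetical query shape; intended for 11.2.0.4 only.
select
        /*+
                leading(t2) use_nl(t1)
                index(t2 t2_i1) index(t1 t1_i1)
                no_nlj_batching(t1)
                cluster_by_rowid(t2)
        */
        t2.n1, t1.n2
from
        t2, t1
where
        t2.n2 = 45
and     t2.n1 = t1.n1
;

-- Run this before and after the query and compare the values
-- with the Buffers column reported in the plan.
select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in ('session logical reads', 'consistent gets', 'db block gets')
;
```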

Footnote:

You may also want to read the following note on the 12c batched rowid.


Coincidences

Sat, 2014-04-19 02:22

I had another of those odd timing events today that make me think that Larry Ellison has access to a time machine. I found (yet another example of a) bug that had been reported on MoS just a few days before it appeared on an instance I was running. How is it possible that someone keeps doing things that I’m doing, but just a few days before I do them !

For no good reason I happened to browse through a load of background trace files on an 11.2.0.4 instance and found the following in an “m000” file:

*** SERVICE NAME:(SYS$BACKGROUND) 2014-04-19 08:55:20.617
*** MODULE NAME:(MMON_SLAVE) 2014-04-19 08:55:20.617
*** ACTION NAME:(Auto-Purge Slave Action) 2014-04-19 08:55:20.617

*** KEWROCISTMTEXEC - encountered error: (ORA-06525: Length Mismatch for CHAR or RAW data
ORA-06512: at "SYS.DBMS_STATS", line 29022
ORA-06512: at line 1
)
  *** SQLSTR: total-len=93, dump-len=93,
      STR={begin dbms_stats.copy_table_stats('SYS', :bind1, :bind2, :bind3, flags=>1, force=>TRUE); end;}

Before trying to track down what had gone wrong I did a quick check on MoS, searching for “copy_table_stats” and “29022” and found bug 17079301 – fixed in 12.2, and 12.1.0.2, with a patch for 12.1.0.1 (and some back-ports for 11.2.0.4). The description of the bug in the note was basically: “it happens”.

I may get around to looking more closely at what’s gone wrong but as an initial thought I’m guessing that, even though the action name is “auto-purge slave action”, this may be something to do with adding a partition to some of the AWR history tables and rolling stats forward – so at some point I’ll probably start by checking for partitions with missing stats in the SYS schema.
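
A first check of that kind might look something like the following sketch. It assumes the AWR history tables of interest are the ones whose names start with WRH$, and that a null last_analyzed is a good enough indicator of "missing stats"; both are assumptions, not findings.

```sql
-- Partitions of SYS-owned AWR history tables that have no stats recorded.
-- The WRH$ prefix filter is an assumption about which tables matter.
select
        table_name, partition_name, num_rows, last_analyzed
from
        dba_tab_partitions
where
        table_owner = 'SYS'
and     table_name like 'WRH$%'
and     last_analyzed is null
order by
        table_name, partition_position
;
```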

The bug note, by the way, was published (last updated, on second thoughts) on 14th April 2014 – just 5 days before I first happened to spot the occurrence of the bug.


Bitmap loading

Fri, 2014-04-18 05:43

Everyone “knows” that bitmap indexes are a disaster (compared to B-tree indexes) when it comes to DML. But at an event I spoke at recently someone made the point that they had observed that their data loading operations were faster when the table being loaded had bitmap indexes on it than when it had the equivalent B-tree indexes in place.

There’s a good reason why this can be the case.  No prizes for working out what it is – and I’ll supply an answer in a couple of days’ time.  (Hint – it may also be the reason why Oracle doesn’t use bitmap indexes to avoid the “foreign key locking” problem).

 


Answer

As Martin (comment 3) points out, there’s a lot of interesting information in the statistics once you start doing the experiment. So here’s some demonstration code, first we create a table with one of two possible indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create        index t1_btree on t1(btree_col) nologging;
-- create bitmap index t1_bitmap on t1(bitmap_col) nologging;

You’ll note that the two columns I’m going to build indexes on hold the same data in the same order – and it’s an order with maximum scatter because of the mod() function I’ve used to create it. It’s also very repetitive data, having 1000 distinct values over 1,000,000 rows. With the data and (one of) the indexes in place I’m going to insert another 10,000 rows:

execute snap_my_stats.start_snap

insert /* append */ into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	1e6 + rownum		id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator
;

execute snap_my_stats.end_snap

You’ll note that I’ve got an incomplete append hint in the code – I’ve tested the mechanism about eight different ways, and left the append in as a convenience, but the results I want to talk about (first) are with the hint disabled so that the insert is a standard insert. The snap_my_stats calls are my standard mechanism to capture deltas of my session statistics (v$mystat) – one day I’ll probably get around to using Tanel’s snapper routine everywhere – and here are some of the key results produced in the two tests:


11.2.0.4 with btree
===================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   31,403
DB time                                                                     64
db block gets                                                           31,195
consistent gets                                                            208
db block changes                                                        21,511
redo entries                                                            10,873
redo size                                                            3,591,820
undo change vector size                                                897,608
sorts (memory)                                                               2
sorts (rows)                                                                 1

11.2.0.4 with bitmap
====================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   13,204
DB time                                                                     42
db block gets                                                            8,001
consistent gets                                                          5,203
db block changes                                                         5,911
redo entries                                                             2,880
redo size                                                            4,955,896
undo change vector size                                              3,269,932
sorts (memory)                                                               3
sorts (rows)                                                            10,001
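
The snap_my_stats package isn't shown here, but a crude stand-in for capturing the same sort of deltas can be sketched with a scratch table. The table name my_stats_snap is hypothetical, this is not the author's package, and the session needs select privileges on v$mystat and v$statname.

```sql
-- Step 1: take a baseline copy of the session statistics.
create global temporary table my_stats_snap
on commit preserve rows
as
select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
;

-- Step 2: run the statement under test (e.g. the insert above).

-- Step 3: report the statistics that have changed since the baseline.
select
        sn.name, ms.value - s.value     delta
from
        v$mystat        ms,
        v$statname      sn,
        my_stats_snap   s
where
        sn.statistic# = ms.statistic#
and     s.name        = sn.name
and     ms.value - s.value != 0
order by
        sn.name
;
```

The reporting query adds a little work of its own to the session, so very small deltas should be treated with some caution.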

As Martin has pointed out, there are a number of statistics that show large differences between the B-tree and bitmap approaches, but the one he didn’t mention was the key: sorts (rows). What is this telling us, and why could it matter so much ? If the B-tree index exists when the insert takes place, Oracle locates the correct place for the new index entry as each row is inserted, which is why you end up with so many redo entries, block gets and block changes. If the bitmap index exists, Oracle postpones index maintenance until the table insert is complete, but accumulates the keys and rowids as it goes, then sorts them to optimize the rowid to bitmap conversion and walks the index in order, updating each modified key just once.

The performance consequences of the two different strategies depend on the number of indexes affected, the number of rows modified, the typical number of rows per key value, and the ordering of the new data as it arrives; but it’s possible that the most significant impact could come from ordering.  As each row arrives, the relevant B-tree indexes are modified – but if you’re unlucky, or have too many indexes on the table, then each index maintenance operation could result in a random disk I/O to read the necessary block (how many times have you seen complaints like: “we’re only inserting 2M rows but it’s taking 45 minutes and we’re always waiting on db file sequential reads”). If Oracle sorts the index entries before doing the updates it minimises the random I/O because it need only update each index leaf block once and doesn’t run the risk of re-reading many leaf blocks many times for a big insert.

Further Observations

The delayed maintenance for bitmap indexes (probably) explains why they aren’t used to avoid the foreign key locking problem.  On a large insert, the table data will be arriving, the B-tree indexes will be maintained in real time, but a new child row of some parent won’t appear in the bitmap index until the entire insert is complete – so another session could delete the parent of a row that exists and is not yet committed, but is not yet visible in the bitmap index. Try working out a generic strategy to deal with that type of problem.

It’s worth noting, of course, that when you add the /*+ append */ hint to the insert then Oracle uses exactly the same optimization strategy for B-trees as it does for bitmaps – i.e. postpone the index maintenance, remember all the keys and rowids, then sort and bulk insert them.  And when you’ve remembered that, you may also remember that the hint is (has to be) ignored if there are any enabled foreign key constraints on the table. The argument for why the hint has to be ignored and why bitmap indexes don’t avoid the locking problem is (probably) the same argument.

You may also recall, by the way, that when you have B-tree indexes on a table you can choose the optimal update or delete strategy by selecting a tablescan or index range scan as the execution path.  If you update or delete through an index range scan the same “delayed maintenance” trick is used to optimize the index updates … except for any indexes being used to support foreign key constraints, and they are maintained row by row.

In passing, while checking the results for this note I re-ran some tests that I had originally done in 2006 and added one more test that I hadn’t considered at the time; as a result I can also point out that an index will see delayed maintenance if you drive the update or delete with an index() hint, but not if you drive it with an index_desc() hint.
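
A sketch of the sort of comparison involved, using the t1 table and t1_btree index from the demonstration above. The predicate is arbitrary and the statements are illustrative rather than the original 2006 tests. Capturing the session statistics around each delete and comparing “sorts (rows)” and “db block gets” should show whether the index maintenance was delayed.

```sql
-- Driven by an ascending index range scan:
-- delayed (sorted) index maintenance is expected here.
delete  /*+ index(t1 t1_btree) */
from    t1
where   btree_col between 10 and 19
;
rollback;

-- Driven by a descending index range scan:
-- row-by-row index maintenance is expected here.
delete  /*+ index_desc(t1 t1_btree) */
from    t1
where   btree_col between 10 and 19
;
rollback;
```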