Jonathan Lewis

Just another Oracle weblog

DBaaS Performance

Fri, 2017-01-27 01:58

I don’t know how I missed it but Randolf Geist has been writing a series of posts on the performance of Oracle’s DBaaS offering, using a series of long-running tests to capture not only raw performance figures but also an indication of consistency. You can find all of these tests with a search URL on his blog, but I’ve also created a little index here to make it easier for me to access them in order.

  • Oracle Database Cloud (DBaaS) Performance Consistency
  • Oracle Database Cloud (DBaaS) Performance

… to be continued (I hope).

h/t to Connor McDonald for the tweet that took me back to Randolf’s blog.

 


Basicfile LOBs

Thu, 2017-01-26 06:03

I wrote a short series a little while ago about some of the nasty things that can happen (and can’t really be avoided) with Basicfile LOBs and recently realised that it needed a directory entry so that I didn’t have to supply 6 URLs if I wanted to point someone to it; so here’s the catalogue:

At some stage I may also write a similar series about Securefile LOBs – because you do hit problems if you have a system that does a lot of work modifying a LOB segment, whether it’s Basicfile or Securefile, and you need a strategy for damage limitation.

Footnote

At the time of creating this catalogue I’ve had an SR open with Oracle for about 4 months on the problem that triggered this series, basically asking if there was a way to limit the number of chunks that could be taken off the reusable part of the index. So far I haven’t had an answer to that question; however the client was able to switch the table into a partitioned table and now drops old partitions rather than deleting old data.

 


use_nl hint

Fri, 2017-01-13 02:52

In response to a recent lamentation from Richard Foote about the degree of ignorance regarding the clustering_factor of indexes I commented on the similar level of understanding of a specific hint syntax, namely use_nl(a b), pointing out that this does not mean “do a nested loop from a to b”. My comment was underscored by a fairly prompt response asking what the hint did mean.

Surprisingly, although I’ve explained it many times over the last couple of decades (here’s one from 10 years ago), I couldn’t find an explanation on my blog though I did find a blog note where I’d made a passing comment about the equivalent misunderstanding of the use_hash(a b) syntax.

The misunderstanding is not entirely surprising since for many years the Oracle manuals seemed to suggest (in their examples) that the hint did have a multi-table meaning, and it wasn’t until 10g that the manual gave an explicit statement of the single-table nature of the hint. The hint /*+ use_nl(a b) */ is a short-hand for the pair of hints /*+ use_nl(a) use_nl(b) */; it doesn’t say anything about whether a and b should be joined, or in what order. If you want to guarantee that a and b will be joined in that order by a nested loop you will have to work a lot harder with your hints – and almost certainly need to make use of the /*+ leading() */ hint.

Consider the following query (I’ll put the table creation code at the end of the article if you want to experiment):

select
	/*+ use_nl(a b) */
	a.v1, b.v1, c.v1, d.v1
from
	a, b, c, d
where
	d.n100 = 0
and	a.n100 = d.id
and	b.n100 = a.n2
and	c.id = a.id
;

Only one of the tables a and b can be the first table in the final execution plan, so at least one of them will be “the next table in the join order” at some point, which means the hint guarantees that at least one of them will be the inner table of a nested loop join. Here’s the plan I happened to get with my data, indexing, version (11.2.0.4), etc.:

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      | 20000 |  1347K| 30125   (1)| 00:00:02 |
|   1 |  HASH JOIN                     |      | 20000 |  1347K| 30125   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL            | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|   3 |   HASH JOIN                    |      | 20000 |  1054K| 30098   (1)| 00:00:02 |
|   4 |    TABLE ACCESS FULL           | D    |   100 |  1800 |    26   (4)| 00:00:01 |
|   5 |    NESTED LOOPS                |      | 20000 |   703K| 30072   (1)| 00:00:02 |
|   6 |     NESTED LOOPS               |      | 20000 |   703K| 30072   (1)| 00:00:02 |
|   7 |      TABLE ACCESS FULL         | B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   8 |      INDEX RANGE SCAN          | A_I2 |     2 |       |     1   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID| A    |     2 |    44 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

In this case it’s table a that ends up in a position to be the inner table of a nested loop join.

You may be wondering why there seems to be a hash join into b when we’ve hinted a nested loop join – but the join order that Oracle is using is B -> A -> D -> C with swap_join_inputs(d) and swap_join_inputs(c), so b is never “the next table in the join order”.
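
If you want to see those swaps spelled out, here’s a sketch of a hint set – derived simply by reading the plan above, not copied from the actual outline (which would include further detail such as index and nlj_batching hints) – that should reproduce it:

select
        /*+
                leading(b a d c)
                use_nl(a)
                use_hash(d) swap_join_inputs(d)
                use_hash(c) swap_join_inputs(c)
        */
        a.v1, b.v1, c.v1, d.v1
from
        a, b, c, d
where
        d.n100 = 0
and     a.n100 = d.id
and     b.n100 = a.n2
and     c.id = a.id
;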

If you want an even more confusing (at first sight) plan, here’s the one I got when I changed the single hint to /*+ use_nl(a) */:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   1 |  HASH JOIN           |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   3 |   HASH JOIN          |      | 10000 |   537K|    78   (4)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|   5 |    HASH JOIN         |      | 10000 |   390K|    52   (4)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| D    |   100 |  1800 |    26   (4)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    | 10000 |   214K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------

This plan really looks as if Oracle should have done a nested loop into a but didn’t. Again appearances are deceptive thanks to the effects of swap_join_inputs(): the join order here is A -> D -> C -> B (note that we don’t have a use_nl(b) hint in this example).

If you want a plan where the optimizer produces a nested loop join between a and b you’ll need to put in a leading() hint which places b immediately after a somewhere in the list of tables; then just use_nl(b) is sufficient to enforce the join method. Here, for example, is the plan with hints /*+ leading(d a b c) use_nl(b) */ for my data set:


----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        | 20000 |  1347K| 30164   (1)| 00:00:02 |
|   1 |  HASH JOIN                    |        | 20000 |  1347K| 30164   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL           | C      | 10000 |   146K|    26   (4)| 00:00:01 |
|   3 |   NESTED LOOPS                |        | 20000 |  1054K| 30137   (1)| 00:00:02 |
|   4 |    NESTED LOOPS               |        |  1000K|  1054K| 30137   (1)| 00:00:02 |
|   5 |     HASH JOIN                 |        | 10000 |   390K|    52   (4)| 00:00:01 |
|   6 |      TABLE ACCESS FULL        | D      |   100 |  1800 |    26   (4)| 00:00:01 |
|   7 |      TABLE ACCESS FULL        | A      | 10000 |   214K|    26   (4)| 00:00:01 |
|   8 |     INDEX RANGE SCAN          | B_I100 |   100 |       |     1   (0)| 00:00:01 |
|   9 |    TABLE ACCESS BY INDEX ROWID| B      |     2 |    28 |   101   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Notice, yet again, that Oracle has done a hash join to c with a swap_join_inputs().
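
If you wanted to leave nothing to chance in that last example – rather than letting the optimizer pick the hash joins and the swap for itself – a sketch of a fuller hint set (the real outline holds even more detail) might look like this:

select
        /*+
                leading(d a b c)
                use_hash(a)
                use_nl(b)
                use_hash(c) swap_join_inputs(c)
        */
        a.v1, b.v1, c.v1, d.v1
from
        a, b, c, d
where
        d.n100 = 0
and     a.n100 = d.id
and     b.n100 = a.n2
and     c.id = a.id
;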

Creation Script:

create table a
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
	rownum				id,
	mod(rownum,5000)		n2,
	mod(rownum,100)			n100,
	lpad(rownum,10,'0')		v1,
	lpad('x',100,'x')		padding
from
        generator       v1
;

create table b nologging as select * from a;
create table c nologging as select * from a;
create table d nologging as select * from a;

alter table a add constraint a_pk primary key(id);
alter table b add constraint b_pk primary key(id);
alter table c add constraint c_pk primary key(id);
alter table d add constraint d_pk primary key(id);

create index a_i2 on a(n2) nologging;
create index b_i2 on b(n2) nologging;
create index c_i2 on c(n2) nologging;
create index d_i2 on d(n2) nologging;

create index a_i100 on a(n100) nologging;
create index b_i100 on b(n100) nologging;
create index c_i100 on c(n100) nologging;
create index d_i100 on d(n100) nologging;
begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'A',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'B',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'C',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'D',
		method_opt	 => 'for all columns size 1'
	);
end;
/

Join Elimination 12.2

Tue, 2017-01-10 07:03

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases where a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. when all other factors are equal) is dictated by the order the tables appear in the from clause; so if you have enough tables in the from clause it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

As you will see later on, the three tables grandparent, parent and child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so the new “multi-column” join elimination has clearly worked – but the grandparent is still in the plan, so the older “single column primary key” join elimination hasn’t appeared. On the other hand, I’ve not followed my usual rules for writing SQL so let’s try again. If I’d followed my usual pattern, my from clause would have been in the order child -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this purely visual change to the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:

rem
rem	Script:		join_elimination_12c2.sql
rem	Author:		Jonathan Lewis
rem	

-- Environment details eliminated

define m_pad=100

/*
	IDs will be 1 to 1000
*/

create table grandparent 
as
select 
	rownum			id,
	trunc((rownum-1)/5)	small_num_g,
	rpad(rownum,10)		small_vc_g,
	rpad(rownum,&m_pad)	padding_g
from 
	all_objects 
where 
	rownum <= 1000
;

/*
	Each GP has two (scattered) children here
	Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
	1+mod(rownum,1000)	id_g,
	rownum			id,
	trunc((rownum-1)/5)	small_num_p,
	rpad(rownum,10)		small_vc_p,
	rpad(rownum,&m_pad)	padding_p
from 
	all_objects 
where 
	rownum <= 2000
;

/*
	Simple trick to get 5 (clustered) children per parent
	Child IDs are 1 to 12,000
*/

create table child 
as
select 
	id_g,
	id			id_p,
	rownum			id,
	trunc((rownum-1)/5)	small_num_c,
	rpad(rownum,10)		small_vc_c,
	rpad(rownum,&m_pad)	padding_c
from 
	parent	p,
	(
		select /*+ no_merge */ 
			rownum 
		from	parent p 
		where	rownum <= 5
	)	d
;

create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem	Don't need to collect stats because it's 12c
rem

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered gp -> p -> c
prompt	The final plan is GP->C, The optimizer eliminated P before
prompt	considering GP
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered c -> p -> gp
prompt	The final plan is a tablescan of C only. The optimizer managed 
prompt	to eliminate GP first and P second
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	child		c,
	parent		p,
	grandparent	g 
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	==================================================
prompt	Convert to ANSI standard in the order gp -> p -> c
prompt	and both gp and p eliminated.
prompt	==================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g
join
	parent		p
on	p.id_g = g.id
join
	child		c
on	c.id_p = p.id
and	c.id_g = p.id_g
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===================================================
prompt	Convert to ANSI standard in the order c -> p -> gp
prompt	and only p is eliminated. 
prompt	===================================================

select 
	count(c.small_vc_c)
from 
	child		c
join
	parent		p
on      p.id   = c.id_p 
and	p.id_g = c.id_g 
join
	grandparent	g
on	g.id = p.id_g 
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

It’s possible, of course, that with different system stats, I/O calibration, extent sizes, segment space management, block sizes, or sundry other parameter details you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.


Index Compression

Tue, 2016-12-13 07:11

Richard Foote has published a couple of articles in the last few days on the new (licensed under the advanced compression option) compression mechanism in 12.2 for index leaf blocks. The second of these pointed out that the new “high compression” mechanism was even able to compress single-column unique indexes – a detail that doesn’t make sense for, and isn’t allowed by, the older style “leading edge deduplication” mechanism for index compression.

In 12.2 an index can be created (or rebuilt) with the option “compress advanced high” – and at the leaf-block level this will create “compression units” (possibly just one per leaf block, based on my early testing) that take the complexity of compression far beyond the level of constructing a directory of prefixes. Richard demonstrated the benefit by creating a table with a numeric unique index, then compressing the index, reducing its size from 2,088 leaf blocks to 965 leaf blocks – which is a pretty dramatic difference.

It crossed my mind, though, to wonder whether the level of compression was a side effect of the very straightforward code that Richard had used to create the data set: the table was a million rows with a primary key that had been generated as the rownum selected from the now-classic “connect by..” query against dual, and the row length happened to allow for 242 rows per 8KB table block.

If you pause to think about this data set you realise that if you pick the correct starting point and walk through 242 consecutive entries of the index you will be walking through 242 consecutive rows in the table starting from the zeroth row in a particular table block and ending at the 241st row in that block. A rowid (as stored by Oracle in a simple B-tree index) consists of 6 bytes and the first five bytes of the rowid will be the same for the first 256 rows in any one table block (and the first four will still be the same for the remaining rows in the block). Richard’s data set will be very close to ideal for any byte-oriented, or bit-oriented, compression algorithm because (to use Oracle terminology) the data will have a perfect clustering_factor. (He might even have got a slightly better compression ratio if he’d used an /*+ append */ on the insert, or done a CTAS, and reduced the rowsize to get a uniform 256 rows per table block.)
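
If you want to see this rowid pattern for yourself, here’s a minimal sketch using the documented dbms_rowid package – I’m assuming a table called t1 loaded in key order, as Richard’s was; you should see the object, file and block numbers stay constant for long runs of consecutive ids (242 at a time in Richard’s example), with only the row number changing:

select
        id,
        dbms_rowid.rowid_object(rowid)          obj_no,
        dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        dbms_rowid.rowid_row_number(rowid)      row_no
from
        t1
where
        id <= 500
order by
        id
;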

So how do things change if you randomise the ordering of the key? Here’s a variant on Richard’s code:


rem
rem	Script:		index_compression_12c_2.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Dec 2016
rem
rem	Last tested 
rem		12.2.0.1
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
	rownum				id,
	lpad('x',130,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
order by
	dbms_random.value
;

select table_name, blocks, round(num_rows/blocks,0) rpb from user_tables where table_name = 'T1';

drop index t1_i1;
create unique index t1_i1 on t1(id);
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

drop index t1_i1;
create unique index t1_i1 on t1(id) compress advanced high;
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

The initial drop index is obviously redundant, and the calls to gather_index_stats should also be redundant – but they’re there just to make it obvious I haven’t overlooked any checks for correctness in the build and stats.

You’ll notice that my row length is going to be slightly more “real-world” than Richard’s so that the degree of compression I get from nearly identical rowid values is likely to be reduced slightly, and I’ve completely randomised the order of key values.

So what do the results look like?

With the default pctfree = 10, and in a tablespace of uniform 1MB extents, 8KB blocks, utilising ASSM I get this:


TABLE_NAME               BLOCKS        RPB
-------------------- ---------- ----------
T1                        19782         51

INDEX_NAME           COMPRESSION     PCT_FREE LEAF_BLOCKS
-------------------- ------------- ---------- -----------
T1_I1                DISABLED              10        2088
T1_I1                ADVANCED HIGH         10        1303

Unsurprisingly the uncompressed index is exactly the same size as Richard’s (well, it was just the integers from 1 to 1M in both cases) but the compression ratio is significantly less – though still pretty impressive.

Of course, for this type of index my example probably goes to the opposite extreme from Richard’s. Realistically if you have a sequence based key with an OLTP pattern of data arrival then consecutive key values are likely to be scattered within a few blocks of each other rather than being scattered completely randomly across the entire width of the table; so a more subtle model (using a suitable number of concurrent processes to insert ids based on a sequence, perhaps) would probably get a better compression ratio than I did, though a worse one than Richard’s. There’s also the issue of the size of the key value itself – once you get to values in the order of 10 million to 100 million you’re looking at mostly 4 bytes (internal format) storage where for large runs of values the first 3 bytes match, possibly leading to a better compression ratio.

Of course the question of globally partitioned indexes will be relevant for some people, since the principal reason for global indexes on partitioned tables is to enforce uniqueness on column combinations that don’t include the partition key, and that introduces another possible benefit – the rowid goes up to 10 bytes, of which the first 4 bytes are the object id of the table partition: depending on the nature of the partitioning that repeated 4 bytes per row may be close to non-existent after compression, giving you a better compression ratio on globally partitioned indexes than you get on any other type of single-column unique index.

Once you start looking at the details there are a surprising number of factors that can affect how well “compress advanced high” can work.

Footnote:

Once you’ve created the index, you can start poking around in all sorts of ways to try and work out what the compression algorithm does. A simple block dump is very informative, with lots of clues in the descriptive details – and lots of puzzles when you start looking too closely. There are hints that this type of index compression adopts a strategy similar to “OLTP compression” for tables, in that compression occurs only as the leaf block becomes full – and possibly allows some sort of batching process within a leaf block before finally compressing to a single contiguous unit. (This is just conjecture, at present: the only firm statement I’ll make about the actual implementation of index compression is that it uses a lot of CPU; in my example the baseline create index took about 1.5 seconds of CPU, the compressed create took about 4.5 seconds of CPU.)
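
For reference, this is the sort of thing I mean by “a simple block dump” – a sketch only, with made-up object_id, file and block numbers; you’d take the object_id of the index from user_objects and then pick one of the leaf blocks reported by the treedump:

-- logical dump of the index structure (object_id of t1_i1 assumed to be 123456)
alter session set events 'immediate trace name treedump level 123456';

-- physical dump of one of the leaf blocks listed in the resulting trace file
alter system dump datafile 7 block 67890;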

There are also a couple of amusing side effects that may confound those who use the old “validate index / query index_stats” two-step to decide when to rebuild indexes. Here’s what I got on the compressed index:


SQL> validate index t1_i1;

SQL> select blocks, lf_rows, lf_rows_len, btree_space, used_space, pct_used from index_stats;

    BLOCKS    LF_ROWS LF_ROWS_LEN BTREE_SPACE USED_SPACE   PCT_USED
---------- ---------- ----------- ----------- ---------- ----------
      1408    1000000    14979802    10416812   14994105        144

My index is using 144% of the space that it has been allocated. You don’t have to be very good at maths (or math, even) to realise that something’s wrong with that number.


Extended Stats

Wed, 2016-12-07 09:54

After my Masterclass on indexes at the UKOUG Tech2016 conference this morning I got into a conversation about creating extended stats on a table. I had pointed out in the masterclass that each time you dropped an index you really ought to be prepared to create a set of extended stats (specifically a column group) on the list of columns that had defined the index just in case the optimizer had been using the distinct_keys statistic from the index to help it calculate cardinalities.

Unfortunately there is a limit on the number of column groups (or any other type of extended stats) you can have on a table and that limit is the larger of 20 and ceiling(number of columns / 10) – so you typically run into a problem if you want to take defensive action after dropping more than twenty (multi-column) indexes. (And you wonder how Oracle’s adaptive dynamic stats process that silently creates column groups overnight handles the problem of needing far more column groups than are allowed.)
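
If you want to check where a particular table stands against that limit, here’s a quick sketch of the arithmetic (table name t1 assumed) – remembering, as the next paragraph explains, that the column count itself includes the virtual columns representing any existing extensions:

select
        count(*)                                column_count,
        greatest(20, ceil(count(*)/10))         extension_limit
from
        user_tab_cols
where
        table_name = 'T1'
;

select
        count(*)                                extension_count
from
        user_stat_extensions
where
        table_name = 'T1'
;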

The conversation led on to the oddity that the column count includes the virtual columns representing the column groups so, for example, if you have 253 columns in your table you can create 26 column groups; but if you have 26 column groups that means you have a total of 279 columns, so you can actually create a total of 28 groups (an extra 2); but if you create those two column groups you now have a total of 281 columns in the table which means you’re allowed a total of 29 column groups so you can add one more column group for a total of 282 columns. Here’s some code (which I’ve run only on 11.2.0.4) to play with – to keep things very simple I’ve generated some trivial extended stats rather than column groups:


rem
rem     Script:         extended_stats_limit2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2016
rem

drop table t1 purge;

begin
        for i in 2..253 loop
                execute immediate
                'alter table t1 add (c' || to_char(i,'FM000') || ' number)';
        end loop;
end;
/

desc t1

prompt  ============================================================================================
prompt  This will raise an error on the 30th addition
prompt  ORA-20008: Number of extensions in table TEST_USER.T1 already reaches the upper limit (28.2)
prompt  ============================================================================================

declare
        ext_string varchar2(20);
begin
        for i in 1..30 loop
                ext_string := '(c001 + ' || i || ')';
                dbms_output.put_line(
                        dbms_stats.create_extended_stats(
                                ownname         => user,
                                tabname         => 'T1',
                                extension       => ext_string
                        )
                );
        end loop;
end;
/

column column_name format a32

select
        column_name, hidden_column, virtual_column, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;

This code results in a table with 253 segment columns, and 29 hidden, virtual columns (with names like SYS_STU0#$2X$X1M4NFZVM2O_5A3FC) representing the extended stats. What if I want more extended stats ? There is no limit on virtual columns in general, beyond the inherent table limit of 1,000 columns total, so what if I create a few virtual columns (another 39, say, taking my total column count to 321): would this allow me to increase the number of extended stats to 33 – and if so, what would happen if I then dropped the virtual columns:


prompt  ============================================
prompt  Now we add some virtual columns after which
prompt  we will be able to add more extended stats
prompt  and drop the virtual columns
prompt  ============================================

begin
        for i in 1..39 loop
                execute immediate
                'alter table t1 add (virt' || to_char(i,'fm000') ||
                        ' generated always as ( c002 + ' || i || ') virtual)'
                ;
        end loop;
end;
/

select
        column_name, hidden_column, virtual_column, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;

prompt  ============================================================================================
prompt  We can now get up to 33 extended stats
prompt  This will raise an error on the attempt to add the 34th set
prompt  ORA-20008: Number of extensions in table TEST_USER.T1 already reaches the upper limit (32.5)
prompt  ============================================================================================

declare
        ext_string varchar2(20);
begin
        for i in 30..34 loop
                ext_string := '(c001 + ' || i || ')';
                dbms_output.put_line(
                        dbms_stats.create_extended_stats(
                                ownname         => user,
                                tabname         => 'T1',
                                extension       => ext_string
                        )
                );
        end loop;
end;
/

select
        column_name, hidden_column, virtual_column, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
order by
        internal_column_id
;


select
        column_name, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
and     hidden_column = 'YES'
and     virtual_column = 'YES'
order by
        internal_column_id
;

prompt  ============================
prompt  Now drop the virtual columns
prompt  ============================

begin
        for r in (
                select column_name from user_tab_cols
                where  column_name like 'VIRT%'
        ) loop
                execute immediate
                'alter table t1 drop column ' || r.column_name;
        end loop;
end;
/

select
        column_name, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
and     virtual_column = 'YES'
order by
        internal_column_id
;

When I ran this code I ended up with a table consisting of 286 columns, of which 253 were my original columns and 33 – with internal column ids of 254 to 286 inclusive – were the extended stats. It seems there is a way to bypass the limit if you really want to – though I’m not sure I’d really want to do it on a production system.

Left as Exercise for the Reader:

Create a table with 5 real columns and the 26 column groups needed to represent all (multi-column) combinations of those five columns. (Remember that the order of columns in a column group is not really significant). (The 26 groups consist of: 1 x 5 column, 5 x 4 column, 10 x 3 column, 10 x 2 column – this may remind some of you of binomial expansions, others may remember it as a row from Pascal’s triangle, you could also view it as a particular subset of the binary representations of the integers from 1 to 31.)

 


Histogram Upgrade

Fri, 2016-12-02 09:02

I’ve written notes in the past about the improvements 12c introduces for histograms – particularly the frequency and top-N histograms which can be generated “free of charge” while the basic “approximate NDV” scans are taking place to gather stats. Gathering histograms in 12c is much safer than it used to be in earlier versions of Oracle, even in the case of the new hybrid histograms (which are still based on a very small sample and are therefore still a stability risk).

There is a threat, though, recently highlighted by Franck Pachot, that sneaked in at 11.2.0.4 – the way in which the calculation of endpoint values for histograms on char() and nchar() columns has changed. When you upgrade from anything prior to 11.2.0.4 to either 11.2.0.4 or 12c you need to recreate any histograms on those columns; note that this does not apply to varchar2() and nvarchar2() columns, just the fixed length character types. If you fail to do this then you may find that Oracle produces some very silly estimates of cardinality, which could result in some very inefficient tablescans – in particular you are likely to find (as Franck showed) cases where you “know” that a particular value is IN the histogram but the optimizer behaves as if it isn’t – which means it uses the “half the least popular” estimate for the cardinality.

Here’s a little demo to show the underlying difference:


rem
rem     Script:         histogram_change_11204.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem     Purpose:
rem

create table t1 (v1 varchar2(32), c1 char(32));
insert into t1
select
        case when rownum <= 100 then 'N' else 'Y' end,
        case when rownum <= 100 then 'N' else 'Y' end
from
        all_objects
where
        rownum <= 1000
;
begin
        dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 10'
        );
end;
/

column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999
break on column_name skip 1

select
        column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_number
;

Here are the results from an instance of 11.1.0.7 (though anything up to 11.2.0.3 should produce the same), and 11.2.0.4 (and later – including 12.2):

Results 11.1.0.7
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  405,650,737,316,592,000,000,000,000,000,000,000    4e20202020203a7bb119d5f6000000
                                1000  462,766,002,760,475,000,000,000,000,000,000,000    59202020202034d998ff0b5ae00000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Results 11.2.0.4
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Look particularly at the first 6 bytes of the Hex version of the endpoint values for the char() column c1. In 11.1.0.7 you see “4e2020202020”, “592020202020” – that’s ASCII ‘N’ and ‘Y’ respectively, padded to 6 characters with spaces. In 11.2.0.4 the spaces have disappeared – the char() columns are now padded to 6 characters with zeros (which is how varchar2() columns have always been treated).
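
You can see where those trailing 0x20s come from with a quick dump() – a minimal sketch comparing the stored format of a char() value with a varchar2() value:

select
        dump(cast('N' as char(32)), 16)         char_format,
        dump(cast('N' as varchar2(32)), 16)     varchar2_format
from
        dual
;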

In 11.1.0.7 the optimizer will find a histogram entry for c1 = ‘Y’ and produce a cardinality of 900; if you upgrade the database to 11.2.0.4 without recreating the histograms the optimizer won’t find a histogram entry for the predicate and will produce a cardinality of 50 (i.e. 100 / 2).
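
If you’ve already upgraded, the fix is simply to regather stats with a histogram on the affected char() columns – here’s a minimal sketch for the demo table above (adjust the method_opt to suit your own columns and bucket counts):

begin
        dbms_stats.gather_table_stats(
                ownname       => user,
                tabname       => 'T1',
                method_opt    => 'for columns c1 size 10',
                no_invalidate => false
        );
end;
/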

Footnote

There’s a brief summary of the algorithm Oracle uses to generate values for character-based histograms at this URL.

 


Delete/Insert #2

Tue, 2016-11-29 10:33

In the previous post I threw out a couple of options for addressing the requirement to transfer data from one table to another (“cut and paste” rather than just “copy”) without running into odd inconsistency errors. This triggered of a wonderful comment trail of alternatives based on how large the volume might be, how relaxed the concurrency requirements might be, and so on.

A comment by SydOracle1 picked up on my failure to get Oracle working with the “as of SCN” syntax because it kept reporting ORA-08187 and suggested a straightforward use of the VERSIONS strategy. I thought it was quite surprising that this could work given that “as of SCN” didn’t, so I whipped up a quick test to check it – adding a couple of little refinements to the supplied sample – and it worked.

create table t1
as
select	object_id, object_name, owner
from	all_objects
;

create table t2
as
select	* from t1
where	rownum = 0
;

alter table t1 add constraint t1_pk primary key(object_id);

execute dbms_stats.gather_table_stats(user,'t1')

execute dbms_lock.sleep(5)

That dbms_lock.sleep() is very important for the purposes of this demonstration; it has to be at least a few seconds, otherwise the references back to earlier SCNs could report the error “ORA-01466: unable to read data – table definition has changed”. This is probably a side effect of the 3 second interval that Oracle uses when capturing information into the table smon_scn_time.

So now we do the following:

  • find the current SCN,
  • start a transaction,
  • get the transaction id,
  • delete the data from the source table,
  • find the current SCN again,
  • insert into the target table the data that was deleted by our transaction in the interval spanned by the two SCNs.

 


rem
rem     insert_delete_2.sql
rem     Jonathan Lewis
rem     Nov 2016
rem

column current_scn new_value m_start_scn 
select to_char(current_scn,'FM999999999999999999999') current_scn from v$database;

variable m_tx_id varchar2(20)
exec :m_tx_id := dbms_transaction.local_transaction_id(true)

column xid new_value m_xid
select xid from v$transaction where xidusn || '.' || xidslot || '.' || xidsqn = :m_tx_id;

delete from t1 where owner = 'SYSTEM';
commit;

column current_scn new_value m_end_scn format 999999999999999999
select to_char(current_scn,'FM999999999999999999999') current_scn from v$database;
 
insert	into t2
select	r.*
from	t1  versions between scn &m_start_scn and &m_end_scn r
where 
	versions_operation = 'D'
and	versions_xid = '&m_xid'
and	owner = 'SYSTEM'
;

commit;

There are a couple of variants on getting the transaction ID – I decided to use a function call to start a transaction without doing any work rather than doing the delete and then finding the transaction id that the delete initiated (I could have linked v$session for my SID to v$transaction after the delete). Because of the choice I made I have to do a little bit of messing around in the subsequent code – the function call returns the transaction ID in the form 31.16.19111 (that’s undo segment, slot number, sequence) but the VERSIONS mechanism wants a transaction ID in its HEX form which, for the example shown, would be ‘1F001000A74A0000’. I could have converted the three part form to the other using a messy bit of to_char(,’XXXXXXXX’) code, but I was feeling a little lazy.
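
For completeness, here’s a sketch of the alternative I mention – after the delete (and before the commit) you can pick up the XID of your own transaction straight from v$transaction, already in the hex form that the versions_xid predicate wants:

column xid new_value m_xid

select  t.xid
from    v$transaction   t,
        v$session       s
where   s.sid      = sys_context('userenv','sid')
and     t.ses_addr = s.saddr
;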

To my surprise I didn’t see any ORA-08187 errors – which made me look back at the notes I had jotted down on the couple of tests I’d initially tried to find out what I had been doing wrong. My first attempt did the insert first then tried to do the delete “as of SCN” and failed, so my second attempt tried to do the delete first just in case the problem related to using “as of SCN” in the middle of a transaction:


column current_scn new_value m_scn

prompt  ============
prompt  Insert first
prompt  ============

select to_char(current_scn,'FM999999999999999999') current_scn from V$database;

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

delete from t1 as of scn &m_scn r
where owner = 'SQLTXADMIN'
;

-- ORA-08187: snapshot expression not allowed here

rollback;

prompt  ============
prompt  Delete first
prompt  ============

select to_char(current_scn,'FM999999999999999999') current_scn from V$database;

delete from t1 as of scn &m_scn r
where owner = 'SQLTXADMIN'
;

-- ORA-08187: snapshot expression not allowed here

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

rollback;

Clearly “delete as of scn” is illegal.

Of course, if I’d gone a little further with this idea I might have tried starting with a delete that didn’t use “as of SCN”, and then the code would have succeeded. In fact, though, this wouldn’t be a perfect solution because it would allow a window for error: some other session might delete a relevant row between my call for SCN and my delete, which means my insert would insert a row deleted by another user.

The code could be modified, though, in its choice of SCN. Provided I started my transaction with the delete I could then query v$transaction for the start SCN of the transaction, and use that as the “as of” SCN for the insert:


delete from t1
where owner = 'SQLTXADMIN'
;

select
        to_char(
                start_scnw * power(2,32) + start_scnb,
                'FM999999999999999999'
        )       current_scn
from
        v$transaction
where   ses_addr = (
                select  saddr
                from    v$session
                where   sid = (
                        select  sid
                        from    V$mystat
                        where   rownum = 1
                )
        )
;

insert into t2
select  *
from    t1 as of scn &m_scn r
where
        owner = 'SQLTXADMIN'
;

By using the delete to initiate the transaction and set the SCN I think we block any window of inconsistency and, apart from the messy little bit of code that finds the transaction entry, we have an even simpler piece of code than the example given by SydOracle.

Collaboration or, kicking ideas around, is a wonderful way to learn.

 


Delete/Insert

Tue, 2016-11-22 06:59

Many of the questions that appear on OTN are deceptively simple until you start thinking carefully about the implications; one such showed up a little while ago:

What i want to do is to delete rows from table where it matches condition upper(CATEGORY_DESCRIPTION) like ‘%BOOK%’.

At the same time i want these rows to be inserted into other table.

The first problem is this: how carefully does the requirement need to be stated before you can decide how to address it? Trying to imagine awkward scenarios, or boundary conditions, can help to clarify the issue.

If you delete before you insert, how do you find the data to insert?

If you insert before you delete, what happens if someone updates a row you’ve copied so that it no longer matches the condition? Would it matter if the update changes the row in a way that still leaves it matching the condition (so that what you’ve inserted is not totally consistent with what you’ve deleted)?

If you insert before you delete, and someone executes some DML that makes another row match the requirement, should you delete it (and, if not, how do you avoid deleting it) or leave it in place?

Once you start kicking the problem about you’ll probably come to the conclusion that the requirement is for the delete and insert to be self-consistent – in other words what you delete has to be an exact match for what you insert as at the time you inserted it. You’ll ignore rows that come into scope in mid-process due to other activity, and you’ll have to stop people changing rows that are being transferred (in case there’s an audit trail that subsequently says that there was, at some point in time, a row that matched the condition but never arrived – and a row that has arrived that didn’t match the final condition of the rows that disappeared).

Somehow your code needs to lock the set of rows to be transferred and then transfer those rows and eliminate them. There are two “obvious” and simple strategies – readers are invited to propose others (or criticise the two I suggest, or any that appear in the comments). I’ll start with a simple data setup for testing:


create table t1
as
select  object_id, object_name, owner
from    all_objects
;

alter table t1 add constraint t1_pk primary key(object_id);

create table t2
as
select  * from t1
where   rownum = 0
;

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

Option 1:

The simplest approach is often the best – until, perhaps, you spot the drawbacks – do a basic delete of the data to be transferred (which handles the locking) but wrap the statement in a PL/SQL block that captures the data (using the returning clause) and then inserts it into the target table as efficiently as possible. With thanks to Andrew Sayer who prompted this blog post:

declare
        type t1_rows is table of t1%rowtype;
        t1_deleted t1_rows;

begin
        delete from t1 where owner = 'SYSTEM'
        returning object_id, object_name, owner bulk collect into t1_deleted;

        forall i in 1..t1_deleted.count
                insert into t2 values t1_deleted(i);

        commit;
end;
/

The drawback to this, of course, is that if the volume to be transferred is large (where “large” is probably a fairly subjective measure) then you might not want to risk the volume of memory (PGA) it takes to gather all the data with the bulk collect.

Option 2:

For large volumes of data we could reduce the threat to the PGA by gathering only the rowids of the rows to be transferred (locking the rows as we do so) then do the insert and delete based on the rowids:

declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;
        fetch c1 bulk collect into r;
        close c1;

        forall i in 1..r.count
                insert into t2 select * from t1 where rowid = r(i);

        forall i in 1..r.count
                delete from t1 where rowid = r(i);

        commit;
end;
/

Note, particularly, the “for update” in the driving select.

Inevitably there is a drawback to this strategy as well (on top of the threat that the requirement for memory might still be very large even when the return set is restricted to just rowids). We visit the source data (possibly through a convenient index, avoiding the table, of course) to collect rowids; then we visit the data again by rowid (which is usually quite efficient) to copy it, then we visit it again (by rowid) to delete it. That’s potentially a significant increase in buffer cache activity (especially latching) over the simple “delete returning” strategy; moreover the first strategy gives Oracle the option to use the index-driven optimisation for maintaining indexes and this method doesn’t. You might note, by the way, that you could include an “order by rowid” clause on the select; depending on your data distribution and indexes this might reduce the volume of random I/O you have to do as Oracle re-visits the table for the inserts and deletes.
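
For example (a sketch of just the change), the driving cursor declaration would become:

        cursor c1 is
                select  rowid
                from    t1
                where   owner = 'SYSTEM'
                order by
                        rowid
                for update;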

We can address the PGA threat, of course, by fetching the rowids with a limit:


declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;

--      dbms_lock.sleep(60);

        loop
                fetch c1 bulk collect into r limit 5;

                forall i in 1..r.count
                        insert into t2 select * from t1 where rowid = r(i);

                forall i in 1..r.count
                        delete from t1 where rowid = r(i);

                exit when r.count != 5;
        end loop;

        close c1; 

        commit;
end;
/

One thing to be aware of is that even though we fetch the rowids in small batches we lock all the relevant rows when we open the cursor, so we don’t run into the problem of inserting thousands of rows into t2 and then finding that the next batch we select from t1 has been changed or deleted by another session. (The commented out call to dbms_lock.sleep() was something I included as a way of checking that this claim was true.) This doesn’t stop us running into a locking (or deadlocking) problem, of course; if it takes us 10 seconds to lock 1M rows in our select for update another user might manage to lock our millionth row before we get there; if, a few seconds later, that user then gets stuck in a TX/6 wait trying to lock one of our locked rows while we are waiting in a TX/6 wait for our millionth row, our session will time out after a further 3 seconds with an ORA-00060 deadlock error.

The limit of 5 is just for demonstration purposes, of course – there were 9 rows in all_objects that matched the select predicate; in a production system I’d probably raise the limit as high as 255 (which seems to be the limit of Oracle’s internal array-processing).

You’ll notice, of course, that we can’t use this limited fetch approach with the delete command – the entire delete would take place as we opened the equivalent cursor and, though we can use the bulk collect with the returning clause, there is no syntax that allows something like the fetch with limit to take place.

Discarded option

My first thought was to play around with the AS OF SCN clause.  Select the current SCN from v$database and then do things like delete “as of scn”, or “select for update as of scn” – there were ways of getting close, but invariably I ended up running into Oracle error: “ORA-08187: snapshot expression not allowed here”. But maybe someone else can come up with a way of doing this that doesn’t add significant overheads and doesn’t allow for inconsistent results.


Distributed Trap

Mon, 2016-11-14 07:19

Here’s an interesting (and potentially very useful) observation from an OTN database forum thread that appeared at the end of last week. It concerns the problems of pulling data from remote systems, and I’ll start by building some data:

rem
rem     Script:         remote_insert_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(sysdate) + rownum d1,
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',50)            padding
from
        generator
where
        rownum <= 500
;

alter table t2 add constraint t2_pk primary key(id);

create table t3
as
select sysdate d1, t1.* from t1
where rownum = 0
;

--  Now gather stats if you need to (depending on version)

I’ve created three tables. Table t3 is an empty copy of table t1 with a date column added, and t2 is some sort of reference table that looks rather like table t1 but has a lot less data. Now I’m going to pretend that t1 and t2 are in a remote database while t3 is in the local database and copy data from t1 to t3, omitting any data that is referenced in t2. The SQL is simple:


define m_target=test@loopback

insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

----------------------------------------------------------------------
| Id  | Operation                | Name | Cost (%CPU)| Inst   |IN-OUT|
----------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |     0   (0)|        |      |
|   1 |  LOAD TABLE CONVENTIONAL | T3   |            |        |      |
|   2 |   REMOTE                 |      |            |   TEST | R->S |
----------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - EXPLAIN PLAN INTO PLAN_TABLE@! FOR SELECT
       "A1"."ID","A1"."SMALL_VC","A1"."PADDING" FROM "T1" "A1" WHERE
       "A1"."ID"<>ALL (SELECT "A2"."ID" FROM "T2" "A2") (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

I’ve set up an SQL*Plus substitution variable to hold a database link name (and used a loopback qualifier so that I can pretend t1 and t2 are in a remote database). The execution plan (taken from an explain plan, but confirmed by running the query and calling dbms_xplan.display_cursor) shows that Oracle has executed the select part of the insert as a “fully remote” statement – which is nice.

Unfortunately I forgot to include a datestamp as I copied the data over. So let’s modify the query to do that:


insert into t3(
        d1,
        id, small_vc, padding
)
select
        sysdate,
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;

--------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |      |      1 |        |   123 (100)|      0 |00:00:27.42 |   10908 |
|   1 |  LOAD TABLE CONVENTIONAL |      |      1 |        |            |      0 |00:00:27.42 |   10908 |
|*  2 |   FILTER                 |      |      1 |        |            |  49500 |00:00:26.51 |       0 |
|   3 |    REMOTE                | T1   |      1 |  50000 |   113   (3)|  50000 |00:00:00.33 |       0 |
|   4 |    REMOTE                | T2   |  50000 |      1 |     0   (0)|    500 |00:00:25.29 |       0 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM  "T2" WHERE "T2"."ID"=:B1))

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","SMALL_VC","PADDING" FROM "T1" "T1"
       (accessing 'TEST.LOCALDOMAIN@LOOPBACK' )

   4 - SELECT /*+ OPAQUE_TRANSFORM */ "ID" FROM "T2" "T2" WHERE "ID"=:1 (accessing
       'TEST.LOCALDOMAIN@LOOPBACK' )

Whoops, the plan just changed – it also took 27.4 seconds instead of the 1.1 seconds it used to take – that’s because of the 50,000 remote calls to execute a remote query for the subquery filter. The addition of sysdate (which is the local sysdate@!) to the select list has made the select statement distributed instead of fully remote, and the query for a CTAS or “insert/select” has to be driven from the local site if it’s a distributed query.

Okay, plan (b), don’t insert sysdate, add it to the table as a default:


alter table t3 modify d1 default sysdate;

This doesn’t help; even though the query doesn’t mention sysdate explicitly, the statement is still treated as a distributed query.

Okay, plan (c) – don’t insert sysdate, add a trigger to the table:

alter table t3 modify d1 default null;

create or replace trigger t3_bri
before insert on t3
for each row
begin
        :new.d1 := sysdate;
end;
/


And this works (in 11.2.0.4 and 12.1.0.2, at least, which are the versions I tested).
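
For clarity, the working combination is simply the trigger above plus the original, fully remote, insert statement, which no longer mentions sysdate at all; d1 is populated locally by the trigger as each row arrives:

insert into t3(
        id, small_vc, padding
)
select
        t1.id, t1.small_vc, t1.padding
from
        t1@&m_target    t1
where
        t1.id not in (
                select t2.id from t2@&m_target
        )
;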

I could have made the trigger a little more sophisticated, of course, but the point of the post was simply to demonstrate a problem and a simple workaround.

There are probably several other commonly used features (various sys_context() calls, perhaps) that have the same effect.

 


Reorg

Mon, 2016-11-07 11:31

A current question on the OTN database forum asks: “What’s the difference between object and tablespace reorganization?” Here’s an analogy to address the question.

I have three crates of Guinness in the boot (trunk) of my car, one crate has 4 bottles left, one has 7 bottles left and one has 2 bottles. I also have two cases of Louis Roederer Brut NV champagne, one case has 2 bottles left and one has only one. (I have two objects in my tablespace – one of type Beer, one of type Champagne – and my boot requires manual free space management.)

I move all the Guinness bottles into a single crate and all the champagne bottles into a single case. That’s a couple of “shrink space compact” calls – I’ve re-organised the objects to get all the bottles in each object close to each other, but the crates are still taking up space in the boot.

I take the two empty crates and the empty case out of the boot. That’s a couple of “resize” (or “shrink space” without “compact”) calls that free up space in the boot.

I now want to load a port barrel into the car, but it won’t fit until I slide the remaining beer crate and champagne case together at one side of the boot. That’s a couple of “move” commands that have reorganised the boot (tablespace) to make the free space usable.
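
For anyone who wants to translate the analogy back into SQL, here’s a rough sketch of the commands it maps onto (the object names are purely illustrative, and the shrink calls assume ASSM and row movement):

alter table beer enable row movement;
alter table beer shrink space compact;  -- consolidate the rows within the object
alter table beer shrink space;          -- and give the freed space back to the tablespace
alter table beer move;                  -- relocate the object to make the tablespace free space usable

with a matching set of commands for the champagne object.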

 


Filter Subquery

Mon, 2016-11-07 07:04

There’s a current thread on the OTN database forum showing an execution plan with a slightly unusual feature. It looks like this:

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2 |   HASH JOIN                              |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |    TABLE ACCESS FULL                     | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |    HASH JOIN                             |                                |    11M|  1837M|   810M| 57206 |       |       |  
|   5 |     INDEX FAST FULL SCAN                 | PDTCOST_BILL_INV_TRACK         |    29M|   475M|       | 16107 |       |       |  
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID    | BILL_INVOICE_DETAIL            |  5840K|   478M|       |     2 |       |       |  
|   7 |      NESTED LOOPS                        |                                |    11M|  1634M|       |     6 |       |       |  
|   8 |       NESTED LOOPS                       |                                |     2 |   120 |       |     3 |       |       |  
|   9 |        TABLE ACCESS FULL                 | JDL_WORK_LIST                  |     2 |    96 |       |     2 |       |       |  
|  10 |        PARTITION RANGE ITERATOR          |                                |       |       |       |       |   KEY |   KEY |  
|  11 |         TABLE ACCESS BY LOCAL INDEX ROWID| BILL_INVOICE                   |     1 |    12 |       |     1 |   KEY |   KEY |  
|* 12 |          INDEX UNIQUE SCAN               | BILL_INVOICE_XSUM_BILL_REF_NO  |     1 |       |       |       |   KEY |   KEY |  
|  13 |       PARTITION RANGE ITERATOR           |                                |       |       |       |       |   KEY |   KEY |  
|* 14 |        INDEX RANGE SCAN                  | BILL_INVOICE_DETAIL_PK         |    32 |       |       |     1 |   KEY |   KEY |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  

Spot the oddity ? If not, here’s a collapsed version of the plan that makes it easier to see – if you were viewing this plan through OEM or one of the other GUI interfaces to execution plans you’d probably be able to do this by clicking on some sort of  “+/-”  symbol by operation 4:

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2 |   HASH JOIN                              |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |    TABLE ACCESS FULL                     | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |    HASH JOIN                             |                                |    11M|  1837M|   810M| 57206 |       |       |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  

How often have you seen a HASH JOIN (operation 2) with three child operations (3, 4, 15) ?

It’s not a formatting error – but since I’ve shown neither the Predicate section of the report nor the original query it’s a little difficult to recognise what’s going on, so here’s the critical part of the original WHERE clause:

AND     P.TRACKING_ID      = PCM.TRACKING_ID  
AND     P.TRACKING_ID_SERV = PCM.TRACKING_ID_SERV  
AND     (   (P.BILLING_INACTIVE_DT IS NULL AND PCM.INACTIVE_DT IS NULL)  
         OR (PCM.ACTIVE_DT = (SELECT MAX(ACTIVE_DT) FROM PDTCOST_CHARGE_MAP PCM1 ))
        )
;  

Operation 4 produces a set of rows derived by joining table P (an alias for pdtcost) to a couple of other tables, and operation 2 joins this to PCM (an alias for pdtcost_charge_map) with a simple two-column equality and then introduces a pair of problems: first an “OR SUBQUERY” construct, second a predicate that requires data from both tables to be examined before any more rows can be discarded.

Just to clarify the performance implication of this combination of predicates:

If we start from pdtcost (p):

  • If the billing_inactive_dt is null we can’t discard the row because it satisfies part of the predicate and we need to check the matching pcm.inactive_dt.
  • If the billing_inactive_dt is NOT null we still can’t discard it because the matching pcm.active_dt may satisfy the subquery predicate.
  • Whatever the state of billing_inactive_dt we have to find the matching pcm row(s).

Starting from pdtcost_charge_map (pcm):

  • We can’t unnest the subquery and use it to drive into p (because of the OR), so we have to scan pcm to apply the subquery.
  • If the active_dt satisfies the subquery we have to find the matching p row.
  • If the active_dt doesn’t satisfy the subquery but pcm.inactive_dt is null we still have to find the matching p row to check its billing_inactive_dt.
  • The only time we don’t need to probe p for a match is if the active_dt doesn’t match the subquery and the inactive_dt is not null – which tells us that for a very specific data pattern we have the potential for a (relatively) efficient access path; however this path would require the optimizer to test one part of an OR predicate at one operation in the plan and the second part of the OR predicate at a different operation of the plan, and it isn’t programmed to do that, so the entire compound predicate test is always run late.

Returning to the question of interpreting this plan with three child operations for a hash join – what does it mean and how does it work ? In effect the plan is the wrong shape – it has concealed a filter operation.  As the join between the two tables takes place the rows are tested against the simple filter condition – each row that satisfies the predicate is passed to the next operation of the plan; for any row that doesn’t satisfy the simple filter predicate the subquery is executed to provide a check against active_dt (fortunately, since this is a “constant” subquery, we benefit enormously from scalar subquery caching and the subquery will run at most once in the lifetime of the whole query).

The plan would probably be easier to understand if it looked like this (which may actually be how it would have looked in Oracle 8i):

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2a|   FILTER                                 |                                |   140K|    26M|  1293M|   133K|       |       |  
|*  2b|    HASH JOIN                             |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |     TABLE ACCESS FULL                    | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |     HASH JOIN                            |                                |    11M|  1837M|   810M| 57206 |       |       |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  
 

Predicate Information (identified by operation id):
---------------------------------------------------
   2a - filter("P"."BILLING_INACTIVE_DT" IS NULL AND "PCM"."INACTIVE_DT" IS NULL
               OR "PCM"."ACTIVE_DT"= (SELECT MAX("ACTIVE_DT") FROM "PDTCOST_CHARGE_MAP" "PCM1"))

   2b - access("P"."TRACKING_ID"="PCM"."TRACKING_ID" AND
               "P"."TRACKING_ID_SERV"="PCM"."TRACKING_ID_SERV")

This modified plan makes it clear that the hash join (2b) is followed by execution of the filter (2a) subquery (though we can safely infer that the subquery runs only for join rows where at least one of p.billing_inactive_dt or pcm.inactive_dt is not null).

You might wonder whether Oracle actually runs the subquery once at a very early point in the query so that it can, effectively, turn the subquery predicate into “active_dt = {derived constant}” – it’s fairly easy to show that this isn’t the case. Perhaps the most obvious way to do this is to run the query with rowsource execution stats enabled after setting billing_inactive_dt and inactive_dt to null for every row in their respective tables – because if you do that the subquery won’t be run at all.
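
Here’s a hedged sketch of that check, using the model tables created below; with every relevant date column set to null the rowsource execution stats should report zero starts for the subquery’s SORT AGGREGATE:

update pdtcost            set billing_inactive_dt = null;
update pdtcost_charge_map set inactive_dt         = null;
commit;

alter session set statistics_level = all;

--      re-run the two-table query, then:

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));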

If you want to experiment with this problem, here’s some code to model it:


drop table pdtcost purge;
drop table pdtcost_charge_map purge;

create table pdtcost
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum,100)                 filter_col,
        rownum                          tracking_id,
        rownum                          tracking_id_serv,
        decode(
                mod(rownum,97),
                0 , trunc(sysdate),
                    null
        )                               billing_inactive_dt,
/*
        to_date(null)                   billing_inactive_dt,
*/
        lpad('x',100,'x')               padding
from
        generator       v2
where
        rownum <= 1e4
;

alter table pdtcost add constraint pdt_pk primary key(tracking_id, tracking_id_serv);

create table pdtcost_charge_map
nologging
as
with generator as (
        select
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum                          tracking_id,
        rownum                          tracking_id_serv,
        decode(
                mod(rownum,93), 
                0 , trunc(sysdate),
                    null
        )                               inactive_dt,
/*
        to_date(null)                   inactive_dt,    
*/
        trunc(sysdate + dbms_random.value(-100,0))      active_dt,
        lpad('x',100,'x')               padding
from
        generator       v2
where
        rownum <= 1e4
;

alter table pdtcost_charge_map add constraint pcm_pk primary key(tracking_id, tracking_id_serv, active_dt);
-- create index pcm_act_dt on pdtcost_charge_map(active_dt);

-- gather basic table stats if your version needs to.

select
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where
        p.filter_col = 0
and     p.tracking_id      = pcm.tracking_id
and     p.tracking_id_serv = pcm.tracking_id_serv
and     (   (p.billing_inactive_dt is null and pcm.inactive_dt is null)
         or (pcm.active_dt = (select max(active_dt) from pdtcost_charge_map pcm1 ))
        )
;

The original question started with a table of 30 million rows and a result set of only 450 rows – suggesting that there ought to be a lot of scope for finding ways to eliminate data early. One possibility, assuming the appropriate indexes exist (which is why I have defined, but commented out, the pcm_act_dt index above), is to convert this query into a union all (taking care to eliminate duplication in the result set) in the following way:

select
        /*+ leading(p pcm) use_nl(pcm) */
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where
        p.filter_col = 0
and     pcm.tracking_id      = p.tracking_id
and     pcm.tracking_id_serv = p.tracking_id_serv
and     (p.billing_inactive_dt is null and pcm.inactive_dt is null)
union all
select
        /*+ leading(p pcm) use_nl(pcm) */
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where   
        p.filter_col = 0
and     pcm.tracking_id      = p.tracking_id   
and     pcm.tracking_id_serv = p.tracking_id_serv   
and     (p.billing_inactive_dt is not null or pcm.inactive_dt is not null)
and     pcm.active_dt = (select /*+ unnest */ max(active_dt) from pdtcost_charge_map pcm1)
;

Here is the resulting execution plan when the pcm_act_dt index exists. I had to hint the table order and join mechanism because my tables were rather small and the selectivity relatively high – it’s probably safe to assume that selectivities are much better on the original data set and that a path like this is more likely to be chosen unhinted (the full tablescan on pdtcost is irrelevant in the context of the demonstration):


---------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                    |      1 |        |     98 |00:00:00.06 |     657 |
|   1 |  UNION-ALL                     |                    |      1 |        |     98 |00:00:00.06 |     657 |
|   2 |   NESTED LOOPS                 |                    |      1 |     99 |     98 |00:00:00.05 |     386 |
|   3 |    NESTED LOOPS                |                    |      1 |     99 |     99 |00:00:00.05 |     287 |
|*  4 |     TABLE ACCESS FULL          | PDTCOST            |      1 |     99 |     99 |00:00:00.01 |     173 |
|*  5 |     INDEX RANGE SCAN           | PCM_PK             |     99 |      1 |     99 |00:00:00.01 |     114 |
|*  6 |    TABLE ACCESS BY INDEX ROWID | PDTCOST_CHARGE_MAP |     99 |      1 |     98 |00:00:00.01 |      99 |
|   7 |   NESTED LOOPS                 |                    |      1 |      2 |      0 |00:00:00.01 |     271 |
|   8 |    NESTED LOOPS                |                    |      1 |    100 |      1 |00:00:00.01 |     270 |
|*  9 |     TABLE ACCESS FULL          | PDTCOST            |      1 |    100 |    100 |00:00:00.01 |     166 |
|* 10 |     INDEX UNIQUE SCAN          | PCM_PK             |    100 |      1 |      1 |00:00:00.01 |     104 |
|  11 |      SORT AGGREGATE            |                    |      1 |      1 |      1 |00:00:00.01 |       2 |
|  12 |       INDEX FULL SCAN (MIN/MAX)| PCM_ACT_DT         |      1 |      1 |      1 |00:00:00.01 |       2 |
|* 13 |    TABLE ACCESS BY INDEX ROWID | PDTCOST_CHARGE_MAP |      1 |      1 |      0 |00:00:00.01 |       1 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("P"."FILTER_COL"=0 AND "P"."BILLING_INACTIVE_DT" IS NULL))
   5 - access("PCM"."TRACKING_ID"="P"."TRACKING_ID" AND
              "PCM"."TRACKING_ID_SERV"="P"."TRACKING_ID_SERV")
   6 - filter("PCM"."INACTIVE_DT" IS NULL)
   9 - filter("P"."FILTER_COL"=0)
  10 - access("PCM"."TRACKING_ID"="P"."TRACKING_ID" AND
              "PCM"."TRACKING_ID_SERV"="P"."TRACKING_ID_SERV" AND "PCM"."ACTIVE_DT"=)
  13 - filter(("P"."BILLING_INACTIVE_DT" IS NOT NULL OR "PCM"."INACTIVE_DT" IS NOT NULL))


You’ll notice that this plan also displays an interesting little quirk – at operation 10 we can see the index unique scan of index pcm_pk that occurs once for each row returned from pdtcost; but each unique scan is preceded by a call to run the subquery (though scalar subquery caching means the subquery runs only once in total) to supply a value for active_dt that can be used in the unique scan. (In the absence of the pcm_act_dt index the full scan min/max at operation 12 would be a fast full scan of the primary key.)

With a little luck the OP will be able to apply the same strategy to his query, though it may be a little harder to get the desired plan since the original query includes 6 tables; but the principle doesn’t change.

Footnote:

Various people on the OTN thread have pointed out that there are some odd details about the optimizer’s cardinality predictions which may mean that part of the problem is simply an issue of misleading (possibly out of date) object statistics. It’s possible that with better estimates the optimizer may change the plan so much that even the strategy of getting all the rows from pdtcost_charge_map related to the rows acquired from pdtcost and then eliminating based on a late filter may be efficient enough for the OP.  By changing the data volume and distribution in my test case one of the plans (which predicted 100 rows from 100,000) was as follows:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |      1 |        |     98 |00:00:01.58 |    2307 |
|   1 |  NESTED LOOPS                |                    |      1 |     98 |     98 |00:00:01.58 |    2307 |
|   2 |   NESTED LOOPS               |                    |      1 |    100 |    100 |00:00:00.01 |    1814 |
|*  3 |    TABLE ACCESS FULL         | PDTCOST            |      1 |    100 |    100 |00:00:00.01 |    1699 |
|*  4 |    INDEX RANGE SCAN          | PCM_PK             |    100 |      1 |    100 |00:00:00.01 |     115 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| PDTCOST_CHARGE_MAP |    100 |      1 |     98 |00:00:01.57 |     493 |
|   6 |    SORT AGGREGATE            |                    |      1 |      1 |      1 |00:00:01.57 |     393 |
|   7 |     INDEX FAST FULL SCAN     | PCM_PK             |      1 |    100K|    100K|00:00:01.02 |     393 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("P"."FILTER_COL"=0)
   4 - access("P"."TRACKING_ID"="PCM"."TRACKING_ID" AND
              "P"."TRACKING_ID_SERV"="PCM"."TRACKING_ID_SERV")
   5 - filter((("P"."BILLING_INACTIVE_DT" IS NULL AND "PCM"."INACTIVE_DT" IS NULL) OR
              "PCM"."ACTIVE_DT"=))

So Long ACED

Mon, 2016-10-31 14:53

… and thanks for all the fish.

Today I removed myself from the OTN ACE program. This isn’t a reflection on anything to do with the ACE program – quite the reverse, in fact – it’s because they’re introducing steps to ensure that the ACE Directors can justify their titles. Unfortunately, as anyone who has gone through (e.g.) ISO 9001 certification can tell you, quality assurance tends to translate into paperwork and ticking boxes – and while I can always find time to write up some interesting feature of Oracle I really find it hard to prioritise time for filling in forms.

In the last 4 months I’ve failed to file my monthly list of relevant activities twice, failed to request funding for two of the international conferences I’ve spoken at, and failed to submit claims against the two for which I had requested and received funding approval – so there really was no hope of me being motivated to collect all the extra details that the new regime requires.

So, best wishes to the ACE program – I’m still happy to do what I’ve been doing for the last 25+ years, and I’ll still be meeting up with ACEDs, but I’ll just be wearing one label less as I do it.


Anniversary OICA

Mon, 2016-10-24 07:00

Happy anniversary to me!

On this day 10 years ago I published the first article in my blog. It was about the parameter optimizer_index_cost_adj (hence OICA), a parameter that has been a source of many performance problems and has baffled DBAs over the years and, if you read my first blog posting and follow the links, a parameter that should almost certainly be left untouched.

It seems appropriate to mention it today because I recently found a blog posting (dated 3rd May 2013) on the official Oracle Blogs where the director for Primavera advises setting this parameter to 1 (and the optimizer_index_caching parameter to 90) for the Primavera P6 OLTP (PMDB) database. The recommendation is followed by a fairly typical “don’t blame me” warning, viz: “As with any changes that affect query optimization, it is paramount to TEST, TEST and TEST again. At least these settings are easily adjusted or change back to the original value”.

Here’s a thought, though: setting the optimizer_index_cost_adj to the extreme value 1 is a catastrophic change so don’t suggest it unless you are extremely confident that it’s almost certain to be the right thing to do. If you’re confident that it’s a good idea to reduce the parameter to a much smaller value than the default then suggest a range of values that varies from “ideal if it works, but high risk” to “low risk and mostly helpful”. Maybe a suggestion like: “Primavera P6 OLTP (PMDB) tends to work best with this parameter set to a value in the range of 1 to 15” would be a more appropriate comment from someone in a position of authority.

Here’s another thought: if you work for Oracle you could always contact the optimizer group to present them with your argument for the strategy and see what they think about it. Then you can include their opinion when you offer your suggestion.

For what it’s worth, here’s my opinion: as a general rule you shouldn’t be working around performance issues by fiddling with the optimizer_index_cost_adj; as a specific directive do not set it to 1. If you want to encourage Oracle to be enthusiastic about indexes in general then adjust the system statistics (preferably with a degree of truth). If you need to persuade Oracle that particular indexes are highly desirable then you can use dbms_stats.set_index_stats() to adjust the clustering_factor (and avg_data_blocks_per_key) of those indexes. If you are running 11.2.0.4 or later then you can use dbms_stats.set_table_prefs() to set the “table_cached_blocks” preference for tables where you think Oracle should be particularly keen on using indexes but isn’t; and if your queries are suffering from bad cardinality estimates because of a pattern of multi-column filter predicates, create some column group (extended) statistics.
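
Here’s a hedged sketch of what those alternatives might look like; the object names and values are purely illustrative and would have to be justified against your own data:

rem     (a) set (truthful) system statistics rather than fiddling with OICA
exec dbms_stats.set_system_stats('MBRC', 16)

rem     (b) make one specific index look more desirable
exec dbms_stats.set_index_stats(user, 'T1_I1', clstfct => 5000, avgdblk => 2)

rem     (c) 11.2.0.4 or later - allow for buffer caching when costing index access
exec dbms_stats.set_table_prefs(user, 'T1', 'table_cached_blocks', '16')

rem     (d) column group statistics for correlated multi-column filter predicates
exec dbms_stats.gather_table_stats(user, 'T1', method_opt => 'for columns (col1, col2) size 1')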

Why am I so firmly set against setting the optimizer_index_cost_adj to 1 ? Because it doesn’t tell Oracle to “use indexes instead of doing tablescans”, it tells Oracle that every index is just about as good as every other index for almost any query. Here’s a pdf file of an article (formerly published on DBAZine and then on my old website) I wrote over twelve years ago explaining the issue. Various links in the article no longer work, and the data pattern was generated to display the problem in 8i and 9i, so you would need to modify the data to display the same effect in newer versions of Oracle – but the principle remains the same.

If you would like to see a slightly newer example of how the parameter causes problems, here’s a thread dated April 2012 from the OTN database forum where a SYS-recursive query caused a performance problem because the parameter was set to 1.

 


Conjunctive Normal Form

Thu, 2016-10-20 07:00

I recently tweeted about a comment I’d picked up at the Trivadis performance days regarding tablescans and performance.

“If you can write your SQL in conjunctive normal form it can help the optimizer to offload more predicates”

Inevitably someone asked me if I had an example to demonstrate this – I didn’t, and still don’t really, but here’s an interesting demo based on an example from the Oracle In-Memory blog showing how the optimizer will rearrange your filter predicates before passing them to the tablescan code for evaluation against an inmemory table.


rem
rem     Script:         in_memory_conjunctive.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        trunc(dbms_random.value(1,501)) qty,
        mod(rownum,200) + 1             part_no,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7
;
prompt  ==========
prompt  Base query
prompt  ==========

select
        count(v1)
from
        t1
where
        (qty > 495 or (qty < 3 and part_no = 50))
;
prompt  ===============
prompt  predicate added
prompt  ===============

select
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;
prompt  =================
prompt  Ordered predicate
prompt  =================

select  /*+ ordered_predicates */
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;

The 2nd and 3rd queries add a predicate to the first query – which, unfortunately, changes the estimated cardinality even though it has no effect on the result. This predicate is one that would be added by the inmemory code path if the table were declared to be inmemory. I’ve got two versions of the query, one with the (deprecated) ordered_predicates hint because in my initial tests the optimizer swapped the order of the predicates and I wanted to see if the ordering was at all critical.
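
To see where that extra predicate comes from, remember that conjunctive normal form simply distributes the OR over the AND. As a sketch of the algebra (using obvious shorthand):

        A or (B and C)    ==    (A or B) and (A or C)

        A:  qty > 495
        B:  qty < 3
        C:  part_no = 50

        (qty > 495 or (qty < 3 and part_no = 50))
  ==    (qty > 495 or qty < 3) and (qty > 495 or part_no = 50)

The second and third queries (like the inmemory predicate you’ll see below) don’t replace the original predicate with the full CNF form, they simply prepend the first of those two conjuncts to it.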

Here’s the plan for the base query – first before declaring the table inmemory, then after declaring the table inmemory:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14739 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  1862K| 14739   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |       |       |  1974 (100)|          |
|   1 |  SORT AGGREGATE             |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   100K|  1862K|  1974  (44)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))) filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))

And here, after putting the table back to “no inmemory”, are the plans for the second and third queries; note, particularly, the different order of the predicates in the predicate section: the predicate order matches the inmemory predicate order only if I use the ordered_predicates hint:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)) AND ("QTY">495
              OR "QTY"<3)))
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND
              "PART_NO"=50))))

Finally the run times – after running the queries a few times each to check for consistency:

  • Base query: 0.82 seconds
  • Query with extra predicate: 0.86 seconds
  • Query with extra predicate and forced order of predicate evaluation: 0.71 seconds

The query with the predicate arrangement matching the inmemory rewrite actually ran 13% faster than the original. Unfortunately the rewrite without the ordered_predicates hint ran slower – which is a bit of a shame but understandable – the first predicate evaluated is the more complex one, and then the code has to run a completely redundant second predicate; I was a little surprised at how much slower it was, but the table is 10M rows and we’re only looking at sub-second times anyway.

My table was fully cached and just under 112,000 blocks, so not very large, and this was running a serial query on a basic Oracle instance. Nevertheless there is a difference in execution time that is more than just “random noise” – if this is an indication of how a little unsightly tweaking of SQL can make a difference on a small data set, you can imagine that tweaking your predicates might make a significant difference to execution time if the extra predicates end up being pushed down to storage on an Exadata machine.

Footnote:

Another “not quite” example I happen to have written about a few months ago is a case where rewriting “not exists() OR not exists() OR not exists()” as “not (exists() AND exists() AND exists())” allowed Oracle to rewrite three subqueries as a single subquery with three-table join.
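
Schematically (with purely hypothetical table names) that rewrite is just De Morgan’s law applied to the three subqueries:

rem     original form - three separate NOT EXISTS subqueries
select  t.*
from    t
where   not exists (select null from c1 where c1.id = t.id)
or      not exists (select null from c2 where c2.id = t.id)
or      not exists (select null from c3 where c3.id = t.id)
;

rem     logically equivalent form - which allowed Oracle to rewrite the three
rem     subqueries as a single subquery with a three-table join
select  t.*
from    t
where   not (
                exists (select null from c1 where c1.id = t.id)
        and     exists (select null from c2 where c2.id = t.id)
        and     exists (select null from c3 where c3.id = t.id)
        )
;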

 


Fixed Stats

Mon, 2016-10-17 06:43

There are quite a lot of systems around the world that aren’t using the AWR (automatic workload repository) and ASH (active session history) tools to help them with trouble shooting because of the licensing requirement – so I’m still finding plenty of sites that are using Statspack and I recently came across a little oddity at one of these sites that I hadn’t noticed before: one of the Statspack snapshot statements was appearing fairly regularly in the Statspack report under the “SQL Ordered by Elapsed Time” section – even when the application had been rather busy and had generated lots of other work that was being reported. It was the following statement – the collection of file-level statistics:


select
       ts.name      tsname
     , df.name      filename
     , fs.phyrds
     , fs.phywrts
     , fs.readtim
     , fs.writetim
     , fs.singleblkrds
     , fs.phyblkrd
     , fs.phyblkwrt
     , fs.singleblkrdtim
     , fw.count     wait_count
     , fw.time      time
     , df.file#
  from x$kcbfwait   fw
     , v$filestat   fs
     , v$tablespace ts
     , v$datafile   df
 where ts.ts#    = df.ts#
   and fs.file#  = df.file#
   and fw.indx+1 = df.file#
;

The execution plan didn’t look very friendly, and the volume of I/O it generated (several hundred thousand disk reads) was surprising. The reason why the statement stood out so much in this case was that there was a fairly large number of files in the database (over 1,000) and the default execution plan was showing very bad cardinality estimates that resulted in highly inappropriate cartesian merge joins. At best the statement was taking around 2 minutes to run, at worst it was much, much worse.

This system was running 10g – also something which is still fairly common, though becoming much scarcer – which produced the following execution plan (which  I’ve recreated on a much smaller system):

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   535 |     2 (100)| 00:00:01 |
|*  1 |  HASH JOIN                    |                 |     1 |   535 |     2 (100)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN        |                 |     5 |  1190 |     0   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |                 |     1 |   199 |     0   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN      |                 |     1 |   173 |     0   (0)| 00:00:01 |
|*  5 |      FIXED TABLE FULL         | X$KCCTS         |     1 |    43 |     0   (0)| 00:00:01 |
|   6 |      BUFFER SORT              |                 |     1 |   130 |     0   (0)| 00:00:01 |
|*  7 |       FIXED TABLE FULL        | X$KCFIO         |     1 |   130 |     0   (0)| 00:00:01 |
|*  8 |     FIXED TABLE FIXED INDEX   | X$KCCFE (ind:1) |     1 |    26 |     0   (0)| 00:00:01 |
|   9 |    BUFFER SORT                |                 |   100 |  3900 |     0   (0)| 00:00:01 |
|  10 |     FIXED TABLE FULL          | X$KCBFWAIT      |   100 |  3900 |     0   (0)| 00:00:01 |
|* 11 |   VIEW                        | GV$DATAFILE     |     1 |   297 |     1 (100)| 00:00:01 |
|  12 |    SORT ORDER BY              |                 |     1 |   957 |     1 (100)| 00:00:01 |
|  13 |     NESTED LOOPS              |                 |     1 |   957 |     0   (0)| 00:00:01 |
|  14 |      NESTED LOOPS             |                 |     1 |   647 |     0   (0)| 00:00:01 |
|  15 |       NESTED LOOPS            |                 |     1 |   371 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FULL       | X$KCCFN         |     1 |   323 |     0   (0)| 00:00:01 |
|* 17 |        FIXED TABLE FIXED INDEX| X$KCVFH (ind:1) |     1 |    48 |     0   (0)| 00:00:01 |
|* 18 |       FIXED TABLE FIXED INDEX | X$KCCFE (ind:1) |     1 |   276 |     0   (0)| 00:00:01 |
|* 19 |      FIXED TABLE FULL         | X$KCCFN         |     1 |   310 |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TSTSN"="TS#" AND "K"."KCFIOFNO"="FILE#" AND "FILE#"="FW"."INDX"+1)
   5 - filter("TSTSN"<>(-1) AND "INST_ID"=USERENV('INSTANCE'))
   7 - filter("K"."INST_ID"=USERENV('INSTANCE'))
   8 - filter("F"."FEDUP"<>0 AND "F"."FENUM"="K"."KCFIOFNO")
  11 - filter("INST_ID"=USERENV('INSTANCE'))
  16 - filter("FN"."FNNAM" IS NOT NULL AND "FN"."FNTYP"=4 AND BITAND("FN"."FNFLG",4)<>4)
  17 - filter("FN"."FNFNO"="FH"."HXFIL")
  18 - filter("FE"."FEDUP"<>0 AND "FN"."FNFNO"="FE"."FENUM" AND
              "FE"."FEFNH"="FN"."FNNUM" AND "FE"."FETSN"<>(-1))
  19 - filter("FE"."FEPAX"<>65535 AND "FE"."FEPAX"<>0 AND "FE"."FEPAX"="FNAUX"."FNNUM"
              OR ("FE"."FEPAX"=0 OR "FE"."FEPAX"=65535) AND "FE"."FENUM"="FNAUX"."FNFNO" AND
              "FNAUX"."FNTYP"=4 AND "FNAUX"."FNNAM" IS NOT NULL AND BITAND("FNAUX"."FNFLG",4)<>4 AND
              "FE"."FEFNH"="FNAUX"."FNNUM")

Note particularly the two Cartesian merge joins and the very late filter at operation 1.

Note also the number of times the cardinality estimate is 1 – always a bit of a threat when the query gets complicated: “anything goes following a one for Rows”.

The easy (first thought) solution was simply to gather stats on all the fixed objects in this query:


begin
        dbms_stats.gather_table_stats('sys','x$kcbfwait',method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccfe',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccfn',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccts',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kcfio',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kcvfh',   method_opt=>'for all columns size 1');
end;
/

The option to gather fixed objects stats individually with a call to dbms_stats.gather_table_stats() is not commonly known, but it does work.
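
For comparison, and purely as an aside, the more commonly documented route is to gather stats for all the fixed objects in one call, and you can check what you already have through dba_tab_statistics:

exec dbms_stats.gather_fixed_objects_stats

select  table_name, last_analyzed
from    dba_tab_statistics
where   object_type = 'FIXED TABLE'
and     table_name in (
                'X$KCBFWAIT', 'X$KCCFE', 'X$KCCFN', 'X$KCCTS', 'X$KCFIO', 'X$KCVFH'
        )
order by
        table_name
;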

Here’s the plan (again from the small system) after stats collection:

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |    29 | 10411 |     3 (100)| 00:00:01 |
|*  1 |  HASH JOIN                     |                 |    29 | 10411 |     3 (100)| 00:00:01 |
|*  2 |   HASH JOIN                    |                 |     1 |   350 |     2 (100)| 00:00:01 |
|   3 |    MERGE JOIN CARTESIAN        |                 |    25 |  1325 |     0   (0)| 00:00:01 |
|   4 |     NESTED LOOPS               |                 |     4 |   148 |     0   (0)| 00:00:01 |
|*  5 |      FIXED TABLE FULL          | X$KCFIO         |   200 |  6200 |     0   (0)| 00:00:01 |
|*  6 |      FIXED TABLE FIXED INDEX   | X$KCCFE (ind:1) |     1 |     6 |     0   (0)| 00:00:01 |
|   7 |     BUFFER SORT                |                 |     7 |   112 |     0   (0)| 00:00:01 |
|*  8 |      FIXED TABLE FULL          | X$KCCTS         |     7 |   112 |     0   (0)| 00:00:01 |
|*  9 |    VIEW                        | GV$DATAFILE     |     1 |   297 |     1 (100)| 00:00:01 |
|  10 |     SORT ORDER BY              |                 |     1 |   316 |     1 (100)| 00:00:01 |
|  11 |      NESTED LOOPS              |                 |     1 |   316 |     0   (0)| 00:00:01 |
|  12 |       NESTED LOOPS             |                 |     1 |   248 |     0   (0)| 00:00:01 |
|  13 |        NESTED LOOPS            |                 |     1 |   226 |     0   (0)| 00:00:01 |
|* 14 |         FIXED TABLE FULL       | X$KCCFE         |     4 |   612 |     0   (0)| 00:00:01 |
|* 15 |         FIXED TABLE FIXED INDEX| X$KCCFN (ind:1) |     1 |    73 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FIXED INDEX | X$KCVFH (ind:1) |     1 |    22 |     0   (0)| 00:00:01 |
|* 17 |       FIXED TABLE FULL         | X$KCCFN         |     1 |    68 |     0   (0)| 00:00:01 |
|  18 |   FIXED TABLE FULL             | X$KCBFWAIT      |   400 |  3600 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("FILE#"="FW"."INDX"+1)
   2 - access("TSTSN"="TS#" AND "K"."KCFIOFNO"="FILE#")
   5 - filter("K"."INST_ID"=USERENV('INSTANCE'))
   6 - filter("F"."FEDUP"<>0 AND "F"."FENUM"="K"."KCFIOFNO")
   8 - filter("TSTSN"<>(-1) AND "INST_ID"=USERENV('INSTANCE'))
   9 - filter("INST_ID"=USERENV('INSTANCE'))
  14 - filter("FE"."FEDUP"<>0 AND "FE"."FETSN"<>(-1))
  15 - filter("FN"."FNTYP"=4 AND "FN"."FNNAM" IS NOT NULL AND BITAND("FN"."FNFLG",4)<>4
              AND "FN"."FNFNO"="FE"."FENUM" AND "FE"."FEFNH"="FN"."FNNUM")
  16 - filter("FN"."FNFNO"="FH"."HXFIL")
  17 - filter("FE"."FEPAX"<>65535 AND "FE"."FEPAX"<>0 AND "FE"."FEPAX"="FNAUX"."FNNUM" OR
              ("FE"."FEPAX"=0 OR "FE"."FEPAX"=65535) AND "FE"."FENUM"="FNAUX"."FNFNO" AND
              "FNAUX"."FNTYP"=4 AND "FNAUX"."FNNAM" IS NOT NULL AND BITAND("FNAUX"."FNFLG",4)<>4 AND
              "FE"."FEFNH"="FNAUX"."FNNUM")

Note the changes in cardinality estimates: they now look a little more realistic and we’re down to one cartesian merge join which (if you have a rough idea of what your X$ tables hold) still looks a little surprising at first sight but not completely unreasonable. A change of plan doesn’t necessarily mean much without the data and time behind it, of course, so here are the two sets of results from a 10g database with a handful of datafiles and tablespaces showing the Row Source Operation sections from the tkprof output before and after stats collection:

Before stats collection:

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  HASH JOIN  (cr=0 pr=0 pw=0 time=1957860 us)
  16800   MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=33855662 us)
     42    NESTED LOOPS  (cr=0 pr=0 pw=0 time=73795 us)
   1400     MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=21555 us)
      7      FIXED TABLE FULL X$KCCTS (cr=0 pr=0 pw=0 time=3204 us)
   1400      BUFFER SORT (cr=0 pr=0 pw=0 time=7233 us)
    200       FIXED TABLE FULL X$KCFIO (cr=0 pr=0 pw=0 time=1210 us)
     42     FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=1859092 us)
  16800    BUFFER SORT (cr=0 pr=0 pw=0 time=67643 us)
    400     FIXED TABLE FULL X$KCBFWAIT (cr=0 pr=0 pw=0 time=2008 us)
      6   VIEW  GV$DATAFILE (cr=0 pr=0 pw=0 time=68087 us)
      6    SORT ORDER BY (cr=0 pr=0 pw=0 time=68065 us)
      6     NESTED LOOPS  (cr=0 pr=0 pw=0 time=65989 us)
      6      NESTED LOOPS  (cr=0 pr=0 pw=0 time=56632 us)
      6       NESTED LOOPS  (cr=0 pr=0 pw=0 time=47217 us)
      6        FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=19830 us)
      6        FIXED TABLE FIXED INDEX X$KCVFH (ind:1) (cr=0 pr=0 pw=0 time=25568 us)
      6       FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=9849 us)
      6      FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=9715 us)

After stats collection:


Rows     Row Source Operation
-------  ---------------------------------------------------
      6  HASH JOIN  (cr=0 pr=0 pw=0 time=196576 us)
      6   HASH JOIN  (cr=0 pr=0 pw=0 time=195829 us)
     42    MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=4390 us)
      6     NESTED LOOPS  (cr=0 pr=0 pw=0 time=7810 us)
    200      FIXED TABLE FULL X$KCFIO (cr=0 pr=0 pw=0 time=1224 us)
      6      FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=150150 us)
     42     BUFFER SORT (cr=0 pr=0 pw=0 time=1574 us)
      7      FIXED TABLE FULL X$KCCTS (cr=0 pr=0 pw=0 time=1353 us)
      6    VIEW  GV$DATAFILE (cr=0 pr=0 pw=0 time=41058 us)
      6     SORT ORDER BY (cr=0 pr=0 pw=0 time=41005 us)
      6      NESTED LOOPS  (cr=0 pr=0 pw=0 time=39399 us)
      6       NESTED LOOPS  (cr=0 pr=0 pw=0 time=34229 us)
      6        NESTED LOOPS  (cr=0 pr=0 pw=0 time=15583 us)
      6         FIXED TABLE FULL X$KCCFE (cr=0 pr=0 pw=0 time=1124 us)
      6         FIXED TABLE FIXED INDEX X$KCCFN (ind:1) (cr=0 pr=0 pw=0 time=15067 us)
      6        FIXED TABLE FIXED INDEX X$KCVFH (ind:1) (cr=0 pr=0 pw=0 time=18971 us)
      6       FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=5581 us)
    400   FIXED TABLE FULL X$KCBFWAIT (cr=0 pr=0 pw=0 time=1615 us)

The execution time has dropped from about 2 seconds to less than 1/10th of a second – and all I’ve got is 6 or 7 files and tablespaces.  (Some of the “time=” values in the first plan are very odd, but the final time figure is about right.)

Generating an intermediate result set of 16,800 rows for a tiny number of files is not a good thing – just imagine how big that number would get with 1,000 files and a couple of hundred tablespaces.

I have to say that (for a couple of tiny databases) the 11.2.0.4 and 12.1.0.2 systems I checked this query on had no problem and immediately picked a sensible path. It’s possible that the definition of some of the v$ objects has actually changed or that the optimizer features have changed (some assistance from complex view merging, perhaps) – but if you are still running Statspack, even if it’s on 11g or 12c, then it’s worth checking from time to time how much work goes into executing the snapshot and seeing if you need some fixed object stats to make things a little more efficient.

Footnote:

Several years ago I wrote a short note about how Statspack actually captured its own execution time (from 10g onwards) and how you could run a report on it to check the run time. It’s worth running that report from time to time. I’ve recently updated that note to add the equivalent query against the AWR.


#ThanksOTN

Tue, 2016-10-11 12:57

To mark the OTN Appreciation Day I’d like to offer this thought:

“Our favourite feature is execution plans … execution plans and rowsource execution statistics … rowsource execution statistics and execution plans …  our two favourite features and rowsource execution stats and execution plans … and ruthless use of SQL monitoring …. Our three favourite features are rowsource execution stats, execution plans, ruthless use of SQL monitoring and an almost fanatical devotion to the Cost Based Optimizer …. Our four … no … amongst our favourite features  are such elements as rowsource execution statistics, execution plans …. I’ll come in again.”

With apologies to Monty Python.

 

 

 


InMemory Bonus

Mon, 2016-10-10 07:13

It should be fairly well known by now that when you enable the 12c InMemory (Columnar Store) option (and set the inmemory_size) your SQL may take advantage of a new optimizer transformation known as the Vector Transformation, including Vector Aggregation. You may be a little surprised to learn, though, that some of your plans may change even when they show no sign of a vector transformation. This is because the In-Memory Column Store isn’t just about doing tablescans very quickly, it’s also about new code paths that do clever things with predicates to squeeze all the extra benefits from the technology. Here’s an example:


rem
rem     Script:         12c_inmemory_surprise.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2016
rem
rem     Last tested
rem             12.1.0.2
rem

drop table t2 purge;
drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                          id,
        trunc((rownum - 1)/100)                         n1,
        trunc((rownum - 1)/100)                         n2,
        trunc(dbms_random.value(1,1e4))                 rand,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create index t1_n1   on t1(n1)   nologging;
create index t2_rand on t2(rand) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns (n1,n2) size 1'
        );
end;
/

There’s nothing particularly special about these two tables – I engineered them for a different demo, and the call to gather extended stats on the column group (n1, n2) is just a minor detail in getting better cardinality estimates in the upcoming plans. At this point I haven’t specified that either table should be in memory, so let’s see what plan I get from dbms_xplan.display_cursor() when I run a query that should do a hash join using t1 as the build table and t2 as the probe table:


select
        /*+
                qb_name(main)
        */
        count(*)
from    (
        select
                /*+ qb_name(inline) */
                distinct t1.v1, t2.v1
        from
                t1,t2
        where
                t1.n1 = 50
        and     t1.n2 = 50
        and     t2.rand = t1.id
        )
;

select * from table(dbms_xplan.display_cursor);

Plan hash value: 1718706536

-------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |       |       |       |  2441 (100)|          |
|   1 |  SORT AGGREGATE                         |       |     1 |       |            |          |
|   2 |   VIEW                                  |       | 10001 |       |  2441   (4)| 00:00:01 |
|   3 |    HASH UNIQUE                          |       | 10001 |   351K|  2441   (4)| 00:00:01 |
|*  4 |     HASH JOIN                           |       | 10001 |   351K|  2439   (4)| 00:00:01 |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                  | T1_N1 |   100 |       |     1   (0)| 00:00:01 |
|   7 |      TABLE ACCESS FULL                  | T2    |  1000K|    14M|  2416   (4)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   5 - filter("T1"."N2"=50)
   6 - access("T1"."N1"=50)

Thanks to the column group the optimizer has estimated correctly that the number of rows selected from t1 would be 100. Beyond that there’s very little exciting about this execution plan.
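
If you want to see where that 100 comes from, here’s the arithmetic, using nothing but the figures built into the table definition above:

rem     t1 has 10,000 rows; n1 and n2 each have 100 distinct values (0 .. 99)
rem
rem     without the column group:  10,000 * (1/100) * (1/100) =   1 row estimated
rem     with the column group:     n1 and n2 are perfectly correlated, so there
rem                                are only 100 distinct (n1,n2) combinations:
rem                                10,000 * (1/100)            = 100 rows estimated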

So let’s modify t2 to be in-memory and see how the plan changes as we re-execute the query:


alter table t2 inmemory;

select
        /*+ qb_name(main) */
        count(*)
...

select * from table(...);


Plan hash value: 106371239

----------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |         |       |       |   259 (100)|          |
|   1 |  SORT AGGREGATE                          |         |     1 |       |            |          |
|   2 |   VIEW                                   |         | 10001 |       |   259  (27)| 00:00:01 |
|   3 |    HASH UNIQUE                           |         | 10001 |   351K|   259  (27)| 00:00:01 |
|*  4 |     HASH JOIN                            |         | 10001 |   351K|   257  (26)| 00:00:01 |
|   5 |      JOIN FILTER CREATE                  | :BF0000 |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1      |   100 |  2100 |     3   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_N1   |   100 |       |     1   (0)| 00:00:01 |
|   8 |      JOIN FILTER USE                     | :BF0000 |  1000K|    14M|   234  (20)| 00:00:01 |
|*  9 |       TABLE ACCESS INMEMORY FULL         | T2      |  1000K|    14M|   234  (20)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."RAND"="T1"."ID")
   6 - filter("T1"."N2"=50)
   7 - access("T1"."N1"=50)
   9 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))

The cost of the tablescan drops dramatically as the optimizer assumes the table will be in the in-memory column store (IMCS) but (in many ways more significantly) we suddenly have a (serial) Bloom filter for a hash join – which eliminates (most of) the data that wouldn’t have survived the hash join without having to use the CPU that would normally be spent probing the build table.
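
That costing is, of course, only the optimizer’s assumption until the segment really has been populated in the column store; if you want to check how far population has got, a query like the following should tell you – again this is just a sketch, with the view and column names quoted from memory:

select
        segment_name, populate_status, bytes, inmemory_size, bytes_not_populated
from
        v$im_segments
where
        segment_name = 'T2'
;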

This is an interesting example of what the in-memory code path can do for us. There’s no point in using the Bloom filter for a serial hash join in “classic” Oracle because the Bloom filter is basically the bitmap of the hash table that is the first thing Oracle examines when probing the hash table – but with “in-memory” Oracle there are some enhancements to the Bloom filter and the code path that make using the Bloom filter an effective strategy. Most significant, perhaps, is that the in-memory code path can use SIMD instructions to perform multiple probes from t2 simultaneously, so not only do we get the benefits of avoiding disk access, buffer activity and row-by-row access to columns, we also reduce the CPU time spent on making the first-stage comparisons of the hash join. (And shared columnar dictionaries in 12.2 could reduce this even further!)
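
If you’ve never thought about what the Bloom filter is doing for the probe table, here’s a toy model written as a PL/SQL block. To be clear, this is purely conceptual and is not Oracle’s implementation (the real thing uses a compact bit vector and, in-memory, SIMD probes); it just shows how cheaply the probe rows that can’t possibly join can be discarded before the hash table is touched:

declare
        -- toy "bit vector": an index-by table of booleans standing in for the bitmap
        type bitmap_t is table of boolean index by pls_integer;
        bf              bitmap_t;
        c_hash_size     constant pls_integer := 16384;
        survivors       pls_integer := 0;
begin
        -- "build" phase: flag a bit for every join key coming from the build table (t1)
        for r in (select id from t1 where n1 = 50 and n2 = 50) loop
                bf(dbms_utility.get_hash_value(to_char(r.id), 0, c_hash_size)) := true;
        end loop;

        -- "probe" phase: a t2 row whose bit is not set cannot survive the join
        for r in (select rand from t2) loop
                if bf.exists(dbms_utility.get_hash_value(to_char(r.rand), 0, c_hash_size)) then
                        survivors := survivors + 1;
                end if;
        end loop;

        dbms_output.put_line('Rows passed by the toy Bloom filter: ' || survivors);
end;
/

The rows that get through the filter still have to be checked against the hash table, of course (hash collisions mean a set bit is not a guarantee of a match), but the rows that are eliminated never incur that cost at all.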

Footnote: I also have a note I scribbled down at the Trivadis performance days last month that the Bloom filter used with IMCS carries the actual low and high values from the build table.  I may have misinterpreted this as I wrote it, but if that’s correct then it’s another step in eliminating data very quickly and very early when using IMCS (or Exadata, if the same feature exists in the Bloom filters that get pushed to the storage servers).

 


My session workload

Thu, 2016-10-06 07:19

My old website (www.jlcomp.demon.co.uk) will be disappearing in a couple of weeks – but there are a couple of timeless articles on it that are worth saving, and although the popularity of this one has probably been surpassed by Tanel Poder’s Snapper script, or other offerings by Tom Kyte or Adrian Billington, it’s still one of those useful little things to have around – it’s a package that takes a snapshot of your session stats.

The package depends on a view created in the SYS schema, and the package itself has to be installed in the SYS schema – which is why other strategies for collecting the information have become more popular; but if you want to have it handy, here are the two scripts:

rem
rem     Script:         c_mystats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem     Purpose:        Put names to v$mystat
rem
rem     Last tested
rem             12.1.0.2        -- naming issue
rem             11.2.0.4
rem             10.2.0.5
rem             10.1.0.4
rem              9.2.0.8
rem              8.1.7.4
rem
rem     Notes:
rem     Should be run by SYS - which means it has to be re-run
rem     on a full database export/import
rem
rem     It looks as if it is illegal to create a view with a
rem     name starting with v$ in the sys account as from 12c.
rem     (ORA-00999: invalid view name). Hence the JV$ name.
rem
rem     But you can create a public synonym starting "v$"
rem

create or replace view jv$my_stats
as
select
        /*+
                first_rows
                ordered
        */
        ms.sid,
        sn.statistic#,
        sn.name,
        sn.class,
        ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
;

drop public synonym v$my_stats;
create public synonym v$my_stats for jv$my_stats;
grant select on v$my_stats to public;

rem
rem	Script:		snap_myst.sql
rem	Author:		Jonathan Lewis
rem	Dated:		March 2001
rem	Purpose:	Package to get snapshot start and delta of v$mystat
rem
rem	Last tested
rem		12.1.0.2
rem		11.2.0.4
rem		10.2.0.5
rem		10.1.0.4
rem		 9.2.0.8
rem		 8.1.7.4
rem
rem	Notes
rem	Has to be run by SYS to create the package
rem	Depends on view (j)v$my_stats (see c_mystats.sql)
rem
rem	Usage:
rem		set serveroutput on size 1000000 format wrapped
rem		set linesize 120
rem		set trimspool on
rem		execute snap_my_stats.start_snap
rem		-- do something
rem		execute snap_my_stats.end_snap
rem

create or replace package snap_my_stats as
	procedure start_snap;
	procedure end_snap (i_limit in number default 0);
end;
/

create or replace package body snap_my_stats as

cursor c1 is
	select 
		statistic#, 
		name,
		value
	from 
		v$my_stats
	where
		value != 0
	;


	type w_type is table of c1%rowtype index by binary_integer;
	w_list		w_type;
	empty_list	w_type;

	m_start_time	date;
	m_start_flag	char(1);
	m_end_time	date;

procedure start_snap is
begin

	m_start_time := sysdate;
	m_start_flag := 'U';
	w_list := empty_list;

	for r in c1 loop
		w_list(r.statistic#).value := r.value;
	end loop;

end start_snap;


procedure end_snap (i_limit in number default 0) 
is
begin

	m_end_time := sysdate;

	dbms_output.put_line('---------------------------------');

	dbms_output.put_line('Session stats - ' ||
				to_char(m_end_time,'dd-Mon hh24:mi:ss')
	);

	if m_start_flag = 'U' then
		dbms_output.put_line('Interval:-  '  || 
				trunc(86400 * (m_end_time - m_start_time)) ||
				' seconds'
		);
	else
		dbms_output.put_line('Since Startup:- ' || 
				to_char(m_start_time,'dd-Mon hh24:mi:ss')
		);
	end if;

	if (i_limit != 0) then
		dbms_output.put_line('Lower limit:-  '  || i_limit);
	end if;

	dbms_output.put_line('---------------------------------');

	dbms_output.put_line(
		rpad('Name',60) ||
		lpad('Value',18)
	);

	dbms_output.put_line(
		rpad('----',60) ||
		lpad('-----',18)
	);

	for r in c1 loop
		if (not w_list.exists(r.statistic#)) then
		    w_list(r.statistic#).value := 0;
		end if;

		if (
		       (r.value > w_list(r.statistic#).value + i_limit)
		) then
			dbms_output.put(rpad(r.name,60));
			dbms_output.put(to_char(
				r.value - w_list(r.statistic#).value,
					'9,999,999,999,990')
			);
			dbms_output.new_line;
		end if;
	end loop;

end end_snap;

begin
	select
		logon_time, 'S'
	into
		m_start_time, m_start_flag
	from
		v$session
	where
		sid = 	(
				select /*+ no_unnest */ sid 
				from v$mystat 
				where rownum = 1
			);

end snap_my_stats;
/

drop public synonym snap_my_stats;
create public synonym snap_my_stats for snap_my_stats;
grant execute on snap_my_stats to public;
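
Installation, then, is just a matter of running the two scripts from the SYS account – something along these lines, assuming you’ve saved them under the names shown in their headers:

connect / as sysdba
@c_mystats.sql
@snap_myst.sql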

One point to be cautious about with this package: do not embed the calls inside an anonymous PL/SQL block, e.g.

begin

        snap_my_stats.start_snap;

        -- some other code

        snap_my_stats.end_snap;

end;
/

There are some statistics in v$my_stats (v$mystat / v$sesstat) which are not updated until the end of a database call – and calling an anonymous pl/sql block counts as a single database call, so some of your statistics (for example “CPU used by this session”) will report misleading values.
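
The safe pattern – it’s the one in the usage notes of the script header – is to issue each call as its own top-level database call, for example from SQL*Plus:

set serveroutput on size 1000000 format wrapped

execute snap_my_stats.start_snap

-- run the workload you want to measure here, as ordinary top-level calls

execute snap_my_stats.end_snap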


Kill CPU

Mon, 2016-10-03 02:58

My old website (www.jlcomp.demon.co.uk) will be disappearing in a couple of weeks – but there are a couple of timeless articles on it that are worth saving, and a method for soaking up all the CPU on your system with a simple SQL statement against a small data set is, surely, one of them. Here, then, is a little script that I wrote (or, at least, formalised) 15 years ago to stress out a CPU:


rem
rem     Script:         kill_cpu.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem     Purpose:        Exercise CPU and latches
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.1.0.7
rem             10.2.0.5
rem             10.1.0.4
rem              9.2.0.8
rem              8.1.7.4
rem
rem     Notes:
rem     The count(*) will return power(2,n-1)
rem     To run from Oracle 9 and later we have to set parameter
rem             _old_connect_by_enabled = true;
rem
rem     Base calculation (historical):
rem     ==============================
rem     Rule of thumb - ca. 10,000 logical I/Os per sec per 100 MHz of CPU.
rem     (Modern versions of Oracle on modern CPUs - about twice that, maybe
rem     due to the introduction of the "fastpath" consistent gets with the
rem     elimination of some logging activity that used to exist.)
rem
rem     With the value of 23 shown we do 6M buffer visits of which 4M
rem     are "consistent read gets", and 2M are "buffer is pinned count".
rem     (That's power(2,23-1) and power(2,23-2) respectively). For each
rem     row you add to the kill_cpu table you double the run-time.
rem
rem      This is an example of SQL that can take MUCH longer when run
rem      with rowsource_execution_statistics enabled. Mostly spent on 
rem      CPU calling the O/S timer. (On my last test, using a 12c VM
rem      the time jumped from 6 seconds - 23 rows - to 75 seconds when
rem      I set statistics_level to all; but half would be the effect of
rem      running through the VM.)
rem

drop table kill_cpu;

begin

        begin           execute immediate 'purge recyclebin';
        exception       when others then null;
        end;

        begin           execute immediate 'alter session set "_old_connect_by_enabled"=true';
        exception       when others then null;
        end;

end;
/

create table kill_cpu(n, primary key(n))
organization index
as
select  rownum n
from    all_objects
where   rownum <= 23
;
execute snap_my_stats.start_snap 

set timing on 
set serveroutput off

-- alter session set statistics_level = all;

spool kill_cpu

select  count(*) X
from    kill_cpu 
connect by 
        n > prior n
start with 
        n = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

set serveroutput on
execute snap_my_stats.end_snap

spool off

set timing off
alter session set statistics_level = typical;
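
As a quick sanity check of a run, the note in the script header tells you what the count(*) should be; with the 23 rows used here that’s power(2,22):

select power(2,22) expected_x from dual;        -- 4,194,304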

The calls to snap_my_stats use the package (owned by SYS) that I wrote a long time ago for taking a snapshot of v$mystat – it’s the one I listed in the previous note; many people use Tanel Poder’s “Snapper” script or Tom Kyte’s script instead.

