Skip navigation.

Jonathan Lewis

Syndicate content Oracle Scratchpad
Just another Oracle weblog
Updated: 7 hours 31 min ago

Row Migration

Mon, 2014-02-10 12:55

At one of the presentations I attended at RMOUG this year the presenter claimed that if a row kept increasing in size and had to migrate from block to block as a consequence then each migration of that row would leave a pointer in the previous block so that an indexed access to the row would start at the original table block and have to follow an ever growing chain of pointers to reach the data.

This is not correct, and it’s worth making a little fuss about the error since it’s the sort of thing that can easily become an urban legend that results in people rebuilding tables “for performance” when they don’t need to.

Oracle behaves quite intelligently with migrated rows. First, the migrated row has a pointer back to the original location and if the row has to migrate a second time the first place that Oracle checks for space is the original block, so the row might “de-migrate” itself; however, even if it can’t migrate back to the original block, it will still revisit the original block to change the pointer in that block to refer to the block it has moved on to – so the row is never more than one step away from its original location. As a quick demonstration, here’s some code to generate and manipulate some data:

create table t1 (
id	number(6,0),
v1	varchar2(1200)
)
pctfree 0
;

prompt	==========================================
prompt	The following code fits 74 rows to a block
prompt	==========================================

insert into t1 select rownum - 1, rpad('x',100) from all_objects where rownum <= 75;
commit;

prompt	======================================
prompt	Make the first row migrate and dump it
prompt	======================================

update t1 set v1 = rpad('x',400) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',2)

prompt	===========================================================
prompt	Fill the block the long row is now in, force it to migrate,
prompt	then dump it again.
prompt	===========================================================

insert into t1 select rownum + 75, rpad('x',100) from all_objects where rownum <= 75;
commit;

update t1 set v1 = rpad('x',800) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt	========================================================
prompt	Fill the block the long row is now in and shrink the row
prompt	to see if it returns to its original block. (It won't.)
prompt	========================================================

insert into t1 select rownum + 150, rpad('x',100) from all_objects where rownum <= 75;
commit;

update t1 set v1 = rpad('x',50) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt	========================================================
prompt	Make a lot of space in the first block and force the row
prompt	to migrate again to see if it migrates back. (It does.)
prompt	========================================================

delete from t1 where id between 1 and 20;
commit;

update t1 set v1 = rpad('x',1200) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

My test database was using 8KB blocks (hence the 74 rows per block), and 1MB uniform extents with freelist management. The procedure dump_seg() takes a segment name as its first parameter and a number of blocks as the second (then the segment type and starting block as the third and fourth) and dumps the first N data blocks of the segment. To demonstrate what goes on, I’ve extracted the content of the first row (id = 0) after each of the four dumps:

After the first update - the column count (cc) is zero and the "next rowid" (nrid) is row 1 of block 0x0140000b

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000b.1

After the second update - the next rowid is row 7 of block 0x0140000c

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0140000c.7

After the third update (shrinking the row) the row hasn't moved from block 0x0140000c

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000c.7

After the fourth update (making space, and growing the row too much) the row moves back home

tab 0, row 0, @0x4c1
tl: 1208 fb: --H-FL-- lb: 0x2  cc: 2
col  0: [ 1]  80
col  1: [1200]
78 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

My calls to dump blocks included the blocks where the row migrated to, so we’ll have a look at the target locations (as given by the original row location’s nrid) in those blocks over time. First we check block 0x0140000b, row 1 after the first two migrations:

tab 0, row 1, @0x1d7f
tl: 414 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [400]

tab 0, row 1, @0x1d7f
tl: 2 fb: ---DFL-- lb: 0x1

After the first migration (the row arrives here) we have a “head rowid” (hrid) pointer telling us where the row came from. After the second migration, when the row has moved on, we simply have a typical “deleted stub” – two bytes reserving the row directory entry until the commit has been done and cleaned out.

Then we can examine the second target (0x140000c, row 7) on the second and third and fourth updates:


tab 0, row 7, @0x1966
tl: 814 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [800]

tab 0, row 7, @0xb1
tl: 62 fb: ----FL-- lb: 0x1  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [50]

tab 0, row 7, @0xb1
tl: 2 fb: ---DFL-- lb: 0x2

As you can see, on arrival this location gets the original rowid as its “head rowid” (hrid), and it knows nothing about the intermediate block where the row was briefly in transit. I’ve copied the length byte (in square brackets) of column 1 in the dumps so that you can see that the row stayed put as it shrank. We can then see on the last update that we are left with a deleted stub in this block as the row migrates back to its original location when we try to extend it beyond the free space in this block.

Migrated rows are only ever one step away from home. It’s not nice to have too many of them, but it’s not necessarily a disaster.


RAC Plans

Mon, 2014-02-10 07:12

Recently appeared on Mos – “Bug 18219084 : DIFFERENT EXECUTION PLAN ACROSS RAC INSTANCES”

Now, I’m not going to claim that the following applies to this particular case – but it’s perfectly reasonable to expect to see different plans for the same query on RAC, and it’s perfectly possible for the two different plans to have amazingly different performance characteristics; and in this particular case I can see an obvious reason why the two nodes could have different plans.

Here’s the query reported in the bug:

SELECT /*+ INDEX(C IDX3_GOD_USE_LOT)*/
   PATTERN_ID, STB_TIME
    FROM mfgdev.MTR_AUTO_GOD_AGENT_BT C
   WHERE 1 = 1
     AND EXISTS (SELECT /*+ INDEX(B IDX_MTR_STB_LOT_EQP)*/ 1
            FROM MFGDEV.MTR_STB_BTH B
           WHERE B.PATTERN_ID = C.PATTERN_ID
             AND B.STB_TIME = C.STB_TIME
             AND B.ACTUAL_START_TIME < SYSDATE
             AND EXISTS (SELECT /*+ INDEX(D CW_LOTID)*/
                   1
                    FROM F14DM.DM_CW_WIP_BT D
                   WHERE D.LOT_ID = B.LOT_ID
                     AND D.SS = 'BNKI'));

See the reference to “sysdate”. I can show you a system where you had a 15 minute window each day (until the problem was addressed) to optimize a particular query if you wanted a good execution plan; if you optimized it any other time of day you got a bad plan – and the problem was sysdate: it acts like a peeked bind variable.

Maybe, on this system, if you optimize the query at 1:00 am you get one plan, and at 2:00 am you get another – and if those two optimizations occur on different nodes you’ve just met the conditions of this bug report.

Here’s another thought to go with this query: apparently it’s caused enough problems in the past that someone’s written a couple of hints into the code. With three tables and two correlated subqueries in the code a total of three index() hints is not enough. If you’re going to hard-code hints into a query then take a look at the outline it generates when it does the right thing, and that will tell you about the 15 or so hints you’ve missed out. (Better still, consider generating an SQL Baseline from the hinted code and attaching it to the unhinted code.)


IOT Catalogue

Sun, 2014-02-09 12:52

I’ve just added a catalogue of Richard Foote’s articles on IOTs to the list I made a couple of years ago of Martin Widlake’s articles, so this is just a temporary note to point people to the updated list.


12c fixed subquery

Thu, 2014-02-06 08:25

Here’s a simple little demonstration of an enhancement to the optimizer in 12c that may result in some interesting changes in execution plans as cardinality estimates change from “guesses” to accurate estimates.


create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum				id,
	trunc(sysdate) + rownum		d1
from
	generator	v1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

select * from t1 where id > (select 9900 from dual);

set autotrace off

Since there are no indexes in sight the execution plan has to be a tablescan. The interesting thing, though, is the optimizer’s prediction for the number of rows returned. If you look at the code you can work out that the actual result set should be 100; but Oracle has used a standard “guess” of 5% for predicates of the form “column greater than (scalar subquery)”, and sure enough, here’s the plan for 11.2.0.3


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 |  2000 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   500 |  2000 |     3   (0)| 00:00:01 |
|   2 |   FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"> (SELECT 9900 FROM "SYS"."DUAL" "DUAL"))

However, after upgrading to 11.2.0.4, the numbers change:

12c plan - shows correct cardinality estimate
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |   400 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |   400 |     3   (0)| 00:00:01 |
|   2 |   FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"> (SELECT 9900 FROM "SYS"."DUAL" "DUAL"))

Shades of precompute_subquery – it looks like the optimizer recognizes that the subquery has to return a constant, and extracts that constant for use in the plan. Strangely, though, I can’t see any sign in the 10053 trace file of the subquery value being used until the plan just magically appears with the right estimate.


Minus

Wed, 2014-02-05 11:42

Here’s a little script to demonstrate an interesting observation that appeared in my email this morning (that’s morning Denver time):

create table t1
as
select * from all_objects where rownum = 1;

delete from t1;
commit;

create table t2
as
select * from all_objects where rownum <= 100000;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';

prompt  ======================
prompt  And now the test query
prompt  ======================

select * from t1
minus
select * from t2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set events '10046 trace name context off';

Clearly the first query block in the test query will return no rows, and since the MINUS operator returns rows from the first result set that do not appear in the second result set there is no need for Oracle to run the second query block. Well, guess what …

The ‘create where rownum = 1′ followed by ‘delete’ is a lazy workaround to avoid side effects of deferred segment creation so that you can run the script on any (recent) version of Oracle. The flush, combined with 10046 trace, allowed me to see waits that showed which objects Oracle scanned and when, and the display_cursor() was just icing on the cake.

I’ve checked 11.2.0.4 and 12.1.0.1, and both of them scan t1 first and then scan t2 unnecessarily.

This surprised me slightly given how smart the optimizer can be, but I guess it’s one of those boundary cases where the optimizer has just one strategy for an entire class of queries. I couldn’t think of any “legal” way to control the effect, but here’s the first dirty trick that came to my mind. If you’re sure that the first subquery is going to be cheap and you’re worried that the second subquery is expensive, you could do the following:

select v2.*
from
	(select * from t1 where rownum = 1)	v1,
	(
		select * from t1
		minus
		select * from t2
	)	v2
;

Introduce a spurious query to return one row from the first subquery and join it do the MINUS query. If the inline view doesn’t return any rows Oracle short-circuits the join, as shown by the following execution path with stats:

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   1 |  MERGE JOIN CARTESIAN  |      |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   2 |   VIEW                 |      |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|*  3 |    COUNT STOPKEY       |      |      1 |        |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   4 |     TABLE ACCESS FULL  | T1   |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   5 |   BUFFER SORT          |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|   6 |    VIEW                |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   7 |     MINUS              |      |      0 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   8 |      SORT UNIQUE       |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|   9 |       TABLE ACCESS FULL| T1   |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  10 |      SORT UNIQUE       |      |      0 |  70096 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|  11 |       TABLE ACCESS FULL| T2   |      0 |  70096 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(ROWNUM=1)

The only thing to watch out for is that the “rownum = 1″ doesn’t make the optimizer switch to an unsuitable “first_rows(1)” execution plan.


Philosophy 21

Tue, 2014-02-04 08:46

I’ll  be buying the tickets for my flight to Seattle and Kaleidoscope 14 some time tomorrow. The cut-off date on my credit card bill is today, so if I get the tickets tomorrow I won’t have to pay for them until the end of March.

When you know you have to pay it’s worth thinking about when you have to pay. It’s a principle that works in Oracle databases, too.

On the flip-side – sometimes you don’t realise that the clever thing you’ve done now is going to make someone else pay later.


Quiz

Mon, 2014-02-03 19:33

To create an index on a table (with no existing indexes) Oracle has to start by doing a tablescan.

What’s the difference between the tablescan it uses for a B-tree index and the tablescan it uses for a bitmap index ? Why ?

Update:

I was going to give a hint that if you answered the “why” first that might lead you to the right idea and a test for the “what”, but we already have an answer, with a sample of proof.


Clustaghhh!

Sun, 2014-02-02 16:27

It doesn’t matter which bit of Oracle technology you want to use, eventually someone, somewhere, runs into the special case where something nasty happens. Here’s an edge case for people using (index) clusters – Oracle Bug  17866999 ora-1499 for cluster following rman convert

It comes from a conversation on Oracle-L where Jack van Zanen reported a problem of inconsistent results after migrating data between platforms using rman to converts some tablespaces containing index clusters. This is the starting post where he shows a query that is clearly getting the wrong answer (select where channel_number = 503 obviously shouldn’t return data with channel_number 501).

Unfortunately the “audit-trail” on this problem is incomplete because I accidentally started sending email to Jack instead of the list (reply, rather than reply all) – but given that the problem column was the cluster key of an index cluster the obvious guess was that something had gone wrong as the data for this key value had grown and chained into a second (or third) block. To confirm this I asked Jack to select the rowids reported for the cluster and use the dbms_rowid package to convert them into distinct (file_number, block_number) values:

select
	dbms_rowid.rowid_relative_fno(rowid)	file_no,
	dbms_rowid.rowid_block_number(rowid)	block_no,
	count(*)				rows_in_block
from
	&m_table	t1
group by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
order by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
;

With a few file and block numbers in hand, we then needed to see the results from dumping the blocks. Jack sent me the trace files, and I found that my guess was correct – and also discovered that I had been on a private discussion, so I posted the results back to the list. As you can see from that posting, the cluster key (indicated by the initial K in the flag byte, and by its presence in “tab 0″ in the block) is 4 columns long, and I’ve got entries from two blocks where the cluster keys are pointing to each other – but the cluster keys don’t hold the same value.

It’s an extraordinary corruption – the data structures appear to be totally self-consistent and perfectly clean, and yet the data content is wrong.  (Usually I find that a corruption means the data becomes unusable).

Oracle eventually tracked this down and reproduced it; it’s a problem that can arise if you are using index clusters and use rman to do a cross-platform conversion – and it’s not going to be easy to fix it. I don’t suppose many people are likely to see this bug (it’s been around for years, apparently, but there aren’t many people who use clusters, and of those perhaps very few have migrated between platforms using rman) – but at present the only workaround is to export and reimport the data; which is not nice when you’re talking about terabytes of clusters.


RMOUG

Sun, 2014-02-02 04:59

My provisional agenda for Training Days 2014

Wednesday:
3:15 I present “Selectivity and how Oracle calculates it”

Thursday:
8:30 – 9:30 (Difficult choice) Iordan Iotzov – How sure is the optimizer about its cardinality estimates
11:15 – 12:15 (Difficult choice) Andy Colvin – Common Exadata mistakes
13:15 – 14:15 Jeff Smith – Oracle Careers panel discussion
14:30 – 15:30 I present “The revolution in histograms”.
16:00 – 17:00 Terry Sutton – “Now what’s up with dbms_stats”

Friday:
8:30 – 9:30 Alex Fatkulin – Why you may not need offloading
9:45 – 10:45 Jeff Jacobs – SQL Performance anti-patterns
11:15 – 12:15 Alex Gorbachev – Hadoop for Oracle Database Professional
13:30 – 14:30 Bjoern Rost – The ins and outs of Flashback Data Archives

Then it’s off to the airport before the show is over to get my flight home.


Modify PK

Sat, 2014-02-01 05:00

Sitting in the lounge waiting to be called for my flight I was musing on the 12c feature of having multiple indexes defined on the same ordered column set  when a thought crossed my mind and I decided to run a little test that looked like this:

create table t1 as select * from all_objects where rownum <= 10000;

create unique index t1_pk on t1(object_id);

alter table t1 add constraint t1_pk primary key(object_id);

create index t1_i1 on t1(object_id, object_name);

drop index t1_pk;

/*
    expect ORA-02429: cannot drop index used for enforcement of unique/primary key
*/

alter table t1 modify primary key using index t1_i1;

drop index t1_pk;

For years I’ve been assuming that you really have to mess around with the PK (and any related FKs) if you want to change the index supporting the primary key – but this code demonstrates that you can add a new index to a table and “move” the primary key to it before dropping the original index.

The worrying thing about this (for me, at any rate) is that it isn’t a new feature – after testing it on 11.2.0.4 I started working backwards, and it works down to 9.2.0.8 (the earlist 9i I have access to). It doesn’t work on 8.1.7.4, and the 9.2.0.8 version behaves slightly differently from later versions because the original PK index disappears as the constraint is moved.

As I’ve often said about trust – keep an eye on the date and version of any article you read, it may no longer be true.


Travel

Mon, 2014-01-27 08:40

I’m going to be in Denver next week for the RMOUG Training Days – so I’ve just received an email asking me if I could stay on a couple of days and visit someone fairly close by while I’m in the area. It’s not the first time that I’ve had a last minute request like this; unfortunately I tend to book my flights weeks, and sometimes months, in advance – usually at a good price but with a hefty cancellation penalty (like, no refund), so not possible this time.

However, I’m going to be speaking at ODTUG Kscope 14  (my name’s not visible on the agenda yet) and I’m just about to book the flights but, in view of the email, I thought I’d wait a few days and let people know where I’m heading in case someone wants me to spend a couple of days on-site in Seattle at the end of the week commencing 21st June.  Email me if you’re interested:  jonathan@jlcomp.demon.co.uk

 


Expert

Mon, 2014-01-27 03:18

A recent “Hot topics” email from Oracle support listed the following bug as one which had recently been updated:


17727676 OPTIMIZER HINT IGNORED  WHEN USING INVISIBLE INDEXES

Since the optimizer is one of my pet topics I thought I’d take a quick look at what it said – and found this heart-warming introduction;


Hdr: 17727676 11.2.0.3.0 RDBMS 11.2.0.3.0 QRY OPTIMIZER PRODID-5 PORTID-226
Abstract: OPTIMIZER HINT IGNORED WHEN USING INVISIBLE INDEXES
*** 11/03/13 03:46 am ***

PROBLEM:
--------
Based on a blog article from the international recognized Oracle Expert Jonathan Lewis ...

If it’s on MoS surely it’s just got to be true!  (Yes, I know I’ve said the opposite in the past – but it’s definitely right some of the time)

The bug/blog in question was this one, and the problem is fixed in 12.2


Pagination

Sun, 2014-01-26 06:08

I was involved in a thread on Oracle-L recently started with the question: “How many LIOs is too many LIOs”. Rather than rewrite the whole story, I’ve supplied a list of links to the contributions I made, in order – the final “answer” is actually the answer to a different question – but travels an interesting path to get there.#

I’ve got a script to emulate the requirement so that people can see for themselves the bug that I mention in post 15; I’ll try to add a couple of notes to it and publish it some time, but for the moment I’ll just remind myself that it’s called (slightly counter-intuitively: no_sort_problem.sql)


Optimisation

Thu, 2014-01-23 12:05

Here’s a recent request from the OTN database forum – how do you make this query go faster (tkprof output supplied):

 select a.rowid
   from  a, b
   where A.MARK IS NULL
     and a.cntry_code = b.cntry_code and b.dir_code='XX' and b.numb_type='XXX'
     and upper(Trim(replace(replace(replace(replace(replace(replace(replace(a.co_name,'*'),'&'),'-'),'/'),')'),'('),' '))) like
         upper(Trim(substr(replace(replace(replace(replace(replace(replace(replace(b.e_name,'*'),'&'),'-'),'/'),')'),'('),' '),1,25)))||'%';

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        3   3025.53    3260.11       8367       7950          0          31
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6   3025.54    3260.13       8367       7950          0          31

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 74

Rows     Row Source Operation
-------  ---------------------------------------------------
     31  HASH JOIN
 302790   INDEX FAST FULL SCAN OBJ#(39024) (object id 39024)   -- B 500,000 in the table
  55798   TABLE ACCESS FULL OBJ#(78942)                        -- A 175,000 in the table



-- and from some "explain plan" tool
SELECT STATEMENT  CHOOSE Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
3 HASH JOIN  Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
  1 INDEX FAST FULL SCAN UNIQUE B_PK Cost: 4  Bytes: 503,022  Cardinality: 12,898 
    2 TABLE ACCESS FULL A Cost: 215  Bytes: 3,150,034  Cardinality: 67,022

One thing you might note from the spartan tkprof output – this is an old version of Oracle (9.2.0.1 to be exact).

The first thing to do is note that most of the time is spent on the CPU – and maybe that multiply cascading replace() has something to do with it.  Now replace() and translate() are things I use so rarely that I usually get them wrong first time, but I think the predicate could be replaced by:

upper(translate(a.co_name, 'x*&-/)( ', 'x')) like upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))||'%'

Beyond making the code slightly less eye-boggling, though, I don’t think this is going to help much. Consider the information we have about the sizes of the rowsources involved.

If we can trust the tkprof row counts as being the complete output from the first execution of the statement (there seem to have been 2 in the trace file) – we selected 300,000 rows from one table and 56,000 rows from the other and then joined them with a hash join. A hash join requires equality predicates, and the only join predicate in the query that could be used is the one “a.cntry_code = b.cntry_code”.

Now, if cntry_code is short for “country code” we have a scaling problem: there are only about 600 countries in the world, so on average each row in the A table (56,000 rows acquired) is going to find roughly 500 rows in the B table (300,000 rows divided across 600 countries). So at run time the hash join will generate a rowsource of at least 56,000 * 500 = 28 Million rows; then Oracle is going to do that complicated bit of textual manipulation on two columns, compare them, and find that ultimately only 31 rows match !

So how can we do less work ?

If we’re lucky we can make the hash join much more efficient by thinking about what that nasty textual predicate means. We compare to see if one string looks like it’s starting with the first 25 characters of the other string – but if it does then the two strings have to be identical on the first 25 characters, and a hash join works with equality. So let’s just add in a new predicate to see what effect it has:

upper(substr(translate(a.co_name, 'x*&-/)( ', 'x'),1,25)) = upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))

I’ve made the suggestion on the forum – now I’m waiting to see if it has a beneficial effect (or whether I’ve made a silly mistake in my logic or guesswork)


Spoiler

Tue, 2014-01-21 15:42

Here’s a little detail I could do without in my database:



select
       owner, object_type, object_name
from
        all_objects
where
        object_name like '_'
order by
        object_name, object_type
;

OWNER           OBJECT_TYPE         OBJECT_NAME
--------------- ------------------- --------------------
APEX_030200     PROCEDURE           F
PUBLIC          SYNONYM             F
APEX_030200     PROCEDURE           G
APEX_030200     PROCEDURE           P
PUBLIC          SYNONYM             P
APEX_030200     FUNCTION            V
PUBLIC          SYNONYM             V
APEX_030200     PROCEDURE           Z
PUBLIC          SYNONYM             Z

9 rows selected.


Public names like P and F for procedures or functions are just not on (unless I create them myself).


Bitmap question

Fri, 2014-01-17 13:06

If you know anything about bitmap indexes you probably know that a single entry in a bitmap index takes the form (key_value, starting rowid, ending rowid, BBC compressed bit string). So an entry covers a single value for a column over a range of rowids  in the table, and the string of bits for that (notional) range is reduce to a minimum by a compression mechanism that eliminate repeated zeros in multiples of 8.

So here’s a question – to which I don’t know the answer, although you may be surprised when you try to find it:

If you have a very large table and in one of its columns the first row and the last row (and no others) hold the value 0 (say) and you create a bitmap index on this column, what’s the largest number of rows you could have in the table before Oracle would HAVE to create two index entries in order to cover both rows ?

Follow-up question – once you start getting close to working out the answer, can you think of a way to provide an example without actually creating a table with that many rows in it ?

 


Virtual Columns

Thu, 2014-01-16 02:56

This is just a temporary note to let people know I’ve added an update to my Virtual Stats article, highlighting a possible performance threat when you gather stats.

I’ll be deleting this note in a couple of days, so I’ve disabled comments and pingbacks

 

 


Single block reads

Tue, 2014-01-14 12:52

When a “cache read” tablescan (or index fast full scan) takes place we generally expect to see waits on “db file scattered read” as Oracle performs multi-block reads to do the scan. But we all know that Oracle will skip over blocks that are already in the cache and can therefore end up doing multi-block reads of many different sizes, even down to the point where it does single block reads (waiting for “db file sequential read”).

A quirky little question came up on OTN a little while ago: “for a large table we expect multiblock reads to be positioned at the end of the LRU for prompt re-use; but when Oracle does a single block read as part of a tablescan does it go to the end of the LRU (because it’s part of a large tablescan) or does it go to the mid-point of the LRU (because it’s a single block read)?”

The description of how blocks are treated in a tablescan has been simplified, of course, but the question is still valid – so what’s the answer, and how (without going into an extreme level of detail) would you demonstrate it ?

 


xDollar

Mon, 2014-01-13 04:09

I see Tanel has just started a short series of articles on where the X$ data come from so, for those with time on their hands (?anyone?) here’s a little script I wrote a few years ago when I wanted to check which v$ objects corresponded to the same internal structure in the SGA: (as v$session and v$session_wait used to)

It starts with a function that has to be created in the SYS schema – so no doing this on production systems. The it’s a pipelined function so that we can treat its output like a table, which means I need to create an object type and a table type before creating the function.  In the function I select each x$ name from the list of x$ names (x$kqfta) and for each table I construct a dynamic SQL statement selecting the first row from the table.

Over the versions of Oracle, various bugs have resulted in deadlocks and crashes when selecting a row from particular X$ objects, so there a short list of exceptions that has grown gradually over time.  The code has some calls to dbms_output(), which you can choose to hide by setting serveroutput off.  Depending on your system you may see a very large number of X$ with no data in them.

create type jpl_row as
	object(x_address raw(8), x_name varchar2(30), x_indx number);
/

create type jpl_table as
	table of jpl_row;
/

create or replace function x_first
return jpl_table pipelined
as
/*	Created by x_link.sql	*/
--
--	Pipeline function to examine addresses of data in x$ objects
--	Uses types jpl_row and jpl_table to implement the pipeline
--
	dg_broker_problem exception;
	pragma EXCEPTION_INIT(dg_broker_problem, -16525);

	logminer_problem exception;
	pragma EXCEPTION_INIT(logminer_problem, -1306);

	logminer_problem2 exception;
	pragma EXCEPTION_INIT(logminer_problem2, -1307);

	mem_advisor_problem exception;
	pragma exception_init(mem_advisor_problem, -1007);

	deadlock exception;
	pragma EXCEPTION_INIT(deadlock, -60);

	no_table exception;
	pragma EXCEPTION_INIT(no_table, -942);

	m_addr		x$kqfta.addr%type;
	m_indx		x$kqfta.indx%type;
	m_table		varchar2(30);

begin
	for c1 in (select kqftanam, indx from x$kqfta) loop
		begin
			execute immediate
				' select indx, addr, ''' || c1.kqftanam ||
				''' from ' || c1.kqftanam ||
				' where rownum = 1'
				into m_indx, m_addr, m_table;
				dbms_output.put_line(m_addr || ' ' || m_table || ' ' || c1.indx);
				pipe row (jpl_row(m_addr, m_table, c1.indx));
		exception
			when no_data_found then
				dbms_output.put_line('Warning: No data-' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when no_table then
				dbms_output.put_line('Warning: No table-' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when logminer_problem then
				dbms_output.put_line('Warning: logminer issue' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when logminer_problem2 then
				dbms_output.put_line('Warning: logminer issue' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when dg_broker_problem then
				dbms_output.put_line('Warning: DataGuard issue' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when mem_advisor_problem then
				dbms_output.put_line('Warning: Memory Advisor' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when deadlock then
				dbms_output.put_line('Warning: deadlock' || ' ' || c1.kqftanam || ' ' || c1.indx);
				null;
			when others then
				dbms_output.put_line('Warning: other' || ' ' || c1.kqftanam || ' ' || c1.indx);
				raise;
		end;
	end loop;
	return;
end;
/

show errors

Once the function has been created it’s a simple matter to call it, ordering the output by (for example) x_addr.

select
	*
from
	table(x_first)
order by
	x_address, x_indx
;

X_ADDRESS        X_NAME                             X_INDX
---------------- ------------------------------ ----------
0000000000000000 X$KSUTM                                41
                 X$KSUSGIF                              42
                 X$KTTVS                               477
                 X$KDXHS                               487
                 X$KZSRO                               528
00000000099EC500 X$KGSKCP                              139
00000000099EC560 X$KGSKPP                              140
00000000099EC5D0 X$KGSKASP                             141
...
000000000BEE9560 X$KSFDFTYP                            119
000000000C0F5B1C X$KQDPG                               495
000000000C0F7660 X$KQFTVRTTST0                         863

0000000060000000 X$KSMMEM                               71
0000000060001190 X$KSMSP_DSNEW                          82
00000000600077CC X$KGICS                               513
...
00000000AAE1B588 X$KSQST                                32
00000000AAE35598 X$KSRPCIOS                            118
00007FFB03E92478 X$DBGDIREXT                           820
00007FFB04274F50 X$KSMSP                                75
00007FFB045D4E28 X$ACTIVECKPT                          270
...
00007FFB093A7B48 X$KXFPSST                             569
00007FFB093A9D48 X$KOCST                               638

473 rows selected.

This was 64-bit Oracle version 11.2.0.4 – and I’ve only recorded data in 473 of the possible 950 x$ structures.
As far as Tanel’s series is concerned a key detail here is the way in which you can break the range into 4 chunks:

  • The zeros – x$ which don’t take any memory but simply return the result of a real-time function call.
  • The low range (up to C0F7660) which is a fixed data segment (whose technical name in x86-speak I forget) in the SGA
  • The mid range (60000000 to AAE35598) which is the SGA heap
  • The high range (from 7FFB03E9247 upwards) which is user space (PGA/UGA addressing)

As Tanel’s first post explains, and as you can infer from the sample listing above, when you query x$ksmsp, you are querying a structure in user space..

 

 


CR Trivia

Wed, 2014-01-08 00:44

Everybody “knows” that when you do a tablescan of a table that it starts with two buffer gets on the segment header, though older versions (ca. 8i and earlier) of Oracle used to do 4 buffer gets on the segment header. The upshot of this is that many people still say that if you create a table and insert a single row then you’re going to get 3 buffer gets when you tablescan a table: two for the segment header and one for the data block:

So here’s a test, with the second set of autotrace stats which, for reasons I’ll describe shortly, may not be immediately reproducible on your system:

create table t1 (n1 number);
insert into t1 values(1);
execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly statistics
select * from t1;
/

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         22  consistent gets
          0  physical reads
          0  redo size
        540  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

That really is 22 buffer gets to find the one row in the table. It gets worse; I used another session to insert (and commit) a second row in the table and the tablescan to return the two rows took 36 buffer gets.

If you want to get the same results you have to be a bit careful and a bit lucky. I was using ASSM (automatic segment space management) on a tablespace with an 8MB uniform extent size; when I inserted the single row into the table Oracle formatted a batch of 16 consecutive blocks in the first extent, deciding which range of blocks to format based on my process id. When I inserted my row, the block I inserted into was again dictated by my process id – this happened to be the 10th block in the formatted range.

When I ran the tablescan I did 6 gets on the segment’s space management blocks to work out what parts of the segment were formatted, then 10 gets to find the first row in the 10th block of the range, then a further 6 gets to scan the rest of the formatted range. I do not know why it takes 6 gets to read the space management blocks – but it may be two each for the segment header block, L2 bitmap block, and L1 bitmap block – it may be four on the segment header and one each on the other two blocks.

When I inserted a second row from another session (which I had set up very carefully), Oracle decided to format another batch of 16 blocks. This highlighted a little variation on what I’ve said so far. Because I had 8MB uniform extents the first 18 blocks of the first extent were all space management blocks (segment header, one L2 bitmap, and 16 L1 bitmaps – in the opposite order to that description). Oracle formats on boundaries of 16 blocks in the extent and in this case formatted the 14 blocks that would take it from block 19 to block 32 of the extent. That’s basically why my tablescan after the second insert took an extra 14 gets.

Consequences:
If you really do need to scan a “very small” table extremely frequently (and you shouldn’t, really) then it might be a good idea to check how many blocks have been formatted compared to the number of blocks used (dbms_rowid.rowid_block_number() will help with that check) and do an “alter table move” because in this special case Oracle won’t use the “format 16 blocks somewhere” strategy, it will format only as many blocks as needed starting from the first available block in the segment. For some reason the number of gets on space management blocks is also minimised in this case so that tablescan of a “single block” takes only 2 + number of blocks in table.

Footnote:
For ASSM segments Oracle maintains a Low High Water Mark (LHWM) and a High High Water Mark (HHWM). Every block below the LHWM is guaranteed to be formatted, blocks between the LHWM and HHWM will be formatted in batches of 16; so when doing a segment scan Oracle uses the largest multiblock-read it can from the start of the segment to the LHWM, then keeps checking the space management blocks to identify the batches of 16 that it can then read (and it can read consecutive formatted batches in a single read, so the reads are not necessarily limited to 16 blocks at a time).

One little side thought – because the choice of block and block range is dictated by the process id, the pattern of data insertion and contention using shared servers can be very different from the pattern produced by dedicated servers.