Jonathan Lewis
ISS
I’d like to dedicate this posting to fellow Oak Table member Richard Foote, for reasons that the readers we have in common will immediately recognise: http://www.youtube.com/watch?v=KaOC9danxNo
The singer is Canadian astronaut Commander Chris Hadfield who has been tweeting and posting pictures from space – be careful, you may get hooked: https://twitter.com/Cmdr_Hadfield/status/332819772989378560/photo/1
Update:When I posted the link to the video it had received 1.5M views; less than 24 hours later it’s up to roughly 7M. (And they weren’t all Richard Foote). Clearly the images have caught the imagination of a lot of people. If you have looked at the twitter stream it’s equally inspiring – and not just for the pictures.
Hakan Factor
Here’s a quick and dirty script to create a procedure (in the SYS schema – so be careful) to check the Hakan Factor for an object. If you’re not familiar with the Hakan Factor, it’s the value that gets set when you use the command “alter table minimize records_per_block;”.
I was prompted to publish this note by an item on the OTN SQL forum describing a problem with partition exchange with a table when there were bitmap indexes in place and the table had been changed to have some extra columns added. (Problem as yet unresolved as I publish).
If you start playing with the Hakan Factor, you’ll find that there are some odd little bugs in what gets stored and how it gets used. (SQL updated to use bitand() to reflect comments below and Karsten Spang’s blog note; also edited following a comment on OTN to show the rest of the spare1 flag bits)
create or replace procedure show_hakan(
i_table in varchar2,
i_owner in varchar2 default user
) as
m_obj number(8,0);
m_flags varchar2(12);
m_hakan number(8,0);
begin
/* created by show_hakan.sql */
select
obj#,
/*
case
when (spare1 > 5 * power(2,15))
then (spare1 - 5 * power(2,15))
when (spare1 > power(2,17))
then (spare1 - power(2,17))
when (spare1 > power(2,15))
then (spare1 - power(2,15))
else spare1
end hakan
*/
to_char(
bitand(
spare1, to_number('ffff8000','xxxxxxxx')
),
'xxxxxxxx'
) flags,
bitand(spare1, 32767) hakan -- 0x7fff
into
m_obj,
m_flags,
m_hakan
from
tab$
where obj# in (
select object_id
from dba_objects
where object_name = upper(i_table)
and object_type = 'TABLE'
and owner = upper(i_owner)
)
;
dbms_output.put_line(
'Hakan factor for object ' ||
m_obj || ' (' ||
i_owner || '.' ||
i_table || ') is ' ||
m_hakan || ' with flags ' ||
m_flags
);
end;
/
drop public synonym show_hakan;
create public synonym show_hakan for show_hakan;
grant execute on show_hakan to public;
You’ll notice that I’ve done an “upper()” on the table and owner – that means you’re in trouble if you have created an schemas or tables with mixed-case names (but you wouldn’t do that in a production system, would you?)
Update – A little bug historyOne of the odd details of the Hakan factor is that the value it shows is one less than the number of rows that will be stored in a block; and since it looks as if the factor is not allowed to drop to zero, you can’t hack the Hakan factor to force one row per block.
So here’s a (trivial and sub-optimal) piece of code to check current number of rows per block in a simple heap table (assuming the tablespace consists of a single file):
select ct, count(*) from ( select dbms_rowid.rowid_block_number(rowid), count(*) ct from t1 group by dbms_rowid.rowid_block_number(rowid) ) group by ct order by ct ;
Here’s the output of a session, running under 9.2.0.8, cut and pasted from the screen:
SQL> @afiedt.buf
CT COUNT(*)
---------- ----------
9 1
16 1
SQL> alter table t1 nominimize records_per_block;
SQL> alter table t1 minimize records_per_block;
SQL> execute show_hakan('t1')
Hakan factor for object 48865 (TEST_USER.t1) is: 15
SQL> alter table t1 move;
SQL> @afiedt.buf
CT COUNT(*)
---------- ----------
10 1
15 1
SQL> alter table t1 nominimize records_per_block;
SQL> alter table t1 minimize records_per_block;
SQL> execute show_hakan('t1')
Hakan factor for object 48865 (TEST_USER.t1) is: 14
Every time you moved the table, 9.2.0.8 (and earlier) used the actual stored value of the Hakan Factor to rebuild the table; but if you regenerated the Hakan Factor the stored value was one less than the actual row count. So if you kept repeating the process the number of rows per block would decrease by one each time and the table would get bigger and bigger.
It’s a silly example – but the real-world relevance was that a direct path insert behaved differently from a normal insert and this could result in a significant amount of wasted space if you were doing bulk loads in your overnight batch; so the code changed in 10g to make the normal and direct path inserts consistent with each other, but the change went the wrong way and, as a side effect, you get one more row per block than suggested by the Hakan Factor – and you can’t trick the Hakan factor into enforcing one row per block any more.
Clustering_factor
Cost Based Oracle – Fundamentals (November 2005)
But the most interesting function for our purposes is sys_op_countchg(). Judging from its name, this function is probably counting changes, and the first input parameter is the block ID portion (object_id, relative file number, and block number) of the table’s rowid, so the function is clearly matching our notional description of how the clustering_factor is calculated. But what is that 1 we see as the second parameter?
When I first understood how the clustering_factor was defined, I soon realized that its biggest flaw was that Oracle wasn’t remembering recent history as it walked the index; it only remembered the previous table block so that it could check whether the latest row was in the same table block as last time or in a new table block. So when I saw this function, my first guess (or hope) was that the second parameter was a method of telling Oracle to remember a list of previous block visits as it walked the index.
And finally, Oracle Corp. had implemented an official interface to the second parameter of sys_op_countchg() – provided you install the right patch – through a new table (or schema, or database) preference type available to the dbms_stats.set_table_prefs() procedure.
I’ve been meaning to write this post for two or three months, ever since Sean Molloy sent me an email about short blog note from Martin Decker describing Bug 13262857 Enh: provide some control over DBMS_STATS index clustering factor computation. Unfortunately I’ve not yet had time to investigate the patch, but I don’t think I need to any more because Richard Foote has written it up in his latest blog post.
Read Richard’s post – it’s important.
Update 10th MayRichard’s post has, unsurprisingly, produced a buzz of excitement in his reader – and started up the discussion of how best to use this capability; so here’s another quote from the book (p.111 – available in the download of chapter 5):
So using Oracle’s own function for calculating the clustering_factor, but substituting the freelists value for the table, may be a valid method for correcting some errors in the clustering_factor for indexes on strongly sequenced data. (The same strategy applies if you use multiple freelist groups—but multiply freelists by freelist groups to set the second parameter.)
Can a similar strategy be used to find a modified clustering_factor in other circumstances? I think the answer is a cautious “yes” for tables that are in ASSM tablespaces. Remember that Oracle currently allocates and formats 16 new blocks at a time when using automatic segment space management (even when the extent sizes are very large, apparently). This means that new data will be roughly scattered across groups of 16 blocks, rather than being tightly packed.
Calling Oracle’s sys_op_countchg() function with a parameter of 16 could be enough to produce a reasonable clustering_factor where Oracle currently produces a meaningless one. The value 16 should, however, be used as an upper bound. If your real degree of concurrency is typically less than 16, then your actual degree of concurrency would probably be more appropriate.
Whatever you do when experimenting with this function—don’t simply apply it across the board to all indexes, or even all indexes on a particular table. There will probably be just a handful of critical indexes where it is a good way of telling Oracle a little more of the truth about your system—in other cases you will simply be confusing the issue.
Note particularly the comments about how the best value depends on the data in the indexed columns, the table configuration, and the degree of concurrency - you don’t necessarily want to use the same value for every index on a given table. That’s a shame, since Oracle has defined the interface as a TABLE preference, so if you set it then you get the same for every index. Despite this, if you’re prepared to put in a little control work, it does mean that you can use an official Oracle mechanism to play the game I was suggesting in the book – for each “special” index, set the preference, collect the stats, then clear the preference.
SLOB
Anyone who has used Kevin Closson’s “Silly Little Oracle Benchmark” will want to know about his significantly updated SLOB2.
v$lock
The problem of slow queries on v$lock just came up again on the OTN database forum, so I thought I’d better push out a post that’s been hanging around on my blog for the last few months. This is actually mentioned in MOS in note 1328789.1: “Query Against v$lock Run from OEM Performs Slowly” which points out that it is basically a problem of bad statistics and all you have to do is collect the stats.
However, because the view is a messy union and join of several dynamic performance views sitting on top of a load of x$ structures, the advisory from MOS is to call the procedure dbms_stats.gather_fixed_objects_stats. This is not a nice thing to do on a busy production system, especially if it has a large number of users and a large shared pool – and doing it at a representative run-time is important if you’re going to do it at all.
There is an alternative. Here’s the basic execution plan (on my 11.2.0.2) for “select * from v$lock”:
----------------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | HASH JOIN | | 1 | | 2 | VIEW | GV$_LOCK | 10 | | 3 | UNION-ALL | | | | 4 | FILTER | | | | 5 | VIEW | GV$_LOCK1 | 2 | | 6 | UNION-ALL | | | | 7 | FIXED TABLE FULL| X$KDNSSF | 1 | | 8 | FIXED TABLE FULL| X$KSQEQ | 1 | | 9 | FIXED TABLE FULL | X$KTADM | 1 | | 10 | FIXED TABLE FULL | X$KTATRFIL | 1 | | 11 | FIXED TABLE FULL | X$KTATRFSL | 1 | | 12 | FIXED TABLE FULL | X$KTATL | 1 | | 13 | FIXED TABLE FULL | X$KTSTUSC | 1 | | 14 | FIXED TABLE FULL | X$KTSTUSS | 1 | | 15 | FIXED TABLE FULL | X$KTSTUSG | 1 | | 16 | FIXED TABLE FULL | X$KTCXB | 1 | | 17 | MERGE JOIN CARTESIAN | | 100 | | 18 | FIXED TABLE FULL | X$KSUSE | 1 | | 19 | BUFFER SORT | | 100 | | 20 | FIXED TABLE FULL | X$KSQRS | 100 | -----------------------------------------------------
Note, particularly, the Cartesian merge join at line 17, which assumes there will be one row from v$session (x$ksuse) joined to 100 rows from v$resource (x$ksqrs – the structure used to represent any resources that you want to lock). The big UNION ALL is then all the different types of locks (enqueues) that you might attach to a resource. The estimates relating to these two structures are the most significant problem – v$session always has FAR more than one row in it, and v$resource isn’t small: in my little system the Cartesian join produced about 325,000 rows, and that was just after starting up the database.
Having identified a couple of critical tables, I decided to see what would happen if I just collected stats on these two objects rather than doing the whole system, and the following little piece of pl/sql did what I wanted:
begin
dbms_stats.gather_table_stats('SYS','x$ksuse',method_opt=>'for all columns size 1');
dbms_stats.gather_table_stats('SYS','x$ksqrs',method_opt=>'for all columns size 1');
end;
/
The effect of the stats was to change the plan to the following which, while it might not be the absolute best in all cases, is certainly better than the previous one (it’s possible that you may also find that it helps to collect stats on x$ksqeq (the “generic enqueues” structure) which is likely to be the next most significant in terms of number of rows):
------------------------------------------------------ | Id | Operation | Name | Rows | ------------------------------------------------------ | 0 | SELECT STATEMENT | | 133 | | 1 | HASH JOIN | | 133 | | 2 | HASH JOIN | | 10 | | 3 | VIEW | GV$_LOCK | 10 | | 4 | UNION-ALL | | | | 5 | FILTER | | | | 6 | VIEW | GV$_LOCK1 | 2 | | 7 | UNION-ALL | | | | 8 | FIXED TABLE FULL| X$KDNSSF | 1 | | 9 | FIXED TABLE FULL| X$KSQEQ | 1 | | 10 | FIXED TABLE FULL | X$KTADM | 1 | | 11 | FIXED TABLE FULL | X$KTATRFIL | 1 | | 12 | FIXED TABLE FULL | X$KTATRFSL | 1 | | 13 | FIXED TABLE FULL | X$KTATL | 1 | | 14 | FIXED TABLE FULL | X$KTSTUSC | 1 | | 15 | FIXED TABLE FULL | X$KTSTUSS | 1 | | 16 | FIXED TABLE FULL | X$KTSTUSG | 1 | | 17 | FIXED TABLE FULL | X$KTCXB | 1 | | 18 | FIXED TABLE FULL | X$KSUSE | 252 | | 19 | FIXED TABLE FULL | X$KSQRS | 1328 | ------------------------------------------------------
Thanks to Timur Akhmadeev who recently published a note pointing out that you could collect stats on individual X$ tables. Do make sure you test this on your specific version of Oracle, though, and don’t use the production system as your first test case.
MV Refresh
Materialized views open up all sorts of possibilities for making reporting more efficient – but at the same time they can introduce some “interesting” side effects when you start seeing refreshes taking place. (Possibly one of the most dramatic surprises appeared in the upgrade that switched many refreshes into “atomic” mode, changing a “truncate / append” cycle into a massively expensive “delete / insert” cycle).
If you want to have some ideas of the type of work that is involved in the materialized view “fast refresh”, you could look at a recent pair of articles by Alberto Dell’Era on (very specifically) outer join materialized views (which a link back to a much older article on inner join materialized view refresh):
systimestamp
For your entertainment – there’s nothing up my sleeves, this was a simple cut-n-paste after real-time typing with no tricks:
20:39:51 SQL> create table t1 (t1 timestamp); Table created. 20:39:55 SQL> insert into t1 values(systimestamp); 1 row created. 20:39:59 SQL> select t1 - systimestamp from t1; T1-SYSTIMESTAMP --------------------------------------------------------------------------- +000000000 04:59:50.680620 1 row selected. 20:40:08 SQL>
My laptop runs Oracle so quickly that it took only 4 seconds for 5 hours to elapse !
11.2.0.3 on 64-bit Linux – the client is running with TZ=EST5EDT, while the server is running UK Time (currently BST (GMT+1))
Comments available on MOS: 340512.1 Timestamps & time zones – Frequently Asked Questions
Another MOS note, thanks to Jure Bratina in the comments: 227334.1 – “Dates & Calendars – Frequently Asked Questions” in the question
As Niall quotes in the comments: “times are difficult”.
Oracle supplies three timestamps: systimestamp, localtimestamp, and current_timestamp. (For reasons of consistency, only one of uses an underscore ;) )
Oracle also supplies three timestamp types: timestamp, timestamp with time zone, and timestamp with local time zone.
Oracle also supplies two timezone calls: dbtimezone, and sessiontimezone
If you need to figure out all the details of how these things hang together, I think you need to set your machine timezone to something that isn’t UTC (or GMT as I still tend to call it), then use two separate machines as clients, with their timezones set to two other timezones (again avoiding UTC).
I’ve done a few experiments but without being so rigorous in my settings – my machine was running on GMT, but I opened a (UNIX) session and set the session time zone to EST5EDT to start the database, while running other (UNIX) session with different TZ settings. The reason I should have restarted the machine in a different timezone is that Oracle “normalises” some timestamps to UTC – which means there are cases when I can’t be certain whether the stored value is in UTC because it has been normalised or because it simply was the actual machine time.
So here’s a little experiment (11.2.0.2, instance started in EST5EDT, unix session running in UTC, connecting across the network to the server).
select
current_timestamp,
localtimestamp,
systimestamp
from
dual
;
CURRENT_TIMESTAMP
---------------------------------------------------------------------------
LOCALTIMESTAMP
---------------------------------------------------------------------------
SYSTIMESTAMP
---------------------------------------------------------------------------
17-APR-13 11.37.10.870658 AM +01:00
17-APR-13 11.37.10.870658 AM
17-APR-13 06.37.10.870554 AM -04:00
Notes:
systimestamp reflects the instance timestamp – which is 5 hours earlier than the session timestamp.
systimestamp returns a timestamp with time zone, not just a timestamp
localtimestamp and current_timestamp show the client time, but localtimestamp doesn’t show the timezone, current_timestamp does (the +1:00 appears because Daylight Saving Time (British Summer Time) is active so my session is one hour ahead of UTC, while the database is 4 hours behind.)
Another quick test:
create table t1 (
t0 timestamp,
tz timestamp with time zone,
tl timestamp with local time zone,
ts_type varchar2(20)
)
;
insert into t1 values(
systimestamp, systimestamp, systimestamp,
'sys Timestamp'
);
commit;
select * from t1;
T0
---------------------------------------------------------------------------
TZ
---------------------------------------------------------------------------
TL TS_TYPE
--------------------------------------------------------------------------- --------------------
17-APR-13 06.44.04.353489 AM
17-APR-13 06.44.04.353489 AM -04:00
17-APR-13 11.44.04.353489 AM sys Timestamp
select
dump(t0,16),
dump(tz,16),
dump(tl,16),
ts_type
from
t1
;
DUMP(T0,16)
------------------------------------------------------------------------------------------------------------------------
DUMP(TZ,16)
------------------------------------------------------------------------------------------------------------------------
DUMP(TL,16)
------------------------------------------------------------------------------------------------------------------------
TS_TYPE
--------------------
Typ=180 Len=11: 78,71,4,11,7,2d,5,15,11,d0,68
Typ=181 Len=13: 78,71,4,11,b,2d,5,15,11,d0,68,10,3c
Typ=231 Len=11: 78,71,4,11,b,2d,5,15,11,d0,68
sys Timestamp
Notes:
T0 – the timestamp column, has the instance timestamp in it – but doesn’t have any timezone information stored; the raw dump show the value 6:44:04 (7, 2d, 5 – convert from hex and substract one). Anyone on ANY timezone will see their output showing 6:44:04 if they select this column.
TZ – the timestamp with time zone column, has the instance timestamp, but has stored it as (b, 2d,5 – 11:44:04) with time zone information (10,3c) that allows the session to know what “global” moment the information really represents and the location (or, rather, time zone) where is was entered.
TL – the timestamp with local time zone, has the instance timestamp, but has stored it as (b, 2d, 5 – 11:44:04) with NO timezone information. So the output when you query this column is adjusted to suit the local timestamp. It’s the right “global” moment, and it displays as the relevant local time. But, as a penalty, it’s lost the information about where (in which time zone) it was entered.
I think that examination of the content of the raw dumps of the three different types may help you understand why you need to store timestamps in a column type that includes a time zone – if you don’t then you lose some information, and time-based arithmetic will give you some surprises if your application crosses timezones.
Next Issue:Indexing time (though the link in the comments below to Tony Hasler’s blog probably gives you all the answers you need), and an Oracle design error that I’ve visited before.
Deadlocks
Here’s a deadlock graph that might cause a little confusion:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-001a0015-00014787 34 90 X 32 3 S
TX-00190008-0000601b 32 3 X 34 90 S
session 90: DID 0001-0022-00000327 session 3: DID 0001-0020-000009E9
session 3: DID 0001-0020-000009E9 session 90: DID 0001-0022-00000327
Rows waited on:
Session 90: obj - rowid = 00030CE5 - AAAwzlAAGAABDiuAAA
(dictionary objn - 199909, file - 6, block - 276654, slot - 0)
Session 3: obj - rowid = 00030CE5 - AAAwzlAAGAABp8gAAA
(dictionary objn - 199909, file - 6, block - 433952, slot - 0)
Both sessions are holding X and waiting S on a TX lock. There are several well-known reasons why you might see a share (mode 4) lock on a transaction table slot: collisions on bitmap indexes, unique indexes, index organized tables, or referential integrity checks are the commonest “external” examples, and problems with interested transaction lists (ITLs) or freelists are the common “internal” ones – so the presence of the waits in share mode shouldn’t, of themselves, a source of confusion.
The confusion is in the reported rowids. If you try to interpret them as real rowids you may be unlucky and discover that they seem to be related to the SQL reported for the deadlocked sessions when really the information they hold is garbage. (Just to avoid confusion, let me make it clear that there are cases where the rowids reported definitely WILL be garbage; on the other hand, there may be some scenarios where the rowids are relevant – although I haven’t done any exhaustive check to see if there really are such scenarios.)
When I see a deadlock graph on transaction locks and the waits are for S mode I tend to assume that the information about the rows waited on is probably misleading; when the slot number for the rowid is zero this increases my confidence that the rowid is rubbish. (Zero is a legal value for a rowid slot, of course, so a zero doesn’t prove that the rowid is rubbish, it’s just a coincidence that allows me to continue following a hypothesis.)
The problem is that Oracle doesn’t waste resources tidying up after itself, and in the case of deadlock graphs this laziness shows up in the rowids reported. The trace is simply reporting whatever happens to be in the row_wait_obj#, row_wait_file#, row_wait_block# and row_wait_row# columns of v$session; and if the waiting process hasn’t updated these columns with current row information you could be looking at the details of the last row (or block) that the session waited for. Here’s the description of a test to demonstrate the behaviour:
create table t1 (n1 number, n2 number); insert into t1 values(1,1); create unique index t1_i1 on t1(n1); create unique index t1_i2 on t1(n2); session 1: insert into t1 values(2,11); session 2: insert into t1 values(3,21); session 1: insert into t1 values(4,21); session 2: insert into t1 values(5,11);
With this table, and sequence of events following it, session 1 raised a deadlock error, and dumped the following trace:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-000a0021-0000c4aa 16 93 X 13 90 S
TX-00020020-0000fdb9 13 90 X 16 93 S
session 93: DID 0001-0010-00000057 session 90: DID 0001-000D-000000F2
session 90: DID 0001-000D-000000F2 session 93: DID 0001-0010-00000057
Rows waited on:
Session 90: obj - rowid = 00000009 - AAAAAJAABAAAQJcAAA
(dictionary objn - 9, file - 1, block - 66140, slot - 0)
Session 93: obj - rowid = 0002E7DC - AAAufaAAFAAAAAJAAA
(dictionary objn - 190428, file - 5, block - 9, slot - 0)
Object 9 is the I_FILE#_BLOCK# index in the data dictionary – and session 90 is definitely not doing anything with that object in this transaction.
Object 190428 is another table in the test schema, but session 93 didn’t access it in this transaction, and the block referenced is the segment header block, not a block that could hold a row.
In fact, just before I started the sequence of inserts I ran this query from a third session (connected as sys) with the following results:
select
sid,
row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row#
from
v$session
where
sid in (90,93)
;
SID ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW#
---------- ------------- -------------- --------------- -------------
90 9 1 66140 0
93 190428 5 9 0
The “rows” reported for the deadlock simply echoed the values that were already in the row_wait columns before the test started. (The sessions were waiting on “SQL*Net message from client” at the time.)
Footnote:Session 90 created the tables and indexes, that’s probably why it happened to have an outstanding reference to the i_file#_block# index.
Session 93 had just run a script to drop all the objects in the schema, which may explain why it happened to have an outstanding reference to a segment header block
If the row_wait_obj# had been set to -1 for either session then the deadlock graph would have reported “No row” for that session.
DML Tracking
You’ve probably seen questions on the internet occasionally about finding out how frequently an object has been modified. The question is a little ambiguous – does it mean how much change has occurred, or how many DML statements have been executed; either may be an interesting measure. Of course, Oracle gave us a method of answering the first question a long time ago: v$segstat (or v$segment_statistics if you don’t mind doing the join) and the resulting content in the AWR or Statspack reports:
Segments by DB Blocks Changes DB/Inst: XXXXXXXX/XXXXXXXX Snaps: 85563-85564
-> % of Capture shows % of DB Block Changes for each top segment compared
-> with total DB Block Changes for all segments captured by the Snapshot
Tablespace Subobject Obj. DB Block % of
Owner Name Object Name Name Type Changes Capture
---------- ---------- -------------------- ---------- ----- ------------ -------
XXXXXXXXX DATA MLOG$_xxxxxxxxxxxxxx TABLE 1,144,112 22.16
XXXXXXXXX DATA PK_xxxxxxxxxxxxxxxxx INDEX 614,256 11.90
XXXXXXXXX DATA xxxxxxxxxxxxxxx TABLE 490,080 9.49
XXXXXXXXX IDX IDX_xxxxxxxxxxxxxxxx INDEX 353,392 6.84
XXXXXXXXX IDX PK_yyyyyyyyyyyyyyy INDEX 273,664 5.30
-------------------------------------------------------------
There’s still a little ambiguity in this report (in the right circumstances you can change multiple rows in a block while only reporting a single block change), but it’s a very good indicator of the volume of change to an object; but is the million or so block changes due to one SQL statement or 1,000,000 SQL statements ?
With 11.2.0.2 there is an (undocumented) option for finding the answer to that question. Since it’s undocumented I wouldn’t use it on production unless I was really desperate, and even then I’d do the standard “check with Oracle support” first; however I might use it on a test system if I were do something like running tests of a new version of an overnight suite of batch jobs.
You can enable “DML frequency tracking” which seems to count the number of DML statements that modify an object or, to be more accurate, a table. It doesn’t do anything to identify indexes, although it will report index organized tables (using the table name); it won’t identify partition-level DML individually for partitioned tables, it will simply record the DML as being relevant to the table.
To enable the feature you modify hidden parameter _dml_frequency_tracking and then query dynamic performance view v$object_dml_frequencies.
alter system set "_dml_frequency_tracking"=true; select object, working, slot0, slot1, slot2, slot3, slot4 from v$object_dml_frequencies order by working + slot0 + slot1 + slot2 + slot3 ;
The view has 25 columns for numbered “slots”, and a “working” column. Each slot represents a 15 minute interval (set by parameter _dml_frequency_tracking_slot_time) and only 4 slots are maintained by default (set by parameter _dml_frequency_tracking_slots). The values roll from slot to slot every 15 minutes, and the “working” column keeps track of objects that have suffered DML since the last roll-over. Unfortunately there doesn’t seem to be any obvious way of tracking when a roll-over takes place, so you can’t tell how long the “working” column has been active for.
If you have the default 4 slots set, then “working” and slots 0 to 2 are populated with 15 minutes worth of information each, and slot3 seems to be the accumulated history of all accesses since you enabled the feature.
The view sits on top of x$ksxm_dft, but there are no interesting extra columns in the x$ that aren’t displayed in the v$ and gv$. (It’s an interesting little quirk that the v$ doesn’t following the usual pattern of being a view of the gv$ restricted to the current instance.) You can see the information in the shared pool (by querying v$sgastat) in an area named “dml frequency” – although the figure reported the last time I checked was wrong by a factor of roughly 3 – show about 8KB when I was expecting roughly 24KB.
I haven’t spent a lot of time investigating the feature yet, so there are probably a few questions and boundary conditions to test before using it in anger; but there don’t appear to be any special latches associated with the counters (so possibly the results would be under-count in a highly concurrent system, but maybe the results are collected under a mutex).
Related parameters_dml_frequency_tracking (false) Control DML frequency tracking — can be set at system level on demand
_dml_frequency_tracking_advance (true) Control automatic advance and broadcast of DML frequencies — ?
_dml_frequency_tracking_slot_time (15) Time length of each slot for DML frequency tracking — needs restart
_dml_frequency_tracking_slots ( 4) Number of slots to use for DML frequency tracking — needs restart
Delphix Overview
Update: Here’s the link to the recording of the webinar
I’ll b online tomorrow morning (Friday 5th, 9:00 Pacific time, 5:00 pm UK) in a webinar with Kyle Hailey to talk about my first impressions of Delphix, so I thought I’d write up a few notes beforehand.
I’ve actually installed a complete working environment on my laptop to model a production setup. This means I’ve got three virtual machines running under VMWare: my “production” machine (running Oracle 11.2.0.2 on OEL 5, 64-bit), a “development” machine (which has the 11.2.0.2 software installed, again on OEL 5, 64-bit), and a machine which I specified as Open Solaris 10, 64-bit for the Delphix server VM (pre-release bloggers’ version). The two Linux servers are running with 2.5GB of RAM, the Delphix server is running with 8GB RAM, and all three machines are running 2 virtual CPUs. (My laptop has an Intel quad core i7, running two threads per CPU, 16GB RAM, and 2 drives of 500GB each.) The Linux machines were simply clones of another virtual machine I previously prepared and the purpose of the exercise was simply to see how easy it would be to “wheel in” a Delphix server and stick it in the middle. The answer is: “pretty simple”. (At some stage I’ll be writing up a few notes about some of the experiments I’ve done on that setup.)
To get things working I had to create a couple of UNIX accounts for a “delphix” user on the Linux machines, install some software, give a few O/S privileges to the user (mainly to allow it to read and write a couple of Oracle directories), and a few Oracle privileges. The required Oracle privileges vary slightly with the version of Oracle and your prefered method of operation, but basically the delphix user needs to be able to run rman, execute a couple of Oracle packages, and query some of the dynamic performance views. I didn’t have any difficulty with the setup, and didn’t see any threats in the privileges that I had to give to the delphix user. The last step was simply to configure the Delphix server to give it some information about the Linux machines and accounts that it was going to have access to.
The key features about the Delphix server are that it uses a custom file system (DxFS, which is based on ZFS with a number of extensions and enhancements) and it exposes files to client machines through NFS; and there are two major components to the software that make the whole Delphix package very clever.
Oracle-related mechanismsAt the Oracle level, the Delphix server sends calls to the production database server to take rman backups (initially a full backup, then incremental backups “from SCN”); between backup requests it also pulls the archived redo logs from the production server – or can even be configured to copy the latest entries from the online redo logs a few seconds after they’ve been written (which is one of the reasons for requiring privileges to query some of the dynamic performance views, but the feature does depend on the Oracle version).
If you want to make a copy of the database available, you can use the GUI interface on the Delphix server to pick a target machine, invent a SID, and Service name, and pick an SCN (or approximate timetamp) that you want to database to start from, and within a few minutes the Delphix server will have combined all the necessary backup pieces, applied any relevant redo, and configured your target machine to start up an instance that can use the (NFS-mounted) database that now exists on the Delphix server. I’ll explain in a little while why this is a lot cleverer than a simple rman “restore and recover”.
DxFSSupporting the Oracle-related features, the other key component of the Delphix server is the Delphix file-system (DxFS). I wrote a little note a few days ago to describe how Oracle can handle “partial” updates to LOB values – the LOB exists in chunks with an index on (lob_id, chunk_number) that allows you to pick the right chunks in order. When you update a chunk in the LOB Oracle doesn’t really update the chunk, it creates a new chunk and modifies the index to point at it. If another session has a query running that should see the old chunk, though, Oracle can read the index “as at SCN” (i.e. it creates a read consistent copy of the required index blocks) and the read-consistent index will automatically be pointing at the correct version of the LOB chunk. DxFS does the same sort of thing – when a user “modifies” a file system block DxFS doesn’t overwrite the original copy, it writes a new copy to wherever there’s some free space and maintains some “indexing” metadata that tells it where all the pieces are. But if you never tell the file system to release the old block you can ask to see the file as at a previous point in time at no extra cost!
But DxFs is even cleverer than that because (in a strange imitation of the “many worlds” interpretation of quantum theory) a single file can have many different futures. Different users can be identified as working in different “contexts” and the context is part of the metadata describing the location of blocks that belong to the file. Imagine we have a file with 10 blocks sitting on DxFs - in your context you modify blocks 1,2 and 3 but at the same time I modify blocks 1,2 and 3 in my context. Under DxFS there are now 16 blocks associated with that file – the original 10, your three modified blocks and my three modified blocks and, depending on timestamp and context, someone else could ask to see any one of three different versions of that file – the original version, your version, or my version.
Now think of that in an Oracle context. If we copy an entire set of database files onto DxFS, then NFS-mount the files on a machine with Oracle installed, we can configure and start up an instance to use those files. At the same time we could NFS-mount the files on another machine, configuring and starting another instance to use the same data files at the same time! Any blocks changed by the first instance would be written to disc as private copies, any blocks changed by the second instance would be written to discs as private copies – if both instances managed to change 1% of the data in the course of the day then DxFs would end up holding 102% of the starting volume of data: the original datafiles plus the two sets changed blocks – but each instance would think it was the sole user of its version of the files.
There’s another nice (database-oriented) feature to Delphix, though. The file system has built-in compression that operates at the “block” level. You can specify what you mean by the block size (and for many Oracle sites that would be 8KB) and the file system would transparently apply a data compression algorithm on that block boundary. So when the database writer writes an 8KB block to disc, the actual disc space used might be significantly less than 8KB, perhaps by a factor of 2 to 3. So in my previous example, not only could you get two test databases for the space of 1 and a bit – you might get two test databases for the space of 40% or less of the original database.
Delphix vs. rmanI suggested earlier on that Delphix can be a lot clever than an rman restore and recover. If you take a full backup to Delphix on Sunday, and a daily incremental backup (let’s preted that’s 1% of the database per day) for the week, then Delphix can superimpose each incremental onto the full backup as it arrives. So on Monday we construct the equivalent of a full Monday backup, on Tuesday we construct the equivalent of a full Tuesday backup, and so on. But since DxFS keeps all the old copies of blocks this means two things that we can point an instance at a full backup for ANY day of the week simply by passing a suitable “timestamp” to DxFs – and we’ve 7 full backups for the space of 107% of a single full backup.
There are lots more to say, but I think they will have to wait for tomorrow’s conversation with Kyle, and for a couple more articles.
Register of Interests / DisclosureDelphix Corp. paid my consultancy rates and expenses for a visit to the office in Menlo Park to review their product.
Index Selectivity
Here’s a summary of a recent posting on OTN:
I have two indexes (REFNO, REFTYPESEQNO) and (REFNO,TMSTAMP,REFTYPESEQNO). When I run the following query the optimizer uses the second index rather than the first index – which is an exact match for the predicates, unless I hint it otherwise:
select *
from RefTable
where RefTypeSeqNo = :1
and RefNo = :2;
Default plan:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 126 | 6 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| REFTABLE | 3 | 126 | 6 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | REFTABLE_CX03 | 3 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))
filter("REFTYPESEQNO"=TO_NUMBER(:1))
Hinted plan:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 126 | 15 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| REFTABLE | 3 | 126 | 15 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | REFTABLE_CX02 | 14 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))
This is actually an example of a feature of the optimizer that I described a few years ago. The original note described a change as you moved from 10.1 to 10.2 and on to 11.1 – but once you’ve seen the basic issue there are a number of variations on how it might appear. In this case the OP seems to be using 10gR2, where the distinct_keys value from an index is used to calculate the cost and row estimate for the index and for the table access cost when that specific indexed access path is being considered.
So with the hint to use the accurate index we see an index cardinality estimate of 14 rows with a table cardinality of 3 despite the fact that the plan shows no extra predicates applied at the table; the cost of accessing the table is also clearly related to the cardinality estimate on the index line.
In the default plan when the wrong index is used, the optimizer doesn’t pay any attention to the distinct_keys from the other index, and simply uses the standard “product of column selectivities”.
11g introduces two changes – when calculating the table cardinality the distinct_keys value for the index is carried forward (so the plan with the high index cardinality but low table cardinality would report the same cardinality for both operations), and the distinct_keys from the first index would be used when doing the calculations for the second index – which would increase the cost of using the wrong index.
There’s really very little you can do to find a strategic fix for this type of problem in 10g – obviously you could add hints whenever Oracle used the wrong index, but that’s not reallya desirable approach, and it is possible to adjust column statistics in such a way that the calculations the optimizer uses give better approximations, but that’s not always very easy to do well. Ultimately you just have to be very careful about your choice of indexes – and when you think that two indexes show a significant overlap in columns consider the possibility that one carefully defined index may be able to do the job of both.
Missing SQL
From time to time I’ve looked at an AWR report and pointed out to the owner the difference in work load visible in the “SQL ordered by” sections of the report when they compare the summary figure with the sum of the work done by the individual statements. Often the summary will state that the captured SQL in the interval represents some percentage of the total workload in the high 80s to mid 90s – sometimes you might see a statement that the capture represents a really low percentage, perhaps in the 30s or 40s.
You have to be a little sensible about interpreting these figures, of course – at one extreme it’s easy to double-count the cost of SQL inside PL/SQL, at the other you may notice that every single statement reported does about the same amount of work so you can’t extrapolate from a pattern to decide how significant a low percentage might be. Nevertheless I have seen examples of AWR reports where I’ve felt justified in suggesting that at some point in the interval some SQL has appeared, worked very hard, and disappeared from the library cache before the AWR managed to capture it.
Now, from Nigel Noble, comes another explanation for why the AWR report might be hiding expensive SQL – a bug, which doesn’t get fixed until 12.2 (although there are backports in hand).
Open Cursors
Here’s a little detail that appeared in 11gR2 that may help you answer questions about open cursors. Oracle has added a “cursor type” column to the view v$open_cursor, so you can now see which cursors have been held open because of the pl/sql cursor cache, which have been held by the session cursor cache, and various other reasons why Oracle may take a short-cut when you fire a piece of SQL at it.
The following is the output showing the state of a particular session just after it has started up in SQL*Plus and called a PL/SQL procedure to run a simple count:
select
cursor_type, sql_text
from
V$open_cursor
where
sid = 17
order by
cursor_type,
sql_text
;
CURSOR_TYPE SQL_TEXT
-------------------------------- ------------------------------------------------------------
DICTIONARY LOOKUP CURSOR CACHED BEGIN DBMS_APPLICATION_INFO.SET_MODULE(:1,NULL); END;
BEGIN DBMS_OUTPUT.DISABLE; END;
BEGIN DBMS_OUTPUT.ENABLE(1000000); END;
BEGIN dbms_random.seed(0); END;
SELECT ATTRIBUTE,SCOPE,NUMERIC_VALUE,CHAR_VALUE,DATE_VALUE F
SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS WHERE (UPPER('
SELECT USER FROM DUAL
select /*+ connect_by_filtering */ privilege#,level from sys
select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('U
select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECON
select metadata from kopm$ where name='DB_FDO'
select privilege# from sysauth$ where (grantee#=:1 or grante
select to_char(sysdate,'hh24miss') time_now from dual
select value$ from props$ where name = 'GLOBAL_DB_NAME'
OPEN BEGIN spin_1; END;
table_1_ff_208_0_0_0
OPEN-RECURSIVE insert into sys.aud$( sessionid,entryid,statement,ntimestamp
PL/SQL CURSOR CACHED SELECT COUNT(*) X FROM KILL_CPU CONNECT BY N > PRIOR N START
SESSION CURSOR CACHED BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
SELECT DECODE('A','A','1','2') FROM DUAL
Variations are left to the user.
There are a few other cursor types – here’s the list given in the 11.2 Server Reference manual under the definition of v$open_cursor:
- BUNDLE DICTIONARY LOOKUP CACHED
- CONSTRAINTS CURSOR CACHED
- DICTIONARY LOOKUP CURSOR CACHED
- OPEN
- OPEN-PL/SQL
- OPEN-RECURSIVE
- PL/SQL CURSOR CACHED
- REPLICATION TRIGGER CURSOR CACHED
- SESSION CURSOR CACHED
It’s an interesting exercise to consider why there are so many types, and then create some tests to confirm or refute your hypothesis. I haven’t checked, but here are a few ideas:
- replication trigger cursor cached: I don’t remember which version introduced the change, but once upon a time the triggers updating the materialized view logs were real after insert/update/delete triggers, but now they’re “pre-compiled” – so it’s not surprising they form a special case.
- dictionary lookup cursor cached: are these, perhaps, the statements that are currently cached in the “_row_cache_cursors” cache for data dictionary access; the parameter was once set to 10, but currently defaults to 20.
- bundle dictionary lookup cached: why would there be a special case of dictionary lookup ? perhaps this is the set of cursors needed to read the first few tables in the data dictionary that allow the optimizer to do its work (how do you optimize a query against tab$ if you need to query syn$, obj$ and tab$ to discover that tab$ is a table ?)
- constraints cursor cached: probably something to do with the SQL (internal, or externalised) that Oracle has to run to check or implement details of referential integrity constraints.
By a strange coincidence a note came up on OTN today that pointed to a different version of the Oracle manual where the possible cursor types are listed under their “internal” names – but I’m not sure if there’s a version of Oracle where you’d see them looking like this:
- CACHED
- KNT CACHED
- KQD BUNDLE CACHED
- KQD CACHED
- KXCC CACHED
- PL/SQL
- PL/SQL CACHED
- SYSTEM
LOB Update
This note is about a feature of LOBs that I first desribed in “Practial Oracle 8i” but have yet to see used in real life. It’s a description of how efficient Oracle can be, which I’ll start with a description of, and selection from, a table:
create table test_lobs (
id number(5),
bytes number(38),
text_content clob
)
lob (text_content) store as text_lob(
disable storage in row
cache
)
;
-- insert a row
SQL> desc test_lobs
Name Null? Type
----------------------- -------- ----------------
ID NUMBER(5)
BYTES NUMBER(38)
TEXT_CONTENT CLOB
SQL> select id, bytes, dbms_lob.getlength(text_content) from test_lobs;
ID BYTES DBMS_LOB.GETLENGTH(TEXT_CONTENT)
---------- ---------- --------------------------------
1 365025 365025
1 row selected.
I’ve got a table with a single CLOB column holding a single row. The size of the single CLOB is roughly 365KB (or about 45 blocks of 8KB). Old hands who have had to suffer LONG columns will recognise the trick of recording the size of a LONG as a separate column in the table; it’s a strategy that isn’t really necessary with LOBs but old coding habits die hard. It’s quite hard to find details of how much space has been used in a LOB segment (the space_usage procedure in the dbms_space package doesn’t allow you to examine LOBSEGMENTs), but I did a coupld of block dumps to check on this LOBSEGMENT and it had allocated 46 blocks on the first insert.
So here’s the clever bit – how big will the LOBSEGMENT grow when I update that one CLOB ?
It’s common knowledge (to users of LOBs) that the undo mechanism Oracle has for LOBs is simply to leave the old LOB in place and create a new one – so the intial response to the question might be to guess that the LOBSEGMENT will grow to roughly double the size. But it doesn’t have to be like that, at least, not if you update the LOB the way I happen to have done, which is like this:
declare
m_length integer;
m_lob clob;
begin
select
text_content,
dbms_lob.getlength(text_content)
into m_lob, m_length
from
test_lobs
where
id = 1
for update
;
dbms_output.put_line('Lob size: ' || m_length);
dbms_lob.write(
lob_loc => m_lob,
amount => 17,
offset => 1,
buffer => 'This is an update'
);
commit;
end;
/
My code very specifically changes only the first 17 bytes of the LOB. So how much does Oracle have to do to effect this change ? The LOB-handling mechanisms are smart enough to work out that only the first (of 45) blocks in the LOB need to be changed, so Oracle need only add one block to the segment and write the new version of the first LOB block to that one block. (In fact the segment – which was in a tablespace using freelist management – grew by the “standard” 5 blocks from which Oracle selected just one block to add to the LOB.)
So how does Oracle keep track of the whole LOB if it can change it one piece at a time ? This is where the (notionally invisible and you don’t need to know about it) LOBINDEX comes into play. Oracle maintains an index keyed by (LOB_ID, chunk_number) *** pointing to all the chunks of a LOB in order, so when you update a single chunk Oracle simply creates an updated copy of the chunk and changes the appropriate index entry to point to the new chunk. So here’s an image representing our one LOB value just after we’ve created it and before we’ve updated:
And then we “modify” the first chunk – which means we have to add a chunk (which in this case is a single block) to the segment, create a new version of the first chunk, modify the index to point to the new block, and add an index entr – keyed by time-stamp – to the end of the index to point to the old chunk; something like this:
Now, when we run a query to select the LOB, Oracle will follow the index entries in order and pick up the new chunk from the end of the LOBSEGMENT. But the LOBINDEX is protected by undo in the standard fashion, so if another long-running query that started before our update needs to see the old version of the LOB it will create a read-consistent copy of the relevant index leaf block- which means that from its perspective the index will automatically be pointing to the correct LOB chunk.
The index is actually quite an odd one because it serves two functions; apart from pointing to current lobs by chunk number, it also points to “previous” chunks by timestamp (specifically the number of seconds between Midnight of 1st Jan 1970 and the time at which the chunk was “overwritten”). This makes it easy for Oracle to deal with the retention interval for LOBs – any time it needs space in the LOBSEGMENT it need only find the minimum timestamp value in the index and compare it with “sysdate – retention” to see if there are any chunks available for re-use.
To sum up – when you update LOBs, and it’s most beneficial if you have an application which doees piece-wise updates, you leave a trail of old chunks in in the LOBSEGMENT. The version of the LOB you see is dictated by the version of the index that you generate when you request a copy of the LOB at a given SCN.
*** Footnote: My description of the LOBINDEX was an approximation. Each index entry carries a fixed size “payload” listing up to eight lob chunks; so the (LOB_ID, chunk_number) index entries in a LOBINDEX may point to every 8th chunk in the LOB. The significance of the “fixed size” payload is that the payload can be modified in place if the pointer to a LOB chunk has to be changed – and this minimises disruption of the index (at a cost of some wasted space).
Delphix Debrief
I’ve had my week in Palo Alto with the Delphix people. I really don’t know where the time went to – but I had a lot of interesting conversations with a number of very able people; plenty of time to experiment; and I’ve even been able to install Delphix on my laptop (as a 64-bit Open Solaris 10 VM under VMWare). I liked the product, and I was impressed with the team they’ve got working on it.
We’ve pencilled in an intial online webinar for Friday 5th April which will probably feature an informal chat where Kyle Hailey and I talk about my impressions of the product and what I’ve done with it so far. Then, in about a month’s time, we’ll have a more technical discussion.
In the meantime, it occurred to me that my next blog post ought to be about LOBs as this would be a nice way to introduce you to one of the key ideas behind DxFS (the Delphix-extended file system that’s based on ZFS).
Update 22nd MarchThe link to register for the online conversation is now up. We’re timetabled for 5th April 9:00 am PDT (which is currently 4:00 pm GMT, but it will be 5:00 pm BST by then.)
Lock Bug
Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
-- created a table t1 to work with, no data needed, column definitions don't matter. Session 1: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 2: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 3: lock table t1 in exclusive mode; -- mode 6, goes into wait for session 1 Session 1 (again): lock table t1 in share row exclusive mode; -- mode 5, should move to converter queue, wait for session 2
The mode 3 to mode 5 conversion is what happens if you have a foreign key referential integrity constraint without a covering index and run code that does something like “delete child rows for parent X, delete parent X” (which also happens under the covers if your foreign key is declared as “on delete cascade”).
Notice the “should” on the lock conversion line – this is exactly what happens in 10.2.0.3 and 11.2.0.3; but on the system I was using session 1 got an immediate deadlock (ORA-00060) error – in the absence of a deadlock scenario !
My problem is this – the instance that gives the deadlock error is 64-bit 11.2.0.2 running RAC on Linux (OEL 5); the instances that don’t give the error are 32-bit, non-RAC, running on Windows XP Pro. SO is the anomaly due to some difference in:
- Windows vs. Linux
- 32 bit vs. 64 bit
- RAC vs. non-RAC
- Specific version of Oracle
I’m inclined to think it’s a version dependent bug, but it’s possible that it’s a necessary side effect of RAC. So if you’ve got 11.2.0.3 RAC, or 11.2.0.2 non-RAC, or the terminal releases of 10g and 11.1 whether RAC or non-RAC, I’d like to hear from you which versions – if any – produce the same deadlock and which don’t. (And the scientific method being what it is, anyone with 11.2.0.2 on RAC on Linux might like to confirm – or contradict – my result.)
Update 8:30 p.m. GMTThe results in so far tend to support the idea that this may be a RAC-related issue. Justin Mitchell’s result is particularly interesting, and I confirmed on my 11.2.0.2 RAC system that if the two starting sessions are on different nodes then I don’t get the deadlock behaviour.
Users of RAC will probably be aware of the fact that v$lock.block commonly sets itself to 2 (potential blocker) on RAC the moment you acquire the lock; and this is what happens to both the “row exclusive” locks. When we request the exclusive lock, both TM locks change from block=2 to block=1 if they are on the same node; if they are on different nodes then only one of them changes.
Unfortunately Oleksandr didn’t see a deadlock when he tried RAC 11.2.0.3 – and since he didn’t make any comment about using multiple nodes we will have to wait to see if he can repeat the test and tell us whether his results are consistent with Justin’s, or whether they suggest that there are further considerations in effect.
Virtual DB
I’m heading off to Heathrow airport later on today to fly out to San Francisco for my week of experimenting with Delphix. I’ve done a little preparation work, of course, including browsing around the Internet to read about related technologies. Some of the material I found was very interesting, so I thought I’d go publish a few of the links that might be useful to other people.
It’s quite surprising to see how long the necessary core technology has been around; and yet there seems to have been minimal follow-up on the possibilities the technology makes available – perhaps because of the specific hardware, or special skills needed to put put together a working solution.
Here’s a short series from “Oracle Storage Guy” comparing Netapp and EMC (with a bit of a bias towards EMC) - it’s a few years old, and the companies mentioned have probably moved on, reviewed strategies, and refined what they do, but I suspect it’s still good background information for the non-specialist.
- Oracle backup – which storage is best – pt.1
- Oracle backup – which storage is best – pt.2
- Oracle backup – which storage is best – pt.3
- Oracle backup – which storage is best – pt.4
And an introduction to some OpenWorld presentations on Oracle’s dbClone from the same person.
A couple of posts by Kevin Closson on NFS and CloneDB
And a posting about Oracle’s clonedb from Tim Hall
A couple of (fairly long) Oracle white papers on cloning
- Cloning on ZFS with Rman – by Oracle
- Cloning on ZFS with Data Guard – by Oracle
- See also My Oracle Support article 1210656.1
And, of course, a couple of items from Kyle Hailey (Delphix) that talk about why Delphix is the right solution at the right time. Both items come with a useful list of links to articles on other technologoes
- Database Virtualization
- Database Virtualization and Instant provisioning (including video presentation)
Finally, a couple of posts from end-users:
Bottom line – it looks as if Delphix has created a product that is more flexible, more powerful and easier to use than the competition that has been around for several years, and they’ve done it on generic hardware. I’ll be trying to find out how reality lines up with expectations over the next week.
Duplicate indexes ?
I don’t think this is likely to happen on a production system (until 12c) – but look what you can do if you try hard enough:
1 select
2 index_name, column_name from user_ind_columns
3 where
4 table_name = 'T1'
5 order by
6* index_name , column_position
SQL> /
INDEX_NAME COLUMN_NAME
-------------------- --------------------
T1_I1 N1
V1
T1_I2 N1
V1
4 rows selected.
That’s a straight cut-n-paste from an Oracle 11.1.0.7 SQL*Plus session. (You can tell I typed it in real time because I missed the return before the FROM, and couldn’t be bothered to go back and do it again ;) )
Source Control
You may recall that I spent some time with the developers at the redgate offices in Cambridge (UK) a little while ago, looking at their Source Control for Oracle package. The product is about to go live, with a launch date of 12th March.
Because of the help I’ve given them they’ve offered my readers the chance of winning one of two 5-user licences for the product – provided I devise a strategy for picking the recipients.
So, to make it easy, all you have to do is persuade me that you really need or really deserve a copy. Write up a short description in the comments of the worst problem you’ve had to face because you didn’t have a decent source control system, or the best use you think you could make of a source control system for Oracle.
I’ll get the redgate developers to read what you wrote, and they will be the final judges of the two most interesting, or possibly most horrifying, or maybe even the most entertaining, cases.
Entries to be in by 23:59 GMT on 11th March.
UpdateThe product has launched early !
You can find out more and see screenshots on this web page.
And there’s an online demo of the tool on March 14 at 16:00 GMT (17:00 CET / 11:00 EST / 10:00 CST / 08:00 PST), where you will be able to ask any questions. If you’re interesting in seeing it you’ll have to register at this “go to meeting” URL.
The closing date for entries is still 11th March, 23:59 GMT.





