DBA Blogs
My First E-Book – Integrating Oracle Applications E-Business Suite 12.1.1 with OID 11.1.1.6 and OAM 11.1.2
I have released my first eBook on Amazon to share my experience on integrating Oracle Applications E-Business Suite 12.1.1 with Oracle Internet Directory 11.1.1.6 and Oracle Access Manager 11.1.2. The OAM version 11.1.2 got certified recently with 12.1 Ebusiness Suite. The initial idea was to make a blog post describing my experience and the bottlenecks faced. But as I went on digging deep into the project, I figured out that the path was not too short, simple, straight and smooth. I had to refer to numerous blogs, books, My Oracle Support documents, Oracle forums to seek help and look for answers to first-hand problems. Once the project was over, I thought of composing an eBook.
Link: eBook
I am personally very excited about this. The support I received from the Pythian crew, my fellow DBAs and from friends all around the world have been amazing and I am thrilled. I would like to write more!
We are all excited about Fusion, aren’t we? We are working on the Fusion technologies like Weblogic, OAM, OID, Portal, Webcenter, SSO, SOA, Databases, the various upgrade paths and the list is endless. When Oracle Applications 12.2 gets released, the architecture is going to change and new features like online patching, dual file system will be introduced. Middle tier will be hosted on Weblogic.
The following is what comes with IDM and IDAM:
Identity Management (IDM)
Includes:
- HTTP Server
- Internet Directory
- Directory Integration Platform
- Virtual Directory
- Directory Services Manager – for synchronization and provisioning
- Identity Federation
- Security Developer Tools
- Enterprise Manager Fusion Middleware Control
Required Additional Software:
- Identity Management
- WebLogic Server
- Repository Creation Utility
- Patch Scripts
- Oracle Database
Oracle Identity and Access Management (IDAM)
Includes:
- Access Manager
- Adaptive Access Manager
- Identity Manager
- Identity Navigator
- Oracle Security Token Service
- Oracle Entitlements Server
Required Additional Software:
- WebLogic Server
- Repository Creation Utility
- Patch Scripts
- Oracle Database
- BI Publisher
- SOA Suite (for Oracle Identity Manager)
Softwares used for the project:
- Oracle E-Business Suite R12.1.1 + few AD/TXK patches + AccessGate 1.2.1 patch
- Oracle JDK 6 Update 37 for Linux x86-64
- Weblogic 10.3.5
- Oracle Fusion Middleware Repository Creation Utility 11g (11.1.1.6.0)
- Oracle Identity Management 11g Patch Set 5 (11.1.1.6.0)
- Oracle Fusion Middleware Repository Creation Utility 11g (11.1.2)
- Oracle Identity and Access Management 11g (11.1.2.0.0)
- Oracle Fusion Middleware Web Tier Utilities 11g (11.1.1.2.0)
- Oracle Fusion Middleware Web Tier Utilities 11g Patch Set 4 (11.1.1.5.0)
- Oracle Access Manager WebGates 11.1.1.5.0
This book covers details about the basic architecture, flow of data/information, implementation plan comprising 52 steps and each step supported by command outputs and screenshots for every mouse click.
There are new things we need to be careful about while installing the products.
For example, the DIP (Directory Integration provisioning) process will fail to become active unless the required processes are started up in the correct order. First the IDM admin server, then the node manager followed by the ldap processes and finally the ODS managed server on top of which DIP application is resting. Review Note 887653.1 for the sequence of DIP startup if you find DIP is down.
If DIP is down, the asynchronous user propagation from OID to E-Biz will not happen. The provisioning template used for E-Biz and OID integration is bi-directional in this book.
OAM 11.1.2 comes up with additional steps related to the security store we need to perform before starting up the admin server, nodemanager and OAM managed server. Else, we need to wipe out the whole installation and go for it afresh.
We will also get to have a better understanding of the role of AccessGate and Webgate.
The troubleshooting section is interesting and I hope it will be of immense help.
This book is only available in the Kindle version. If you do not have a kindle/iPad/iPhone device, no worries! You can download the kindle software for your PC from Kindle for PC and then sign up in Amazon and get the book delivered directly to your kindle software installed on your PC.
The following documents act as good references:
- Oracle Fusion Middleware 11g – Video and Podcasts Index [ID 1307123.1]
- Overview of Single Sign-On Integration Options for Oracle E-Business Suite [ID 1388152.1]
- Migrating Oracle Single Sign-On 10gR3 (10.1.4.3) to Oracle Access Manager 11gR2 (11.1.2) with Oracle E-Business Suite [ID 1485033.1]
- Integrating Oracle E-Business Suite Release 12 with Oracle Access Manager 11gR2 (11.1.2) using Oracle E-Business Suite AccessGate [ID 1484024.1]
- LDAP Directories Explained: An Introduction and Analysis – by Brian Arkills
I am already working on my next book which will carry details about SSO 10g upgrade to OAM 11g leveraging mod-osso component and bypassing the AccessGate and webgate layer. It will also cover OAM integration with WNA (Windows Native Authentication) and OID 11g integration with Microsoft AD (Active Directory). I am planning to release it in quick time. So stay tuned!
Happy Reading!
Become a published author
When a few colleagues and I were getting started with writing books about our favourite software (you guessed it … the Oracle Server) we did not have a lot of options at our disposal. I had met some players in the Oracle space when attending my first conference in Anaheim USA in the fall of 1990. I was working for the Office of the Auditor General (OAG) in Ottawa and finally made the move to get to one of the user group shows. Once I demonstrated interest in going, my immediate superior plus a few colleagues decided they really wanted to go. My application was held up for a while as others checked in on their inability to attend based on schedule conflicts.
Off I went – a young whipper-snapper never having attended any events of that size. It was an educational beehive and I could not wait to get back to the office and try out new stuff. We were running 6.0 at the OAG in those days, my being roughly 10 months into my tenure when I toddled off to California for 5 days of my first Oracle software love-in. It was magic to say the least.
I got wind of a show happening in Washington DC the following April called ECO ’91 and that was the start of my torrid career giving papers at user group and vendor events. The following September I gave a paper at IOUW 1991 in Miami Beach called “Hark I’ve Logged”. The tidal wave began. I hooked up with some people from the Boston area who assisted a meteoric rise in the Oracle space for this young lad from Eastern Canada. In early 1994, I was approached to assist rescuing an initiative called “Tuning Oracle” which turned out to be the first publication in the Oracle Press series.
The first round of raw material had been presented to the publisher called Osborne McGraw-Hill in those days. There were lots of holes in the work and I spent hours of my time getting the finishing touches on the work, published in the fall of 1994. We had no other avenues to get technical chit-chat out there; no BLOGging, no social media but some hard-copy publications. The first round of editing was done via a brown paper envelope being dispatched from Berkeley California to my doorstep followed by surface mail back to the copy editor. It was a chore but was exciting.
A third player joined the team and we went on to write four works and a wealth of updates as new versions of the Oracle software were released. By 2002 we had settled into a steady list of OracleX: A Beginner’s Guide, having left the other topics for others better armed to pull it off. As we speak, Ian, Michelle, yours truly and a handful of others are putting the finishing touches on Oracle Database XX: Install, Deploy, Manage slated for release coincident with Oracle’s next version of the database.
So what’s changed … the routes one can follow to be a published author. Check this out as a prime example of where the publishing business has gone … electronic, digital and did I mention “electronic” or “digital”. Have a look at Subhajit’s offering at Amazon and let that get your creative juices flowing. No more hard-copy; no more exhaustive rounds of back and forth editing and revising. Just ensure at least one other person sees the work as it progresses and keep the following in mind from someone who has been there and back many may times:
- put a stake in the ground called “deadline” and stick to it as if someone was pressuring you to be done by that date
- once others have seen and offered feedback on the content, freeze the chapter and don’t be tempted to go back for another round of revisions – if you do this you will never finish
- be careful about publishing copyrighted material
- spell check your work as you go keeping in kind that this electronic engine is not foolproof and a manual viewing of all written material is warranted as well
- be careful with acronyms … when first used in a chapter, expand them to remove confusion they could generate in the readers’ mind
- avoid idioms and local colloquialisms in your text as they can alienate your readership – the last thing you want to do
Happy writing and publishing …
Short-Circuiting the COST
If it finds a predicate that causes a "short-circuit" --- one that prevents rows from being returned by the query --- it can evaluate the COST of the query to 0 (zero). Normally, we would never expect a COST to zero --- even where we know zero rows will be returned by the query, Oracle may have to undertake some I/O (a Full Table Scan or an Index Range Scan in the simplest cases) and some CPU cycles to verify the resulting blocks for the expected result.
Here I show a simple example of a short-circuit.
First, I build a Table and an Index, with statistics :
SQL> create table my_tableSo we know that all the statistics (Row Count in the table and number of Distinct values in the Index) are consistent. I then run a simple query :
2 as select * from dba_objects;
Table created.
SQL> select count(*) from my_table;
COUNT(*)
----------
76609
SQL> create index my_table_ndx on my_table(owner);
Index created.
SQL> exec dbms_stats.gather_table_stats('','MY_TABLE',estimate_percent=>100,cascade=>TRUE);
PL/SQL procedure successfully completed.
SQL>
SQL> select num_rows, sample_size
2 from user_tab_statistics
3 where table_name = 'MY_TABLE';
NUM_ROWS SAMPLE_SIZE
---------- -----------
76609 76609
SQL> select num_distinct
2 from user_tab_col_statistics
3 where table_name = 'MY_TABLE'
4 and column_name = 'OWNER';
NUM_DISTINCT
------------
44
SQL> select leaf_blocks, distinct_keys, num_rows, sample_size
2 from user_ind_statistics
3 where table_name = 'MY_TABLE'
4 and index_name = 'MY_TABLE_NDX';
LEAF_BLOCKS DISTINCT_KEYS NUM_ROWS SAMPLE_SIZE
----------- ------------- ---------- -----------
183 44 76609 76609
SQL>
SQL> col object_name format a30Oracle has estimated that it will fetch 1741 rowids from the index in the plan step id=3 and then fetch 1,741 rows and also apply a filter for object_name in plan step id=2 to reduce the row count to 24.
SQL> col object_type format a18
SQL> set autotrace on
SQL> select object_name, object_type
2 from my_table
3 where owner = 'HEMANT'
4 and object_name like 'S%'
5 order by 1,2;
OBJECT_NAME OBJECT_TYPE
------------------------------ ------------------
SOURCE_PK INDEX
SOURCE_PK1 INDEX
SOURCE_TABLE TABLE
Execution Plan
----------------------------------------------------------
Plan hash value: 1587485563
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24 | 984 | 53 (2)| 00:00:01 |
| 1 | SORT ORDER BY | | 24 | 984 | 53 (2)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| MY_TABLE | 24 | 984 | 52 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | MY_TABLE_NDX | 1741 | | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OBJECT_NAME" LIKE 'S%')
3 - access("OWNER"='HEMANT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
579 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
SQL>
The actual resulting row count is 3. (The 24 row count is an estimate)
The estimated cost of the Index Range Scan is 5, the estimated cost of the Table Access (1741 rowids) and Filter is 47 (52-5) and the estimated cost of the Sort (for 24 rows) is 1 (53-52), resulting in a total cost of 53.
Oracle read a total of 5 blocks (consistent gets).
What happens if I add an "AND 1=2" predicate (which is always FALSE) ?
SQL> select object_name, object_typeThe query returns no rows (i.e. zero rows).
2 from my_table
3 where owner = 'HEMANT'
4 and object_name like 'S%'
5 and 1=2
6 order by 1,2;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 422461895
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 0 (0)| |
| 1 | SORT ORDER BY | | 1 | 41 | | |
|* 2 | FILTER | | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID| MY_TABLE | 24 | 984 | 52 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | MY_TABLE_NDX | 1741 | | 5 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NULL IS NOT NULL)
3 - filter("OBJECT_NAME" LIKE 'S%')
4 - access("OWNER"='HEMANT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
356 bytes sent via SQL*Net to client
408 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
0 rows processed
SQL>
But notice the "filter (NULL IS NOT NULL)". This is the short-circuit that Oracle adds. This will always evaluate to FALSE. This is Oracle's way of translating the "1=2" condition that I added to the query.
Although there is a supposed COST of 5 for the 1,741 rowids from the Index Range Scan and a supposed COST of 47 (52-5) for the Table Access of the 1,741 rowids and filtering to 24 rows for "OBJECT_NAME LIKE 'S%'), Oracle then proceeds to add the NULL IS NOT NULL FILTER before the SORT ORDER BY. This converts the COST to 0.
We can see that Oracle*did not do any I/O* because the Statistics show 0 block gets (consistent gets).
Let me say that again : Oracle did NOT do any I/O inspite of the supposed Index Range Scan and Table Access by Index Rowid.
.
.
.
Don’t focus on cost of execution plan
I don’t focus on the Oracle optimizer’s cost of a particular execution plan when I’m tuning a query and I’m worried that many of the developers and DBAs that I’ve talked with about query tuning are too focused on lowering the cost of a plan when attempting to tune a query.
I hear comments like this all the time when talking about making a change X to improve the performance of a query. “I changed X and the cost in the explain plan was so much lower.” X could be adding an index or hint or making a parameter change, etc.
I just cringe inside when I hear this and I hope I am gracious but all the time I’m filled with fear that the person I’m talking to is missing a key concept when it comes to Oracle query tuning. The concept is just that in many cases the optimizer’s estimated cost is far off from reality. So, making a change and seeing the cost of the plan go down really doesn’t mean much. It could directly correlate to corresponding improvement in the query run time or it could be just the opposite. It is kind of like the buffer cache hit ratio. Sometimes this ratio really means something and sometimes it doesn’t.
Instead of focusing on the cost I focus on the plan itself. Based on my investigation of the tables in the query and how many rows will be accessed from each I’ve come up with an idea of a plan that should be better than the one I’m improving. So, my question about a proposed change X becomes “Does change X cause the plan to change to the one I determined to be better?”
I attempted to lay out this approach in my Intro to SQL Tuning presentation. For me query tuning is kind of like programming. I’m figuring out the best way to really do the steps of the plan based on my own study of the existing tables. Then I just have to figure out what change to make to get the optimizer to run the query my way. This is a time consuming approach but I would only spend the time on queries that really need it. Who has time to tune every query?
So, my recommended approach to query tuning is to figure out a good plan on your own and then to figure out how to get the database to run it your way. I don’t recommend focusing on what the cost of the new plan is compared with the original plan.
- Bobby
Log Buffer #309, A Carnival of the Vanities for DBAs
Oracle, SQL Server, and MySQL; these database technologies among various other similar innovations are running this world virtually and bloggers have got lot to say in this regard. This Log Buffer Edition is yet another voice in this arena.
Oracle:
Oliver Steinmeier is explaining the JDeveloper and Fusion Applications in a lucid way.
London prepares for a surge of developer brainpower during the Devoxx UK conference on Tuesday and Wednesday, March 26th and 27th, Yolande Poirer tells us.
It’s hard to believe that another year has passed from last RSA. But, indeed, time flies when you’re busy, I guess, Slavik has more.
Joel Goodman asks; Can Supporting an Exadata Database Machine Make You Certifiable?
Christopher Gait has blogged about Oracle Priority Service Infogram.
SQL Server:
Michael Swart is following up on Ad hoc TVP contention.
James Serra is sharing his Data Warehouse Architecture presentation slides.
Jen McCown is live on the PowerScripting podcast.
Chris Shaw is telling us as why do a security audits.
The Public Preview of Data Explorer (which some of you know I’ve been following for a while, since it first appeared in SQL Azure Labs), is now available for download. Chris Webb tells.
MySQL:
Björn Melinder is having a hands on with jet profiler.
Percona Server on the Nexus 7: Your own MySQL Database Server on an Android Tablet
A good blog post about installing Apache2 With PHP5 And MySQL Support On Scientific Linux 6.3 (LAMP).
Daniel van Eeden is talking about MySQL Events.
Daniel Nichter says that MySQL 5.5 lock_wait_timeout: patience is a virtue, and a locked server .
How many CPU cores do I really have?
How Many CPU Cores Do I Really Have?
The view operating system statistic view v$osstat is can be misleading with regards to CPU cores. Not that the information is incorrect, it's well... let's say troubling. If I ask ten people to email a sample AWR report, I'm likely to see CPU core-like statistics such as CPU_SOCKETS, NUM_CPUS, VCPU, LCPU, CPU_THREADS, and probably a variety of other names. Wow… what a mess!
But I'd still like to know because it's important for my work. For two reasons:
First, it helps me to understand how high the CPU utilization can go before performance starts to degrade. Based on queuing theory, the more processes a system can simultaneously process the higher the average utilization the system can sustain before performance begins to degrade. I write about this in the Operating System section in my Oracle Performance Firefighting book.
Second, I always check what the math and my observations indicate versus what the OS administrator and OS commands (such as vmstat, war, top) tell me. Paranoid perhaps, but doing Oracle work for 20-plus years has taught me a few things...
Call It a Server, Not a Core, Lcpu, Thread, etc.
To avoid the entire discussion about which provides the processing power; core or thread, let's simply call the unit of processing power a "server." Why? Two reasons. First, because it provides CPU service to processes, so it truly is a "server." Second, that's what capacity planners call something that services transactions; a server. In fact, its symbol is M (capital "m").
By the way, it is very easy to determine, on your system, what provides the true CPU processing power (cores, threads, or something else). I blogged about this in June of 2011.
So the question is, how many "servers" does your database host contain? That's what this posting is all about.
If you recall from my previous posting, I demonstrated two ways to calculate CPU utilization. Both follow the classic; requirements divided by capacity. But the capacity is where the two approaches differ.
Capacity Calculation Using "servers"
Using "servers" to calculate the capacity is simply the number of servers multiplied by the snapshot interval. So a 2 server (think two cores) host over a 60 minute period can provide a maximum of 120 minutes or 7200 seconds of CPU power.
Here's the utilization formula using the capacity approach:
U = R / C
where;
R = CPU consumption over the interval (seconds)
C = CPU "servers" X interval (seconds)
For example, looking at a real AWR report, over a 60 minute interval, the AWR's Operating System Statistics show show a BUSY_IIME of 1913617, IDLE_TIME of 7159367 and the NUM_CPUS of 24.
Therefore, the average CPU utilization over the interval is:
U = 19136.17 / ( 24 * 60 * 60 ) = 0.221 = 22%
Capacity Calculation Using Busy and Idle Time
In my previous posting I introduced using only v$osstat's BUSY_TIME and IDLE_TIME values to calculate the average CPU utilization over the snapshot interval. Here's the formula:
U = R / C = BUSY_TIME / ( BUSY_TIME + IDLE_TIME )
Using the above examples numbers;
U = 1913617 / ( 1913617 + 7159367 ) = 0.211 = 21%
Yes, the two utilization calculation results don't match perfectly but they are very close… close enough.
Calculating the Number of "Servers"
Notice that in the busy and idle time capacity calculation there is no reference to the number of servers. Suppose you don't trust the v$osstat CPU core-like statistics or are simply not sure which one is important. In other words, you want to understand the effective number of CPU "servers." Using the two utilization formulas and some algebra we can figure this out!
Making sure to use the same unit of time, here are two capacity calculations:
C = servers * interval
C = busy_time + idle_time
Let's put them together and solve for "servers".
servers * interval = busy_time + idle_time
servers = ( busy_time + idle_time ) / interval
OK… but does this really work? Let's give it a try! (I'm going to use seconds as my unit of time.)
effective servers = ( 19136.17 + 71593.67 ) / ( 60 * 60 ) = 25.2
The math tells us that based on the collected data, on average the system is operating with effectively 25 "servers." I know in this situation there are physically 24 CPU cores, so we're pretty close.
What to Do With AIX
While this "effective servers" formula has proven its worth in many systems, I still find it does not work well many times in an AIX environment. Sometimes it does, but not always. So do the math and compare it with vmstat or some other AIX based tool.
The Take-Aways
The big one:
servers = ( busy_time + idle_time ) / interval
Personally, I never initially trust the CPU number related v$osstat statistics. I always check with the OS administrator and also run a simple OS command like top or sar or do a "cat /proc/stat". It's always a good idea to casually check with the OS administrator. You don't want to be thinking and working with 12 "servers" when the administrator is thinking 24 "servers."
For me, knowing the number of CPU "servers" is important. And since I never blindly trust the v$osstat CPU statistics, this is a very fast and reliable way (so far at least) to check my work.
Thanks for reading!
Craig.
If you enjoy my blog, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting, Advanced Oracle Performance Analysis, and my one-day Oracle Performance Research Seminar. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.
P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently orapub.general@comcast .net.
Delphix First Month
My company is trying out a new product called Delphix.
We have had Delphix installed for about a month and I wanted this post to contain my first impressions for others who may be considering it. Essentially Delphix provides fast copies of databases so that a terabyte database can be refreshed in minutes. My most recent example took three minutes for a development db that is just over one TB.
Delphix is an appliance running Open Solaris that runs in a VMWare ESX virtual machine. At least, that is how we have it installed. You license Delphix by the number of CPUs in the virtual machine so you can use it as much as you want as long as the CPUs can handle it. Your database file systems are connected to the Delphix appliance using NFS. So, the critical component of Delphix is the network connection between your target database servers and the appliance. In our case we put new 10 gigabit ethernet hardware in to connect our HP-UX Itanium virtual machines (target database servers) to the Delphix vm. We made sure they were all on the same IP subnet with nothing but a nice high speed switch between them all. After some initial testing Delphix’s performance and support teams made some network configuration changes that helped with an initial performance/hang issue. Looks like it may have something to do with the packet flow control within TCP/IP but it works fine now.
The Delphix appliance uses RMAN to pull data from a source database. The source database has to be in archivelog mode and for best performance you need to enable block change tracking. Delphix does an initial full level 0 backup and then regular incremental level 1 backups. To create a new clone copy of the original database you pick one of these incremental backups – called snapshots – and point to the target machine and Delphix automatically mounts the appropriate NFS filesystems with the datafiles, tempfile, redo logs, etc. and brings up a clone of the source system with the new name you designate.
Also, Delphix takes snapshots of the clone databases – called VDBs – on a regular basis so these can be used as backups of the clones themselves and you can clone the clones – sounds like Star Wars Clone Wars doesn’t it?
One interesting challenge is space management. Each new clone takes up very little space – until you start updating it. I did some tests where I copied a 30 gig table to a new table and the vdb which previously took less than 100 megabytes of space now took gigabytes. Everything is compressed so I think it was less than half of the 30 gig but the point is that the more the copies get updated the more disk space you need for the copies. If you have an application that needs copies quickly but the copies aren’t heavily updated you can make many copies with very little disk space. I think this kind of capability opens up all kinds of possibilities we haven’t considered before since each copy without Delphix would take up an equal amount of space as the original if you just use normal disk storage. I.e. Before Delphix we had to minimize the number of production copies we used for development and testing. Imagine how our processes might change if we can spin off a quick production copy, use it for a short while, and then get rid of it all with minimal additional disk space and with the clone occurring in minutes.
Right now I’m working with a Delphix consultant on a script that the developers can use to refresh their own database from the most recent snapshot of its source. Delphix has a couple of ways it can be controlled from a Unix shell script. One way is with ssh and their command line interface (CLI). You can setup ssh with a public and private key so you can ssh into the VM as the “delphix admin” user and then run a set of commands in Delphix’s proprietary language. Delphix also has a GUI but it is nice that you can run a script and do things like kick off a refresh or clone. Also, there is a web service and they gave me a sample Python script to communicate with the VM through the web service but I haven’t delved into it. For one thing, we don’t have Python installed on our HP-UX servers. You can access Delphix’s documentation online here including the CLI interface commands.
Well, I’m not trying to sell Delphix to anyone but I thought it would be good to put down some of my experiences. The database cloning and refreshing is remarkably fast. Time will tell how the network NFS performance holds out during heavy development and testing but assuming we manage that performance and the disk space usage I’m pleased with the remarkable clone times that are supported by the Delphix VM.
- Bobby
A demonstration of Oracle row movement within a block.
SQL> create table TEST_TABLE(id integer, attribute1 varchar2(200)) tablespace test;
Table created.
SQL> select file_id, file_name from dba_data_files where tablespace_name = 'TEST';
FILE_ID
----------
FILE_NAME
----------------------------------------------------------------
5
D:\ORACLE\WIN64\DATAFILE\O1_MF_TEST_8LRBSQ59_.DBF
SQL> select object_id, data_object_id from dba_objects where object_name = 'TEST_TABLE';
OBJECT_ID DATA_OBJECT_ID
---------- --------------
73516 73516
SQL> insert into test_table values (1,'AAAAA');
1 row created.
SQL> insert into test_table values (2,'BBBBB');
1 row created.
SQL> commit;
Commit complete.
SQL> alter system checkpoint;
System altered.
SQL> select file_id, block_id, blocks, relative_fno from dba_extents where segme
nt_name = 'TEST_TABLE';
FILE_ID BLOCK_ID BLOCKS RELATIVE_FNO
---------- ---------- ---------- ------------
5 160 8 5
We extract the block contents using the following program :- http://blog.contractoracle.com/2013/02/java-program-to-extract-data-from.html
Block 163 Contains Object ID 73516
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971683
SCN Base :- 1075559
SCN Wrap :- 0
Sequence :- 01
Flag 01-NEW :- 06
CheckSum :- 32492
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73516
Cleanout SCN Base :- 1075556
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971680
ITLSlot :- 1
Undo Segment :- 1
Undo Segment Slot :- 5
Transaction Sequence :- 633
Undo Block Address :- 12583566
Undo Sequence :- 165
Undo Record Number :- 23
Spare 1 :- 0
Flag :- 8194
_ktbitun :- 0
Base :- 1075559
ITLSlot :- 2
Undo Segment :- 0
Undo Segment Slot :- 0
Transaction Sequence :- 0
Undo Block Address :- 0
Undo Sequence :- 0
Undo Record Number :- 0
Spare 1 :- 0
Flag :- 0
_ktbitun :- 0
Base :- 0
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 8064
Available Space after Commit:- 8042
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 8076(+100)
Row Offset 2 :- 8064(+100)
1 Row Header start :-8176
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 2
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 5 Data :- AAAAA
2 Row Header start :-8164
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 2
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- BBBBB
From the above we can see that there are two rows in block 163, starting at bytes 8164 and 8176. The first record inserted (AAAAA) is at the end of the block, with no room to grow.
If we update the record with more data :-
SQL> update test_table set attribute1 = '012345678901234567890123456789012345678
9012345678901234567890123456789012345678901234567890' where attribute1 = 'AAAAA'
;
1 row updated.
SQL> commit;
Commit complete.
SQL> alter system checkpoint;
System altered.
We can now see that :-
1. the block SCNs and sequence have been updated
2. the block CheckSum has been updated
3. the Available Space in the block has decreased
4. the record that previously started at byte 8176 has now moved to byte 8066 to allow it to grow.
Block 163 Contains Object ID 73516
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971683
SCN Base :- 1075759
SCN Wrap :- 0
Sequence :- 02
Flag 01-NEW :- 06
CheckSum :- -25092
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73516
Cleanout SCN Base :- 1075757
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971680
ITLSlot :- 1
Undo Segment :- 1
Undo Segment Slot :- 5
Transaction Sequence :- 633
Undo Block Address :- 12583566
Undo Sequence :- 165
Undo Record Number :- 23
Spare 1 :- 0
Flag :- -32768
_ktbitun :- 0
Base :- 1075559
ITLSlot :- 2
Undo Segment :- 7
Undo Segment Slot :- 22
Transaction Sequence :- 605
Undo Block Address :- 12583559
Undo Sequence :- 222
Undo Record Number :- 49
Spare 1 :- 0
Flag :- 8193
_ktbitun :- 0
Base :- 1075759
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 7966
Available Space after Commit:- 7956
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 7966(+100)
Row Offset 2 :- 8064(+100)
1 Row Header start :-8066
Flags :- 2C Table Data
Lock Status :- 02
Number of Columns :- 2
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 91 Data :- 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
2 Row Header start :-8164
Flags :- 2C Table Data
Lock Status :- 00
Number of Columns :- 2
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- BBBBB
What happens to Oracle data blocks during Truncate.
SQL> create table TEST_TABLE(id integer, attribute1 char(5), attribute2 varchar2 (5)) tablespace test;
Table created.
SQL> select file_id, file_name from dba_data_files where tablespace_name = 'TEST';
FILE_ID
----------
FILE_NAME
-----------------------------------------------------------------
5
D:\ORACLE\WIN64\DATAFILE\O1_MF_TEST_8LRBSQ59_.DBF
SQL> select object_id, data_object_id from dba_objects where object_name = 'TEST_TABLE';
OBJECT_ID DATA_OBJECT_ID
---------- --------------
73514 73514
SQL> insert into test_table values (1,'AAAAA','BBB');
1 row created.
SQL> insert into test_table values (2,'CCCCC','DDDDD');
1 row created.
SQL> commit;
Commit complete.
SQL> alter system checkpoint;
System altered.
SQL> select file_id, block_id, blocks, relative_fno from dba_extents where segment_name = 'TEST_TABLE';
FILE_ID BLOCK_ID BLOCKS RELATIVE_FNO
---------- ---------- ---------- ------------
5 152 8 5
We can find the data for the table in file 5 in an extent starting at block 152. When we scan for the data, we find that the records we inserted are in block 155, with empty blocks 156,7,8,9. The following output is from :- http://blog.contractoracle.com/2013/02/java-program-to-extract-data-from.html
Block 155 Contains Object ID 73514Block Header start :- 0 Block Type 06-DATA :- 06 Block Format :- A2 Spare1 :- 00 Spare2 :- 00 Relative Block Address :- 20971675 SCN Base :- 1074483 SCN Wrap :- 0 Sequence :- 01 Flag 01-NEW :- 06 CheckSum :- 14737 Spare3 :- 0Transaction Header start :- 20 Type 01-DATA 02-INDEX :- 01 Spare 1 :- 00 Spare 2 :- 00 Spare 3 :- 00 Object ID :- 73514 Cleanout SCN Base :- 1074478 Cleanout SCN Wrap :- 0 Spare 4 :- 00 Spare 5 :- 00 ITL Slots :- 2 UNKNOWN Byte :- 00 Flag 00-FREE :- 32 ITL TX Feeelist Slot :- 00 Next Block On Free List :- 20971672ITLSlot :- 1 Undo Segment :- 1 Undo Segment Slot :- 14 Transaction Sequence :- 632 Undo Block Address :- 12583042 Undo Sequence :- 162 Undo Record Number :- 51 Spare 1 :- 0 Flag :- 8194 _ktbitun :- 0 Base :- 1074483ITLSlot :- 2 Undo Segment :- 0 Undo Segment Slot :- 0 Transaction Sequence :- 0 Undo Block Address :- 0 Undo Sequence :- 0 Undo Record Number :- 0 Spare 1 :- 0 Flag :- 0 _ktbitun :- 0 Base :- 0Data Header start :- 100 Flags :- 00 Number of Tables :- 1 Number of Rows :- 2 Offset to Freespace Start :- -1 Offset to Freespace End :- 22 Available Space :- 8054 Available Space after Commit:- 8032Table Directory start :-114 Table :- 1 Offset :- 0 Number of Rows :- 2 Total Number of Rows :- 2Row Directory start :-118 Row Offset 1 :- 8072(+100) Row Offset 2 :- 8054(+100)1 Row Header start :-8172 Flags :- 2C Table Data Lock Status :- 01 Number of Columns :- 3Column 1 Bytes 2 Data :- C1 02 Column 2 Bytes 5 Data :- AAAAAColumn 3 Bytes 3 Data :- BBB2 Row Header start :-8154 Flags :- 2C Table Data Lock Status :- 01 Number of Columns :- 3Column 1 Bytes 2 Data :- C1 03 Column 2 Bytes 5 Data :- CCCCCColumn 3 Bytes 5 Data :- DDDDD
Block 156 Contains Object ID 73514Block 157 Contains Object ID 73514Block 158 Contains Object ID 73514Block 159 Contains Object ID 73514
Now we truncate the table. Selecting from the data dictionary table we can see that the data blocks listed in DBA_EXTENTS did not change, but in DBA_OBJECTS the DATA_OBJECT_ID was updated.
SQL> truncate table test_table;
Table truncated.
SQL> alter system checkpoint;
System altered.
SQL> select * from test_table;
no rows selected
SQL> select object_id, data_object_id from dba_objects where object_name = 'TEST_TABLE';
OBJECT_ID DATA_OBJECT_ID---------- -------------- 73514 73515
SQL> select file_id, block_id, blocks, relative_fno from dba_extents where segment_name = 'TEST_TABLE';
FILE_ID BLOCK_ID BLOCKS RELATIVE_FNO---------- ---------- ---------- ------------ 5 152 8 5
If we again scan for blocks associated with the old DATA_OBJECT_ID 73514 we can see that they are unchanged and still have the original data in them. This means that even after a table is truncated, the data remains in the blocks on disk, and could be extracted (until the blocks are overwritten).
Block 155 Contains Object ID 73514
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971675
SCN Base :- 1074483
SCN Wrap :- 0
Sequence :- 01
Flag 01-NEW :- 06
CheckSum :- 14737
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73514
Cleanout SCN Base :- 1074478
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971672
ITLSlot :- 1
Undo Segment :- 1
Undo Segment Slot :- 14
Transaction Sequence :- 632
Undo Block Address :- 12583042
Undo Sequence :- 162
Undo Record Number :- 51
Spare 1 :- 0
Flag :- 8194
_ktbitun :- 0
Base :- 1074483
ITLSlot :- 2
Undo Segment :- 0
Undo Segment Slot :- 0
Transaction Sequence :- 0
Undo Block Address :- 0
Undo Sequence :- 0
Undo Record Number :- 0
Spare 1 :- 0
Flag :- 0
_ktbitun :- 0
Base :- 0
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 8054
Available Space after Commit:- 8032
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 8072(+100)
Row Offset 2 :- 8054(+100)
1 Row Header start :-8172
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 5 Data :- AAAAA
Column 3 Bytes 3 Data :- BBB
2 Row Header start :-8154
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- CCCCC
Column 3 Bytes 5 Data :- DDDDD
Block 156 Contains Object ID 73514
Block 157 Contains Object ID 73514
Block 158 Contains Object ID 73514
Block 159 Contains Object ID 73514
What happens in an Oracle block during a delete.
SQL> create table TEST_TABLE(id integer, attribute1 char(5), attribute2 varchar2 (5)) tablespace test;
Table created.
SQL> select file_id, file_name from dba_data_files where tablespace_name = 'TEST';
FILE_ID
----------
FILE_NAME
----------------------------------------------------------------
5
D:\ORACLE\WIN64\DATAFILE\O1_MF_TEST_8LRBSQ59_.DBF
SQL> select object_id, data_object_id from dba_objects where object_name = 'TEST_TABLE';
OBJECT_ID DATA_OBJECT_ID
---------- --------------
73513 73513
SQL> insert into test_table values (1,'AAAAA','BBB');
1 row created.
SQL> insert into test_table values (2,'CCCCC','DDDDD');
1 row created.
SQL> commit;
Commit complete.
SQL> select file_id, block_id, blocks, relative_fno from dba_extents where segment_name = 'TEST_TABLE';
FILE_ID BLOCK_ID BLOCKS RELATIVE_FNO
---------- ---------- ---------- ------------
5 144 8 5
SQL> alter system checkpoint;
System altered.
When we scan the file for data object ID 73513 we find the data in block 150. The following block data was extracted using :- http://blog.contractoracle.com/2013/02/java-program-to-extract-data-from.html
Block 150 Contains Object ID 73513
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971670
SCN Base :- 1073062
SCN Wrap :- 0
Sequence :- 01
Flag 01-NEW :- 06
CheckSum :- 14473
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73513
Cleanout SCN Base :- 1073058
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971664
ITLSlot :- 1
Undo Segment :- 5
Undo Segment Slot :- 33
Transaction Sequence :- 814
Undo Block Address :- 12583110
Undo Sequence :- 169
Undo Record Number :- 6
Spare 1 :- 0
Flag :- 8194
_ktbitun :- 0
Base :- 1073062
ITLSlot :- 2
Undo Segment :- 0
Undo Segment Slot :- 0
Transaction Sequence :- 0
Undo Block Address :- 0
Undo Sequence :- 0
Undo Record Number :- 0
Spare 1 :- 0
Flag :- 0
_ktbitun :- 0
Base :- 0
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 8054
Available Space after Commit:- 8032
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 8072(+100)
Row Offset 2 :- 8054(+100)
1 Row Header start :-8172
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 5 Data :- AAAAA
Column 3 Bytes 3 Data :- BBB
2 Row Header start :-8154
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- CCCCC
Column 3 Bytes 5 Data :- DDDDD
Now we delete one row :-
SQL> delete from test_table where id = 1;
1 row deleted.
SQL> commit;
Commit complete.
SQL> select * from test_table;
ID ATTRI ATTRI
---------- ----- -----
2 CCCCC DDDDD
SQL> alter system checkpoint;
System altered.
And we can extract the block data again and confirm that the result of the delete is that :-
1. the block SCNs and Sequence have been updated2. the block CheckSum has been updated3. ITL Slot 2 was used for the transaction4. ITL Slot 1 flag was set to -32768 (unused)5. The row header flag was updated from 2C to 3C6. The row header lock status was updated to 02
Note that the delete did not actually remove the record from the block, it just updated the row flag. This means that even if a record has been deleted from an Oracle database table is possible to read the data for the deleted record directly from the block (until it is overwritten).
Block 150 Contains Object ID 73513
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971670
SCN Base :- 1073294
SCN Wrap :- 0
Sequence :- 02
Flag 01-NEW :- 06
CheckSum :- -6419
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73513
Cleanout SCN Base :- 1073292
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971664
ITLSlot :- 1
Undo Segment :- 5
Undo Segment Slot :- 33
Transaction Sequence :- 814
Undo Block Address :- 12583110
Undo Sequence :- 169
Undo Record Number :- 6
Spare 1 :- 0
Flag :- -32768
_ktbitun :- 0
Base :- 1073062
ITLSlot :- 2
Undo Segment :- 6
Undo Segment Slot :- 27
Transaction Sequence :- 797
Undo Block Address :- 12583068
Undo Sequence :- 174
Undo Record Number :- 43
Spare 1 :- 0
Flag :- 8193
_ktbitun :- 14
Base :- 1073294
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 8054
Available Space after Commit:- 8032
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 8072(+100)
Row Offset 2 :- 8054(+100)
1 Row Header start :-8172
Flags :- 3C Deleted
Lock Status :- 02
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 5 Data :- AAAAA
Column 3 Bytes 3 Data :- BBB
2 Row Header start :-8154
Flags :- 2C Table Data
Lock Status :- 00
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- CCCCC
Column 3 Bytes 5 Data :- DDDDD
What happens in an Oracle Block during an update.
SQL> create table TEST_TABLE(id integer, attribute1 char(5), attribute2 varchar2(5)) tablespace test;
Table created.
SQL> select file_id, file_name from dba_data_files where tablespace_name = 'TEST';
FILE_ID
----------
FILE_NAME
-----------------------------------------------------------------
5
D:\ORACLE\WIN64\DATAFILE\O1_MF_TEST_8LRBSQ59_.DBF
SQL> select object_id, data_object_id from dba_objects where object_name = 'TEST
_TABLE';
OBJECT_ID DATA_OBJECT_ID
---------- --------------
73512 73512
SQL> insert into test_table values (1,'AAAAA','BBB');
1 row created.
SQL> commit;
Commit complete.
SQL> insert into test_table values (2,'CCCCC','DDDDD');
1 row created.
SQL> commit;
Commit complete.
SQL> select file_id, block_id, blocks, relative_fno from dba_extents where segme
nt_name = 'TEST_TABLE';
FILE_ID BLOCK_ID BLOCKS RELATIVE_FNO
---------- ---------- ---------- ------------
5 136 8 5
SQL> alter system checkpoint;
System altered.
The table we are working on for this test is stored in file 5 with extent starting at block 136 in blocks with DATA_OBJECT_ID 73512 . When we scan the file for blocks with data object ID 73512 we can see that the rows we are interested in are in block 139. Two ITL Slots have been created in the block.
The following block data was extracted using :- http://blog.contractoracle.com/2013/02/java-program-to-extract-data-from.html
Block 139 Contains Object ID 73512
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971659
SCN Base :- 1072047
SCN Wrap :- 0
Sequence :- 02
Flag 01-NEW :- 06
CheckSum :- 17170
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73512
Cleanout SCN Base :- 1072042
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971656
ITLSlot :- 1
Undo Segment :- 10
Undo Segment Slot :- 29
Transaction Sequence :- 633
Undo Block Address :- 12583185
Undo Sequence :- 140
Undo Record Number :- 3
Spare 1 :- 0
Flag :- 8193
_ktbitun :- 0
Base :- 1072043
ITLSlot :- 2
Undo Segment :- 5
Undo Segment Slot :- 19
Transaction Sequence :- 814
Undo Block Address :- 12583108
Undo Sequence :- 169
Undo Record Number :- 33
Spare 1 :- 0
Flag :- 8193
_ktbitun :- 0
Base :- 1072047
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 8054
Available Space after Commit:- 8032
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 8072(+100)
Row Offset 2 :- 8054(+100)
1 Row Header start :-8172
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 5 Data :- AAAAA
Column 3 Bytes 3 Data :- BBB
2 Row Header start :-8154
Flags :- 2C Table Data
Lock Status :- 02
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- CCCCC
Column 3 Bytes 5 Data :- DDDDD
Now we update one record in the table.
SQL> update test_table set attribute1 = 'EEEEE' where attribute1 = 'AAAAA';
1 row updated.
SQL> commit;
Commit complete.
SQL> alter system checkpoint;
System altered.
After dumping the block we can see that :-
1. block SCNs were updated
2. block CheckSum was updated
3. ITL Slot 1 was used for the transaction
4. ITL Slot 2 flag was updated to -32768 (I guess indicating unused)
5. The Row lock status for the untouched row was cleared.
6. The data was updated.
Block 139 Contains Object ID 73512
Block Header start :- 0
Block Type 06-DATA :- 06
Block Format :- A2
Spare1 :- 00
Spare2 :- 00
Relative Block Address :- 20971659
SCN Base :- 1072247
SCN Wrap :- 0
Sequence :- 02
Flag 01-NEW :- 06
CheckSum :- -6509
Spare3 :- 0
Transaction Header start :- 20
Type 01-DATA 02-INDEX :- 01
Spare 1 :- 00
Spare 2 :- 00
Spare 3 :- 00
Object ID :- 73512
Cleanout SCN Base :- 1072246
Cleanout SCN Wrap :- 0
Spare 4 :- 00
Spare 5 :- 00
ITL Slots :- 2
UNKNOWN Byte :- 00
Flag 00-FREE :- 32
ITL TX Feeelist Slot :- 00
Next Block On Free List :- 20971656
ITLSlot :- 1
Undo Segment :- 4
Undo Segment Slot :- 16
Transaction Sequence :- 596
Undo Block Address :- 12583572
Undo Sequence :- 147
Undo Record Number :- 55
Spare 1 :- 0
Flag :- 8193
_ktbitun :- 0
Base :- 1072247
ITLSlot :- 2
Undo Segment :- 5
Undo Segment Slot :- 19
Transaction Sequence :- 814
Undo Block Address :- 12583108
Undo Sequence :- 169
Undo Record Number :- 33
Spare 1 :- 0
Flag :- -32768
_ktbitun :- 0
Base :- 1072047
Data Header start :- 100
Flags :- 00
Number of Tables :- 1
Number of Rows :- 2
Offset to Freespace Start :- -1
Offset to Freespace End :- 22
Available Space :- 8054
Available Space after Commit:- 8032
Table Directory start :-114
Table :- 1
Offset :- 0
Number of Rows :- 2
Total Number of Rows :- 2
Row Directory start :-118
Row Offset 1 :- 8072(+100)
Row Offset 2 :- 8054(+100)
1 Row Header start :-8172
Flags :- 2C Table Data
Lock Status :- 01
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 02
Column 2 Bytes 5 Data :- EEEEE
Column 3 Bytes 3 Data :- BBB
2 Row Header start :-8154
Flags :- 2C Table Data
Lock Status :- 00
Number of Columns :- 3
Column 1 Bytes 2 Data :- C1 03
Column 2 Bytes 5 Data :- CCCCC
Column 3 Bytes 5 Data :- DDDDD
Java program to extract records from Oracle Control Files.
It is a large Java program laid out in order of execution for easy reading and learning purposes. Yes, I know that a "real Java programmer" would do a better job, and there is lots of room for improvement, but it is good enough for learning about Oracle data structures.
Anyone is welcome to download it, modify it, and use it for any purpose, but I don't guarantee that it extracts all records available, or will work on all OS or DB versions. Use it at your own risk. Enjoy !
An example of the output can be found here :- http://blog.contractoracle.com/2013/02/oracle-control-file-physical-structure.html
# ControlScanCMD.java
import java.io.IOException;
import java.io.RandomAccessFile;
public class ControlScanCMD {
public static void main(String[] args) {
try {
RandomAccessFile raf = new RandomAccessFile("D:\\oracle\\WIN64\\CONTROLFILE\\O1_MF_8L5T35GK_.CTL", "r");
int BlockSize = 8192*2;
int BlockStart = 0;
int BlockNum;
int FileSize =(int)raf.length();
int NumBlocks = FileSize / BlockSize;
int NextRecordStart;
int FileRecordSize = 524;
int TablespaceRecordSize = 68;
int HeaderSize =18;
int TailSize = 4;
int RecordCount = 0;
int FileIdOffset = 4;
int TablespaceFirstChar = 9;
int NameOffset = 14;
int Pointer = 0;
byte MyByte = 0;
char MyChar = 0;
int FileID = 0;
System.out.println("Number of Blocks :- " + NumBlocks);
// get database name
BlockStart = BlockSize * 1;
System.out.println("Block :- 2");
System.out.print("Tablespace Name :- ");
raf.seek(BlockStart + HeaderSize + NameOffset);
for(int ct1 = 1; ct1 <= 8; ct1++)
{
MyByte = raf.readByte();
if (MyByte == 0)
{
System.out.print("");
//System.out.println(String.format("%02X ", raf.readByte()));
}
else if (32 < MyByte && MyByte <= 126)
{
//MyChar = ((char)Integer.parseInt(String.format("%02X", MyByte), 16));
MyChar = (char)MyByte;
System.out.print("" + MyChar);
//jTextAreaFile.append(" " + ct + "-" + String.format("%02X", MyByte) + " ");
}
else
{
System.out.print(" " + ct1 + "-" + String.format("%02X", MyByte) + " ");
}
}
System.out.println("");
System.out.println("");
// get file details
for(BlockNum = 31; BlockNum <= 32; BlockNum++)
{
BlockStart = BlockSize * BlockNum;
NextRecordStart = HeaderSize;
RecordCount = 0;
System.out.println("Block :- " + BlockNum);
raf.seek(BlockStart + NextRecordStart + FileIdOffset);
FileID = raf.readByte();
while (FileID != 0 && (NextRecordStart + FileRecordSize) < BlockSize)
{
RecordCount = RecordCount + 1;
raf.seek(BlockStart + NextRecordStart);
System.out.println("Record :- " + RecordCount);
System.out.println("Byte 1 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 2 :- " + String.format("%02X ", raf.readByte()));
System.out.println("File Type 3-LOG, 4-DATA, 7-TEMP :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 4 :- " + String.format("%02X ", raf.readByte()));
System.out.println("File ID :- " + raf.readByte());
System.out.println("Byte 6 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 7 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 8 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 9 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 10 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 11 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 12 :- " + String.format("%02X ", raf.readByte()));
System.out.print("File Name :- ");
for(int ct1 = 1; ct1 <= 512; ct1++)
{
MyByte = raf.readByte();
if (MyByte == 0)
{
System.out.print("");
//System.out.println(String.format("%02X ", raf.readByte()));
}
else if (32 < MyByte && MyByte <= 126)
{
//MyChar = ((char)Integer.parseInt(String.format("%02X", MyByte), 16));
MyChar = (char)MyByte;
System.out.print("" + MyChar);
//jTextAreaFile.append(" " + ct + "-" + String.format("%02X", MyByte) + " ");
}
else
{
System.out.print(" " + ct1 + "-" + String.format("%02X", MyByte) + " ");
}
}
System.out.println("");
System.out.println("");
NextRecordStart = NextRecordStart + FileRecordSize;
raf.seek(BlockStart + NextRecordStart + FileIdOffset);
FileID = raf.readByte();
}
// get tablespace names
for(BlockNum = 179; BlockNum <= 179; BlockNum++)
{
BlockStart = BlockSize * BlockNum;
NextRecordStart = HeaderSize;
RecordCount = 0;
System.out.println("Block :- " + BlockNum);
raf.seek(BlockStart + NextRecordStart + TablespaceFirstChar);
FileID = raf.readByte();
while (FileID != 0 && (NextRecordStart + TablespaceRecordSize) < BlockSize)
{
RecordCount = RecordCount + 1;
raf.seek(BlockStart + NextRecordStart);
System.out.println("Record :- " + RecordCount);
System.out.println("Byte 1 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 2 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Tablespace ID :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 4 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 5 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 6 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 7 :- " + String.format("%02X ", raf.readByte()));
System.out.println("Byte 8 :- " + String.format("%02X ", raf.readByte()));
System.out.print("Tablespace Name :- ");
for(int ct1 = 1; ct1 <= 30; ct1++)
{
MyByte = raf.readByte();
if (MyByte == 0)
{
System.out.print("");
//System.out.println(String.format("%02X ", raf.readByte()));
}
else if (32 < MyByte && MyByte <= 126)
{
//MyChar = ((char)Integer.parseInt(String.format("%02X", MyByte), 16));
MyChar = (char)MyByte;
System.out.print("" + MyChar);
//jTextAreaFile.append(" " + ct + "-" + String.format("%02X", MyByte) + " ");
}
else
{
System.out.print(" " + ct1 + "-" + String.format("%02X", MyByte) + " ");
}
}
System.out.println("");
System.out.println("");
NextRecordStart = NextRecordStart + TablespaceRecordSize;
raf.seek(BlockStart + NextRecordStart + TablespaceFirstChar);
FileID = raf.readByte();
}
}
}
raf.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
Java program to extract data from Oracle datafile blocks.
Because it extracts directly from the block it can be used to extract data from corrupt files or blocks, truncated tables, and even extract deleted rows.
The program is written purely for learning, so I don't guarantee it will extract all records in a block, or that it can handle all data types. I am happy for anyone to copy the program or improve it, but I do not provide any guarantee for it. Use it at your own risk.
It is a large Java program laid out in order of execution for easy reading and learning purposes. Yes, I know that a "real Java programmer" would do a better job, and there is lots of room for improvement, but it is good enough for learning about Oracle data structures.
An example of the program output can be found here :- http://blog.contractoracle.com/2013/02/extracting-data-directly-from-oracle.html
# OraTabExport.java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class OraTabExport {
public static void main(String[] args) {
try {
RandomAccessFile raf = new RandomAccessFile("D:\\oracle\\WIN64\\DATAFILE\\O1_MF_TEST_8LRBSQ59_.DBF", "r");
int BlockSize = 8192;
int ObjectID = 73520;
int StartByte = 0;
int ObjectIDLoc = 24;
int ITLSlotsLoc = 36;
int ITLSlots = 2;
int ITLSlot = 0;
int NumTablesLoc = 0;
int BlockObjectID = 0;
int FileSize =(int)raf.length();
int NumBlocks = FileSize / BlockSize;
int BlockNum = 0;
int DataHeaderStart = 0;
int TableDirectoryStart = 0;
int RowDirectoryStart =0;
int RowStart = 0;
int BlockHeaderSize = 20;
int TransactionHeaderSize = 24;
int DataHeaderSize = 14;
int TableDirectorySize = 4;
int NumTables = 0;
int TableNum = 0;
int NumTableRows = 0;
int TotalTableRows = 0;
int RowNumber = 0;
int NumColumns = 0;
int ColNum = 0;
int ColumnSize = 0;
int RowFlag = 0;
short RowOffset = 0;
int MinOffset = 0;
int MaxOffset = 0;
int OffsetCount = 0;
int ByteNum = 0;
Byte MyByte = 0;
char MyChar = 0;
for(BlockNum = 0; BlockNum <= NumBlocks -1; BlockNum++)
{
StartByte = BlockSize * BlockNum;
raf.seek(StartByte + ObjectIDLoc);
BlockObjectID = Integer.reverseBytes(raf.readInt());
raf.seek(StartByte + ITLSlotsLoc);
ITLSlots = raf.read();
if ( BlockObjectID == ObjectID)
{
raf.seek(StartByte);
System.out.println("\nBlock " + BlockNum + " Contains Object ID " + BlockObjectID);
// The first 20 bytes is the Block Header
System.out.println("Block Header start :- " + 0);
System.out.println(" Block Type 06-DATA :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Block Format :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Spare1 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Spare2 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Relative Block Address :- " + Integer.reverseBytes(raf.readInt()));
System.out.println(" SCN Base :- " + Integer.reverseBytes(raf.readInt()));
System.out.println(" SCN Wrap :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Sequence :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Flag 01-NEW :- " + String.format("%02X ", raf.readByte()));
System.out.println(" CheckSum :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Spare3 :- " + Short.reverseBytes(raf.readShort()));
// The next 24 bytes are Fixed Transaction Header
System.out.println("Transaction Header start :- " + BlockHeaderSize);
System.out.println(" Type 01-DATA 02-INDEX :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Spare 1 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Spare 2 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Spare 3 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Object ID :- " + Integer.reverseBytes(raf.readInt()));
System.out.println(" Cleanout SCN Base :- " + Integer.reverseBytes(raf.readInt()));
System.out.println(" Cleanout SCN Wrap :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Spare 4 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Spare 5 :- " + String.format("%02X ", raf.readByte()));
System.out.println(" ITL Slots :- " + raf.readByte());
System.out.println(" UNKNOWN Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Flag 00-FREE :- " + String.format("%02X ", raf.readByte()));
System.out.println(" ITL TX Feeelist Slot :- " + String.format("%02X ", raf.readByte()));
System.out.println(" Next Block On Free List :- " + Integer.reverseBytes(raf.readInt()));
// Each ITL Slot has 24 bytes allocated.
for(ITLSlot = 1; ITLSlot <= ITLSlots; ITLSlot++)
{
System.out.println("ITLSlot :- " + ITLSlot);
System.out.println(" Undo Segment :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Undo Segment Slot :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Transaction Sequence :- " + Integer.reverseBytes(raf.readInt()));
System.out.println(" Undo Block Address :- " + Integer.reverseBytes(raf.readInt()));
System.out.println(" Undo Sequence :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Undo Record Number :- " + raf.readByte());
System.out.println(" Spare 1 :- " + raf.readByte());
System.out.println(" Flag :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" _ktbitun :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Base :- " + Integer.reverseBytes(raf.readInt()));
}
DataHeaderStart = BlockHeaderSize + TransactionHeaderSize + ITLSlots*TransactionHeaderSize + 8;
raf.seek(StartByte + DataHeaderStart);
// The Data header contains details of the number of tables, rows, and free space in the block.
System.out.println("Data Header start :- " + DataHeaderStart);
System.out.println(" Flags :- " + String.format("%02X ", raf.readByte()));
NumTables = raf.readByte();
System.out.println(" Number of Tables :- " + NumTables);
System.out.println(" Number of Rows :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Offset to Freespace Start :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Offset to Freespace End :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Available Space :- " + Short.reverseBytes(raf.readShort()));
System.out.println(" Available Space after Commit:- " + Short.reverseBytes(raf.readShort()));
TableDirectoryStart = DataHeaderStart + DataHeaderSize;
raf.seek(StartByte + TableDirectoryStart);
// The Table Directory has a list of tables in the block and row counts
System.out.println("Table Directory start :-" + TableDirectoryStart);
TotalTableRows = 0;
for(TableNum = 1; TableNum <= NumTables; TableNum++)
{
System.out.println(" Table :- " + TableNum);
System.out.println(" Offset :- " + Short.reverseBytes(raf.readShort()));
NumTableRows = Short.reverseBytes(raf.readShort());
TotalTableRows = TotalTableRows + NumTableRows;
System.out.println(" Number of Rows :- " + NumTableRows);
}
System.out.println(" Total Number of Rows :- " + TotalTableRows);
RowDirectoryStart = TableDirectoryStart + NumTables*TableDirectorySize;
raf.seek(StartByte + RowDirectoryStart);
// The Row Directory contains a list of rows in the block, with offsets to the first byte
System.out.println("Row Directory start :-" + RowDirectoryStart);
MinOffset = TotalTableRows*2;
MaxOffset = BlockSize - DataHeaderStart - 4;
RowOffset = 1;
OffsetCount = 0;
List RowOffsetList = new ArrayList();
while (RowOffset != 0 && OffsetCount < TotalTableRows)
{
RowOffset = Short.reverseBytes(raf.readShort());
if (RowOffset > MinOffset && RowOffset < MaxOffset)
{
OffsetCount = OffsetCount + 1;
System.out.println(" Row Offset " + OffsetCount + " :- " + RowOffset + "(+" + DataHeaderStart + ")");
RowOffsetList.add(RowOffset);
}
}
// Now go to each offset and get the row header which contains number of columns, row status etc
Iterator OffsetIterator = RowOffsetList.iterator();
RowNumber = 0;
while(OffsetIterator.hasNext())
{
RowOffset = (Short)OffsetIterator.next();
RowNumber = RowNumber + 1;
RowStart = DataHeaderStart + RowOffset;
raf.seek(StartByte + RowStart);
System.out.println(RowNumber + " Row Header start :-" + RowStart);
RowFlag = raf.readByte();
String RowFlagString = String.format("%02X ", RowFlag );
String RowFlagDecode = "";
// Translate the Row Flag to identify deleted rows, table data etc
switch (RowFlag) {
case 3: RowFlagDecode = "Junk ?";
break;
case 12: RowFlagDecode = "Chained ?";
break;
case 32: RowFlagDecode = "Chained ?";
break;
case 44: RowFlagDecode = "Table Data";
break;
case 60: RowFlagDecode = "Deleted";
break;
case 84: RowFlagDecode = "Cluster Key ?";
break;
case -84: RowFlagDecode = "Cluster Key ?";
break;
case 108: RowFlagDecode = "Cluster Data";
break;
case 124: RowFlagDecode = "Deleted";
break;
default: RowFlagDecode = "Invalid Flag";
break;
}
System.out.println(" Flags :- " + RowFlagString + " " + RowFlagDecode);
System.out.println(" Lock Status :- " + String.format("%02X ", raf.readByte()));
NumColumns = raf.readByte();
System.out.println(" Number of Columns :- " + NumColumns);
if (NumColumns == 0)
{
System.out.println("Integer :- " + Integer.reverseBytes(raf.readInt()));
System.out.println("Short :- " + Short.reverseBytes(raf.readShort()));
}
if (NumTables > 1)
{
System.out.println(" Cluster Byte :- " + String.format("%02X ", raf.readByte()));
}
if (RowFlag == 76)
{
System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
}
if (RowFlag == 12)
{
//System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
//System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
//System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
//System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
//System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
//System.out.println("Read Byte :- " + String.format("%02X ", raf.readByte()));
System.out.println("Integer :- " + Integer.reverseBytes(raf.readInt()));
System.out.println("Short :- " + Short.reverseBytes(raf.readShort()));
}
if (RowFlag != 3) {
for(ColNum = 1; ColNum <= NumColumns; ColNum++){
ColumnSize = raf.read();
if (ColumnSize == 255)
{
System.out.print("Column " + ColNum + " Bytes " + ColumnSize + " Data :- ");
System.out.println("NULL");
}
else if (ColumnSize == 254)
{
ColumnSize = Short.reverseBytes(raf.readShort());
System.out.print("Column " + ColNum + " Bytes " + ColumnSize + " Data :- ");
for(ByteNum = 0; ByteNum < ColumnSize; ByteNum++)
{
//System.out.print("" + String.format("%02X ", MyByte));
//MyByte = raf.readByte();
System.out.print("" + (char)raf.readByte());
}
System.out.println("");
}
else
{
System.out.print("Column " + ColNum + " Bytes " + ColumnSize + " Data :- ");
for(ByteNum = 0; ByteNum < ColumnSize; ByteNum++)
{
MyByte = raf.readByte();
//System.out.print("" + raf.readByte());
if (MyByte < 32 || MyByte > 126)
{
System.out.print("" + String.format("%02X ", MyByte));
}
else
{
MyChar = ((char)Integer.parseInt(String.format("%02X", MyByte), 16));
System.out.print("" + MyChar);
}
}
System.out.println("");
}
}
}
}
}
}
raf.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
Extracting data directly from Oracle datafiles using Java.
This allows me to extract any data I need direct from datafiles while bypassing audit and security at database level. Following is example output running the program to scan for object 10 which is a clustered table containing USER$ and extracting data including usernames and password hashes.
Being able to directly access data in blocks allows more detailed analysis and auditing, so I should be able to extract data for tables that have been truncated, extract deleted rows, or identify which blocks have changed since a specific SCN.
The source code can be found here :- http://blog.contractoracle.com/2013/02/java-program-to-extract-data-from.html
The output still contains a bit of Hex data, so more work is needed to convert to proper char and date formats.
Block 209 Contains Object ID 10Block Header start :- 0 Block Type 06-DATA :- 06 Block Format :- A2 Spare1 :- 00 Spare2 :- 00 Relative Block Address :- 4194513 SCN Base :- 954865 SCN Wrap :- 0 Sequence :- 01 Flag 01-NEW :- 06 CheckSum :- 10511 Spare3 :- 0
Transaction Header start :- 20 Type 01-DATA 02-INDEX :- 01 Spare 1 :- 00 Spare 2 :- 00 Spare 3 :- 00 Object ID :- 10 Cleanout SCN Base :- 954861 Cleanout SCN Wrap :- 0 Spare 4 :- 00 Spare 5 :- 00 ITL Slots :- 2 UNKNOWN Byte :- 00 Flag 00-FREE :- 03 ITL TX Feeelist Slot :- 00 Next Block On Free List :- 0
Data Header start :- 92 Flags :- 00 Number of Tables :- 2 Number of Rows :- 43 Offset to Freespace Start :- -1 Offset to Freespace End :- 108 Available Space :- 4711 Available Space after Commit:- 5519
Table Directory start :-106 Table :- 1 Offset :- 0 Number of Rows :- 21 Table :- 2 Offset :- 21 Number of Rows :- 22 Total Number of Rows :- 43
Row Directory start :-114 Row Offset 1 :- 8074(92) Row Offset 2 :- 8005(92) Row Offset 3 :- 7829(92) Row Offset 4 :- 7740(92) Row Offset 5 :- 7650(92) Row Offset 6 :- 7559(92) Row Offset 7 :- 7473(92) Row Offset 8 :- 7294(92) Row Offset 9 :- 7192(92) Row Offset 10 :- 7089(92) Row Offset 11 :- 6987(92) Row Offset 12 :- 6809(92) Row Offset 13 :- 6709(92) Row Offset 14 :- 6609(92) Row Offset 15 :- 6504(92) Row Offset 16 :- 6412(92) Row Offset 17 :- 6220(92) Row Offset 18 :- 6116(92) Row Offset 19 :- 6021(92) Row Offset 20 :- 5842(92) Row Offset 21 :- 5733(92) Row Offset 22 :- 5951(92) Row Offset 23 :- 4882(92) Row Offset 24 :- 7762(92) Row Offset 25 :- 7672(92) Row Offset 26 :- 7581(92) Row Offset 27 :- 7495(92) Row Offset 28 :- 4711(92) Row Offset 29 :- 7214(92) Row Offset 30 :- 7111(92) Row Offset 31 :- 7009(92) Row Offset 32 :- 5391(92) Row Offset 33 :- 6731(92) Row Offset 34 :- 6631(92) Row Offset 35 :- 6526(92) Row Offset 36 :- 6434(92) Row Offset 37 :- 6242(92) Row Offset 38 :- 6138(92) Row Offset 39 :- 6043(92) Row Offset 40 :- 5864(92) Row Offset 41 :- 5755(92) Row Offset 42 :- 5647(92) Row Offset 43 :- 5562(92)
23 Row Header start :-4974 Flags :- 108 6C Cluster Data Lock Status :- 00 Number of Columns :- 22 Cluster Byte :- 01 Column 1 Bytes 3 Data :- SYSColumn 2 Bytes 2 Data :- C1 02 Column 3 Bytes 16 Data :- DCB748A5BC5390F2Column 4 Bytes 1 Data :- 80 Column 5 Bytes 2 Data :- C1 04 Column 6 Bytes 7 Data :- xn03 1E 0B 08 7Column 7 Bytes 7 Data :- xq02 13 0B 3'Column 8 Bytes 7 Data :- xn03 1E 0C 07 9Column 9 Bytes 7 Data :- xn03 1E 0C 07 9Column 10 Bytes 1 Data :- 80 Column 11 Bytes 255 Data :- NULLColumn 12 Bytes 2 Data :- C1 02 Column 13 Bytes 255 Data :- NULLColumn 14 Bytes 255 Data :- NULLColumn 15 Bytes 1 Data :- 80 Column 16 Bytes 1 Data :- 80 Column 17 Bytes 22 Data :- DEFAULT_CONSUMER_GROUPColumn 18 Bytes 255 Data :- NULLColumn 19 Bytes 1 Data :- 80 Column 20 Bytes 255 Data :- NULLColumn 21 Bytes 255 Data :- NULLColumn 22 Bytes 62 Data :-S:CDD630F4165A338BF851D4552897EB41EB6A9FCF7587B366260E0352A7C7
28 Row Header start :-4803 Flags :- 108 6C Cluster Data Lock Status :- 02 Number of Columns :- 22 Cluster Byte :- 05 Column 1 Bytes 6 Data :- SYSTEMColumn 2 Bytes 2 Data :- C1 02 Column 3 Bytes 16 Data :- EED9B65CCECDB2E9Column 4 Bytes 1 Data :- 80 Column 5 Bytes 2 Data :- C1 04 Column 6 Bytes 7 Data :- xn03 1E 0B 08 8Column 7 Bytes 7 Data :- xq02 13 0B 3'Column 8 Bytes 7 Data :- xn03 1E 0C 07 9Column 9 Bytes 7 Data :- xn03 1E 0C 07 9Column 10 Bytes 1 Data :- 80 Column 11 Bytes 255 Data :- NULLColumn 12 Bytes 2 Data :- C1 02 Column 13 Bytes 255 Data :- NULLColumn 14 Bytes 255 Data :- NULLColumn 15 Bytes 1 Data :- 80 Column 16 Bytes 1 Data :- 80 Column 17 Bytes 22 Data :- DEFAULT_CONSUMER_GROUPColumn 18 Bytes 255 Data :- NULLColumn 19 Bytes 1 Data :- 80 Column 20 Bytes 255 Data :- NULLColumn 21 Bytes 255 Data :- NULLColumn 22 Bytes 62 Data :- S:83FDD7C19ABAB64415514C28AD798DDAA3FE619B10B1AEA79E8463DC2566
If anyone wants more details on the Oracle block structure you can check :-http://orafaq.com/papers/dissassembling_the_data_block.pdfand http://www.v3rity.com/OracleForensicsDataBlock.pdf
Oracle Control File Physical Structure
The Java program I wrote to extract the data below can be found here :- http://blog.contractoracle.com/2013/02/java-program-to-extract-records-from.html
The basics :-
- The controlfile is composed of blocks that are 2X database block size.
- Each block has a header of 18 bytes.
- Each block has a tail of 4 bytes.
- Each block is duplexed within the datafile for redundancy. E.g Block 31 is duplexed to Block 32.
Number of Blocks :- 595Block :- 2Tablespace Name :- WIN64
File information starts in block 31 and is contained in records of 524 bytes.The first 12 bytes is record header information, and the last 512 bytes is the file name. Byte 3 indicates the file type (3=LOG, 4=DATA, 7=TEMPFILE) and byte 5 appears to be file ID (unique for each file type)
Following is an example of extracting file details from a Control File :-
Number of Blocks :- 595Block :- 31Record :- 1Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 03 Byte 4 :- 00 File ID :- 3Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\ONLINELOG\O1_MF_3_8L5T3995_.LOG
Record :- 2Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 03 Byte 4 :- 00 File ID :- 2Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\ONLINELOG\O1_MF_2_8L5T38G3_.LOG
Record :- 3Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 03 Byte 4 :- 00 File ID :- 1Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\ONLINELOG\O1_MF_1_8L5T37VY_.LOG
Record :- 4Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 04 Byte 4 :- 00 File ID :- 4Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\DATAFILE\O1_MF_USERS_8L5T0ZC4_.DBF
Record :- 5Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 04 Byte 4 :- 00 File ID :- 3Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\DATAFILE\O1_MF_UNDOTBS1_8L5T0ZBF_.DBF
Record :- 6Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 04 Byte 4 :- 00 File ID :- 2Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\DATAFILE\O1_MF_SYSAUX_8L5T0ZB4_.DBF
Record :- 7Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 04 Byte 4 :- 00 File ID :- 1Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\DATAFILE\O1_MF_SYSTEM_8L5T0Z83_.DBF
Record :- 8Byte 1 :- 00 Byte 2 :- 00 File Type 3-LOG, 4-DATA, 7-TEMP :- 07 Byte 4 :- 00 File ID :- 1Byte 6 :- 00 Byte 7 :- 00 Byte 8 :- 00 Byte 9 :- 00 Byte 10 :- 00 Byte 11 :- 00 Byte 12 :- 00 File Name :- D:\ORACLE\WIN64\DATAFILE\O1_MF_TEMP_8L5T3MYY_.TMP
Tablespace information starts at block 179 and is contained in 68 byte records. The first 8 bytes of the record are header information, with byte 3 being TABLESPACE#.
Following is an example of tablespace information extract from a Control File.
Block :- 179Record :- 1Byte 1 :- 00 Byte 2 :- 00 Tablespace ID :- 00 Byte 4 :- 00 Byte 5 :- 00 Byte 6 :- 00 Byte 7 :- 06 Byte 8 :- 00 Tablespace Name :- SYSTEM
Record :- 2Byte 1 :- 00 Byte 2 :- 00 Tablespace ID :- 01 Byte 4 :- 00 Byte 5 :- 00 Byte 6 :- 00 Byte 7 :- 06 Byte 8 :- 00 Tablespace Name :- SYSAUX
Record :- 3Byte 1 :- 00 Byte 2 :- 00 Tablespace ID :- 02 Byte 4 :- 00 Byte 5 :- 00 Byte 6 :- 00 Byte 7 :- 08 Byte 8 :- 00 Tablespace Name :- UNDOTBS1
Record :- 4Byte 1 :- 00 Byte 2 :- 00 Tablespace ID :- 04 Byte 4 :- 00 Byte 5 :- 00 Byte 6 :- 00 Byte 7 :- 05 Byte 8 :- 00 Tablespace Name :- USERS
Record :- 5Byte 1 :- 00 Byte 2 :- 00 Tablespace ID :- 03 Byte 4 :- 00 Byte 5 :- 00 Byte 6 :- 00 Byte 7 :- 04 Byte 8 :- 00 Tablespace Name :- TEMP
OTN Yathra - past the 1/2 way mark
I'm going to cheat tremendously and link to Lucas Jellema's blog about the tour, starting with http://technology.amis.nl/2013/01/27/otn-yathra-2013-the-six-city-oracle-tour-of-india/ as the overview.
Shamelessly linking Lucas' map of the Yathra.
I arrived in Chandigarh, Punjab on February 10, and was met by Oracle ACE and friend Aman Sharma. The trip: left Edmonton on Friday evening, landed London Heathrow Saturday morning, left LHR Saturday evening, arrived Delhi Sunday morning, left Delhi Sunday afternoon and into Chandigarh early evening.
On Monday and Tuesday, much to my delight, Aman showed me his home country and local community. And I did some shopping.
Wednesday, headed back to Delhi by taxi, and prepared for the gathering and presentations.
On Friday the team gathered at the Country Inn by Carlson in Saket. ACE Director team members are
Murali Vallath - our host and RAC ACED from India
Lucas Jellema - Development ACED, including SQL and Java
Edward Roske - BI/Hyperion ACED, presenting on Hyperion, Essbase and BI
Raj Mattamal - Development ACED, specializing in APEX
Hans Forbrich - Infrastructure ACED, presenting on LDAP, Cloud Control and Linux
In each city we have been fortunate to have ACEs such as Aman Sharma join us to present
Saturday was the first presentation day at the FMDI (Fertilizer Marketing Development Institute) which has wonderful classroom facilities.
Sunday, a quick tour of Delhi, off to the airport to Mumbai (formerly Bombay), and settle into the Holiday Inn.
Monday, presented using the facilities of the Women's Technical Institute, and then by car off to Pune.
Since we arrived in Pune Monday evening, Tuesday was a touring day out to Mahabalshawar and the head of the Krishna river.
Wednesday we presented in Pune at the Oracle offices.
Thursday by plane to Bangalore, and
Friday presented in the Oracle offices in Bangalore. Lucas has a fantastic half-way write-up at http://technology.amis.nl/2013/02/21/otn-yathra-2013-spreading-the-story-of-oracle-across-india-half-time/
Raj and Edward left today to return home.
Tomorrow Lucas and I will head to Hyderabad and continue this tour.
Using ODU to extract audit data from truncated AUD$ table.
### Evil DBA logs into the HR database, increases his salary, changes the password for SYS user, then logs into SYS and truncates the AUD$ table.
SQL> connect evil_dba/badpassword
Connected.
SQL> update hr.employees set salary = 100000.00 where first_name = 'EVIL' and last_name = 'DBA';
1 row updated.
SQL> alter user sys identified by badpassword;
User altered.
SQL> connect sys/badpassword as sysdba;
Connected.
SQL> select count(*) from dba_audit_trail;
COUNT(*)
----------
40
SQL> truncate table aud$;
Table truncated.
SQL> select count(*) from dba_audit_trail;
COUNT(*)
----------
0
SQL> exit
It is reported to the auditor that someone has hacked into the HR database, but no audit records can be found in AUD$ to identify the guilty person and nobody is sure of exactly what was changed. The Auditor could restore the database and recover it until the SCN prior to the truncate, but that may take a long time, so it would usually be faster to read data from the truncated table blocks if they are still in the datafile and have not been overwritten.
1. Configure control.txt so ODU has a list of datafiles
[oracle@rac1 odu]$ more control.txt
0 1 1 +DATA/test/datafile/system.266.807721615
1 2 2 +DATA/test/datafile/sysaux.273.807721615
2 3 3 +DATA/test/datafile/undotbs1.272.807721615
4 4 4 +DATA/test/datafile/users.271.807721615
6 5 5 +DATA/test/datafile/example.262.807721709
2. Configure asmdisk.txt so ODU has a list of ASM devices
[oracle@rac1 odu]$ more asmdisk.txt
0 /dev/sdb1 DATA1 4096 1048576
0 /dev/sdc1 FRA1 4096 1048576
3. Start ODU
[oracle@rac1 odu]$ ./odu
Oracle Data Unloader trial version 4.1.3
Copyright (c) 2008,2009,2010,2011 XiongJun. All rights reserved.
Web: http://www.oracleodu.com
Email: magic007cn@gmail.com
loading default config.......
byte_order little
block_size 8192
block_buffers 1024
error at line 3.
db_timezone -7
Invalid db timezone:-7
client_timezone 8
Invalid client timezone:8
asmfile_extract_path /asmfile
data_path data
lob_path /odu/data/lob
charset_name US7ASII
charset name 'US7ASII' not found,will use default charset ZHS16GBK
ncharset_name AL16UTF16
output_format text
lob_storage infile
clob_byte_order big
trace_level 1
delimiter |
unload_deleted no
file_header_offset 0
is_tru64 no
record_row_addr no
convert_clob_charset yes
use_scanned_lob yes
trim_scanned_blob yes
lob_switch_dir_rows 20000
db_block_checksum yes
db_block_checking yes
rdba_file_bits 10
compatible 10
load config file 'config.txt' successful
loading default asm disk file ......
grp# dsk# bsize ausize disksize diskname groupname path
---- ---- ----- ------ -------- --------------- --------------- -----------------
1 0 4096 1024K 10236 DATA1 DATA /dev/sdb1
2 0 4096 1024K 5114 FRA1 FRA /dev/sdc1
load asm disk file 'asmdisk.txt' successful
loading default control file ......
ts# fn rfn bsize blocks bf offset filename
---- ---- ---- ----- -------- -- ------ -----------------------------------------
0 1 1 8192 85760 N 0 +DATA/test/datafile/system.266.807721615
1 2 2 8192 60160 N 0 +DATA/test/datafile/sysaux.273.807721615
2 3 3 8192 5120 N 0 +DATA/test/datafile/undotbs1.272.807721615
4 4 4 8192 640 N 0 +DATA/test/datafile/users.271.807721615
6 5 5 8192 12800 N 0 +DATA/test/datafile/example.262.807721709
load control file 'control.txt' successful
loading dictionary data......done
loading scanned data......done
4. Unload the dictionary
ODU> unload dict
CLUSTER C_USER# file_no: 1 block_no: 208
TABLE OBJ$ file_no: 1 block_no: 240
CLUSTER C_OBJ# file_no: 1 block_no: 144
CLUSTER C_OBJ# file_no: 1 block_no: 144
found IND$'s obj# 19
found IND$'s dataobj#:2,ts#:0,file#:1,block#:144,tab#:3
found TABPART$'s obj# 576
found TABPART$'s dataobj#:576,ts#:0,file#:1,block#:3824,tab#:0
found INDPART$'s obj# 581
found INDPART$'s dataobj#:581,ts#:0,file#:1,block#:3872,tab#:0
found TABSUBPART$'s obj# 588
found TABSUBPART$'s dataobj#:588,ts#:0,file#:1,block#:3928,tab#:0
found INDSUBPART$'s obj# 593
found INDSUBPART$'s dataobj#:593,ts#:0,file#:1,block#:3968,tab#:0
found IND$'s obj# 19
found IND$'s dataobj#:2,ts#:0,file#:1,block#:144,tab#:3
found LOB$'s obj# 80
found LOB$'s dataobj#:2,ts#:0,file#:1,block#:144,tab#:6
found LOBFRAG$'s obj# 609
found LOBFRAG$'s dataobj#:609,ts#:0,file#:1,block#:4096,tab#:0
5. Check that the SYS.AUD$ table exists and record the TS# and OBJ#
ODU> desc sys.aud$
Object ID:383
Storage(Obj#=383 DataObj#=74574 TS#=0 File#=1 Block#=2296 Cluster=0)
NO. SEG INT Column Name Null? Type
--- --- --- ------------------------------ --------- ------------
1 1 1 SESSIONID NOT NULL NUMBER
2 2 2 ENTRYID NOT NULL NUMBER
3 3 3 STATEMENT NOT NULL NUMBER
4 4 4 TIMESTAMP# DATE
5 5 5 USERID VARCHAR2(30)
6 6 6 USERHOST VARCHAR2(128)
7 7 7 TERMINAL VARCHAR2(255)
8 8 8 ACTION# NOT NULL NUMBER
9 9 9 RETURNCODE NOT NULL NUMBER
10 10 10 OBJ$CREATOR VARCHAR2(30)
11 11 11 OBJ$NAME VARCHAR2(128)
12 12 12 AUTH$PRIVILEGES VARCHAR2(16)
13 13 13 AUTH$GRANTEE VARCHAR2(30)
14 14 14 NEW$OWNER VARCHAR2(30)
15 15 15 NEW$NAME VARCHAR2(128)
16 16 16 SES$ACTIONS VARCHAR2(19)
17 17 17 SES$TID NUMBER
18 18 18 LOGOFF$LREAD NUMBER
19 19 19 LOGOFF$PREAD NUMBER
20 20 20 LOGOFF$LWRITE NUMBER
21 21 21 LOGOFF$DEAD NUMBER
22 22 22 LOGOFF$TIME DATE
23 23 23 COMMENT$TEXT VARCHAR2(4000)
24 24 24 CLIENTID VARCHAR2(64)
25 25 25 SPARE1 VARCHAR2(255)
26 26 26 SPARE2 NUMBER
27 27 27 OBJ$LABEL RAW(255)
28 28 28 SES$LABEL RAW(255)
29 29 29 PRIV$USED NUMBER
30 30 30 SESSIONCPU NUMBER
31 31 31 NTIMESTAMP# TIMESTAMP(6)
32 32 32 PROXY$SID NUMBER
33 33 33 USER$GUID VARCHAR2(32)
34 34 34 INSTANCE# NUMBER
35 35 35 PROCESS# VARCHAR2(16)
36 36 36 XID RAW(8)
37 37 37 AUDITID VARCHAR2(64)
38 38 38 SCN NUMBER
39 39 39 DBID NUMBER
40 40 40 SQLBIND CLOB
41 41 41 SQLTEXT CLOB
42 42 42 OBJ$EDITION VARCHAR2(30)
6. Scan tablespace 0 for all extents associated with object 383
ODU> scan extent tablespace 0 object 383
scan extent start: 2013-02-18 15:32:18
scanning extent...
scanning extent finished.
scan extent completed: 2013-02-18 15:32:23
7. Unload records from the truncated table extents
ODU> unload table sys.aud$ object truncate
Auto mode truncated table.
Unloading table: AUD$,object ID: 383
Unloading segment,storage(Obj#=383 DataObj#=383 TS#=0 File#=1 Block#=2296 Cluster=0)
40 rows unloaded
The 40 records from truncated table AUD$ have now been extracted to file SYS_AUD$.txt and if we look at them we can see the last activity before the truncate. These records could be loaded into temporary tables using sqlldr.
20027|1|1||EVIL_DBA|rac1.test.com|pts/1|100|0||||||||||||||Authenticated by: DATABASE||oracle||||5||2013-02-18 07:22:50.091234|||0|24432|0000000000000000|||2105868124
20027|2|10||EVIL_DBA|rac1.test.com|pts/1|103|0|HR|EMPLOYEES|||||----------S-----|73953||||||||oracle||||49||2013-02-18 07:22:59.535459|||0|24432|0300150027030000||990832|2105868124||®<90><93>©¬<8f>
20027|3|11||EVIL_DBA|rac1.test.com|pts/1|43|0||SYS||||||||||||||oracle||||22||2013-02-18 07:23:25.381809|||0|24432|040014005B020000||990849|2105868124||<91>[¬ <84>®<92>ÃÃ
20027|4|1||EVIL_DBA|rac1.test.com|pts/1|101|0|||||||||1987|121|43|0|2013-02-18 07:23:37|||oracle|||||22|2013-02-18 07:23:37.142216|||0|24432||||2105868124
The records above relate to the actions of EVIL_DBA and the ACTION# can give a clue to their activities.
ACTION ACTION_NAME---------- ---------------------------- 100 LOGON 103 SESSION REC 43 ALTER USER 101 LOGOFF
So just from the text file we can see the EVIL_DBA connected to the database, did something to the HR.EMPLOYEES table, altered user SYS, then logged out. The extracted data contains additional details which can be mapped to other database tables (USER$, OBJ$ etc) to give a more comprehensive picture. There is also LOB data which can be loaded which contains the actual commands executed.
In this example ODU extracted audit data directly from truncated table blocks on ASM devices without logging in to the database, so this could also work if the database was shutdown, or corrupted.
Oracle database on ZFS - reduce costs through deduplication
I recently had a customer who wanted to reduce storage costs. They had a 13TB PROD database, and needed 40 copies of the database to support DEV and TEST activities, totalling over 500TB.
We did some diagnostics on the databases, and worked out that only about 10% of data blocks in a clone were ever updated, so deduplicating at filesystem or storage block level could result in 90% saving. The client was a large conservative bank, so the preference was to use existing products from large vendors. We evaluated products from Delphix, NetApp, and Oracle, and ended up choosing the NetApp product due to a combination of functionality, cost, and in-house skills.
Since then I have been wondering if we could have done it cheaper, and have done a quick evaluation of the ZFS filesystem which does allow deduplication at filesystem level.
ZFS is a "copy on write" filesystem which is included in the Solaris operating system, and has been ported to other operating systems.
You can find more information here :- http://docs.oracle.com/cd/E19253-01/819-5461/index.html
The following test was done using Oracle VirtualBox running Oracle Solaris 11 x86.
Create the "dbpool" disk pool using a storage device by running "zpool create"
root@sol11:~# zpool create dbpool /dev/dsk/c7t2d0p0
Configure dbpool with recordsize 8k which is optimal for databases with 8k block size, and set mount point to /u01/data
Refer to - http://www.oracle.com/technetwork/server-storage/solaris10/config-solaris-zfs-wp-167894.pdf
root@sol11:~# zfs create -o recordsize=8k -o mountpoint=/u01/data dbpool/data
root@sol11:~# zfs set logbias=throughput dbpool/data
Now confirm the settings using "zfs get"
root@sol11:~# zfs get primarycache,recordsize,logbias dbpool/data
NAME PROPERTY VALUE SOURCE
dbpool/data primarycache all default
dbpool/data recordsize 8K local
dbpool/data logbias throughput local
Then create 11g database "test" in /u01/data
oracle@sol11:/u01/data/test$ pwd/u01/data/testoracle@sol11:/u01/data/test$ ls -l
total 2720998
-rw-r----- 1 oracle oinstall 9748480 Feb 14 23:55 control01.ctl
-rw-r----- 1 oracle oinstall 9748480 Feb 14 23:55 control02.ctl
-rw-r----- 1 oracle oinstall 52429312 Feb 14 23:25 redo01.log
-rw-r----- 1 oracle oinstall 52429312 Feb 14 23:52 redo02.log
-rw-r----- 1 oracle oinstall 52429312 Feb 14 23:25 redo03.log
-rw-r----- 1 oracle oinstall 461381632 Feb 14 23:46 sysaux01.dbf
-rw-r----- 1 oracle oinstall 702554112 Feb 14 23:51 system01.dbf
-rw-r----- 1 oracle oinstall 20979712 Feb 14 23:25 temp01.dbf
-rw-r----- 1 oracle oinstall 31465472 Feb 14 23:51 undotbs01.dbf
-rw-r----- 1 oracle oinstall 5251072 Feb 14 23:25 users01.dbf
oracle@sol11:/u01/data/test$ du -k .1360500 .
oracle@sol11:/u01/data/test$ df -k .
Filesystem 1024-blocks Used Available Capacity Mounted on
dbpool/data 10257408 1360529 8896686 14% /u01/data
We can see that database test is using about 1.3GB of disk in /u01/data
The "zfs list" command confirms space used is 1.3GB.
oracle@sol11:~$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
dbpool 1.30G 8.48G 31K /dbpool
dbpool/data 1.30G 8.48G 1.30G /u01/data
Shutdown database test and run the "zfs snapshot" command to create a snapshot of the dbpool filesystem data called "backup"
root@sol11:/u01/data# zfs snapshot dbpool/data@backup
The contents of the snapshot can be viewed in the ".zfs" directory.
root@sol11:/u01/data/.zfs# find .
.
./snapshot
./snapshot/backup
./snapshot/backup/test
./snapshot/backup/test/redo02.log
./snapshot/backup/test/control02.ctl
./snapshot/backup/test/redo03.log
./snapshot/backup/test/sysaux01.dbf
./snapshot/backup/test/temp01.dbf
./snapshot/backup/test/system01.dbf
./snapshot/backup/test/control01.ctl
./snapshot/backup/test/users01.dbf
./snapshot/backup/test/redo01.log
./snapshot/backup/test/undotbs01.dbf
./shares
Clone the backup by running "zfs clone" to create "clone1"
root@sol11:/u01/data# zfs clone dbpool/data@backup dbpool/clone1
Mount clone1 to /u01/clone1 using "zfs set mountpoint"
root@sol11:/u01/data# zfs set mountpoint=/u01/clone1 dbpool/clone1
root@sol11:/u01/data/.zfs# cd /u01/clone1/test
root@sol11:/u01/clone1/test# ls
control01.ctl redo01.log redo03.log system01.dbf undotbs01.dbf control02.ctl redo02.log sysaux01.dbf temp01.dbf users01.dbf
Using "zfs list" we can see that the clone only uses 21K (for 1.3GB of files)
root@sol11:/u01/clone1/test# zfs list
NAME USED AVAIL REFER MOUNTPOINT
dbpool 1.30G 8.48G 31K /dbpool
dbpool/data 1.30G 8.48G 1.30G /u01/data
dbpool/clone1 21K 8.48G 1.30G /u01/clone1
Now we can prepare initclone1.ora, create audit file directory, and recreate the controlfile for the clone database.
oracle@sol11:~$ export ORACLE_SID=clone1
oracle@sol11:~$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Fri Feb 15 00:35:13 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> @recrctl_clone1.sql
ORACLE instance started.
Total System Global Area 1043886080 bytes
Fixed Size 2217624 bytes
Variable Size 603982184 bytes
Database Buffers 432013312 bytes
Redo Buffers 5672960 bytes
Control file created.
Database altered.
SQL> select name from v$database;
NAME
---------
CLONE1
SQL> select name from v$datafile;
NAME
-----------------------------------------------------------------
/u01/clone1/test/system01.dbf
/u01/clone1/test/sysaux01.dbf
/u01/clone1/test/undotbs01.dbf
/u01/clone1/test/users01.dbf
We now have databases test and clone1 running, with block level deduplication.
oracle@sol11:~$ ps -ef | grep pmon
oracle 2800 1 0 00:33:28 ? 0:00 ora_pmon_test oracle 2920 1 0 00:35:18 ? 0:00 ora_pmon_clone1
After the "open resetlogs" the clone is now using 120MB (for an additional 1300MB database)
oracle@sol11:~$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
dbpool 1.42G 8.36G 31K /dbpool
dbpool/data 1.30G 8.36G 1.30G /u01/data
dbpool/clone1 120M 8.36G 1.30G /u01/clone1
oracle@sol11:~$ df -k /u01/clone1
Filesystem 1024-blocks Used Available Capacity Mounted on
dbpool/clone1 10257408 1361036 8770425 14% /u01/clone1
oracle@sol11:~$ df -k /u01/data
Filesystem 1024-blocks Used Available Capacity Mounted on
dbpool/data 10257408 1360530 8770425 14% /u01/data
Oracle database on BTRFS - reduce costs through deduplication
We did some diagnostics on the databases, and worked out that only about 10% of data blocks in a clone were ever updated, so deduplicating at filesystem or storage block level could result in 90% saving. The client was a large conservative bank, so the preference was to use existing products from large vendors. We evaluated products from Delphix, NetApp SnapManager for Oracle, and Oracle ZFS appliance, and ended up choosing the NetApp product due to a combination of functionality, cost, and in-house skills.
Since then I have been wondering if we could have done it cheaper, and have done a quick evaluation of the BTRFS filesystem which does allow deduplication at filesystem level.
BTRFS is a "copy on write" filesystem and is listed as EXPERIMENTAL, so I don't think it is ready to use in a PROD environment, but is worth testing to support DEV and TEST environments depending on the cost of downtime.
You can find more information here :- http://btrfs.wiki.kernel.org
The following test was done using Oracle VirtualBox running Oracle Linux 6 2.6.39-300.28.1.el6uek.x86_64. BTRFS is included in the Linux kernel, but I needed to install package btrfs-progs :-
yum install btrfs-progs
Format the storage device using mkfs.btrfs :-
[root@btrfs ~]# mkfs.btrfs /dev/sdb
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 12.00GB
Btrfs Btrfs v0.19
You can see that we formatted a 12GB device with devault 4K block size. Block size can impact performance, so if the BTRFS filesystem will only contain database files with 8K blocks, I suspect you would see better performance if the filesystem was also formatted to use 8K blocks.
Now mount the filesystem :-
[root@btrfs ~]# mount /dev/sdb /btrfs1
And add an entry to /etc/fstab :-
/dev/sdb /btrfs1 btrfs defaults 0 0
Using the "btrfs filesystem show" and "btrfs filesystem df" commands we can see the usage. Note that the filesystem has reserved 2GB out of 12GB.
[root@btrfs /]# btrfs filesystem show /dev/sdbLabel: none uuid: f6f7d6c7-14ca-4c01-8c33-3066fc7f4ad3 Total devices 1 FS bytes used 92.00KB devid 1 size 12.00GB used 2.04GB path /dev/sdb
Btrfs Btrfs v0.19
[root@btrfs /]# btrfs filesystem df /btrfs1Data: total=8.00MB, used=64.00KBSystem, DUP: total=8.00MB, used=4.00KBSystem: total=4.00MB, used=0.00Metadata, DUP: total=1.00GB, used=24.00KBMetadata: total=8.00MB, used=0.00
We can also see that usage at OS level.
[root@btrfs /]# df -k /btrfs1Filesystem 1K-blocks Used Available Use% Mounted on/dev/sdb 12582912 120 10457024 1% /btrfs1
Create some directories for the Oracle binaries and datafiles :-
[root@btrfs ~]# cd /btrfs1[root@btrfs u01]# mkdir app[root@btrfs u01]# mkdir data[root@btrfs u01]# chown oracle:dba app data
And now we can start using some of the BTRFS functionality. Create subvolumes for the source database binaries and datafiles. BTRFS allows us to take snapshots of the subvolumes at a later time, so it is best to plan subvolume layout before installing files.
Create subvolumes using "btrfs subvolume create" :-
[root@btrfs btrfs1]# su - oracle
[oracle@btrfs app]$ cd /btrfs1/app
[oracle@btrfs app]$ btrfs subvolume create source
Create subvolume './source'
[oracle@btrfs app]$ cd /btrfs1/data
[oracle@btrfs data]$ btrfs subvolume create source
Create subvolume './source'
Install the Oracle binaries and database in the subvolumes.
Oracle binaries in /btrfs1/app/source and create database "source" in /btrfs1/data/source.This install used about 5.6GB of disk.
[oracle@btrfs source]$ df /btrfs1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5661424 4938480 54% /btrfs1
Database source currently has 1.1GB of datafiles, using total 1.3GB in the filesystem.
SQL> select sum(bytes)/1024/1024/1024 from dba_data_files;
SUM(BYTES)/1024/1024/1024------------------------- 1.11816406
[oracle@btrfs source]$ pwd
/btrfs1/data/source
[oracle@btrfs source]$ du -k .
1346188 .
[oracle@btrfs source]$ ls
control01.ctl redo01.log redo03.log system01.dbf undotbs01.dbf
control02.ctl redo02.log sysaux01.dbf temp01.dbf users01.dbf
If we want to take a cold backup of the source database before we run some tests, all we need to do is shutdown and create a snapshot. First check how much space is available in the filesystem and how much is used by the source database.
[oracle@btrfs data]$ pwd
/btrfs1/data
[oracle@btrfs data]$ df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5661820 4938124 54% /btrfs1
[oracle@btrfs data]$ du -sk *
1346188 source
Create the snapshot using "btrfs subvolume snapshot".
[oracle@btrfs data]$ btrfs subvolume snapshot source backup1
Create a snapshot of 'source' in './backup1'
And we can see that the snapshot did not use any additional disk, but we now have a second set of datafiles in directory backup1.
[oracle@btrfs data]$ df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5661820 4938124 54% /btrfs1
[oracle@btrfs data]$ du -sk *
1346188 source
1346188 backup1
We can now start the database and run our tests, knowing we have a backup available. Because BTRFS is a "copy on write" filesystem, any block that is changed in the source database will be copied to a new location, and the original left in place as part of the backup1 snapshot. Since the binaries are also installed in a BTRFS subvolume, we could also take a snapshot of them in case we wanted to test patching or relinking.
If we needed additional test environments we can use the files from a snapshot, recreate the controlfile, and open resetlogs.
To create a new clone database called "target", take a snapshot :-
[oracle@btrfs data]$ btrfs subvolume snapshot source targetCreate a snapshot of 'source' in './target'
Then create the parameter file inittarget.ora, audit directories, and script to recreate the controlfile.
[oracle@btrfs data]$ export ORACLE_SID=target[oracle@btrfs data]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Thu Feb 14 11:44:17 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.Connected to an idle instance.
SQL> @/home/oracle/recrctl_target.sqlORACLE instance started.
Total System Global Area 1570009088 bytesFixed Size 2213696 bytesVariable Size 922749120 bytesDatabase Buffers 637534208 bytesRedo Buffers 7512064 bytes
Control file created.Database altered.Tablespace altered.SQL> select name from v$database;
NAME---------TARGET
SQL> select name from v$datafile;
NAME-----------------------------------------------------------------/btrfs1/data/target/system01.dbf/btrfs1/data/target/sysaux01.dbf/btrfs1/data/target/undotbs01.dbf/btrfs1/data/target/users01.dbf
At the OS level we can see both databases running, against their own files, with very little additional storage consumed.
[oracle@btrfs data]$ ps -ef | grep pmonoracle 5581 1 0 11:44 ? 00:00:00 ora_pmon_targetoracle 5708 1 0 11:42 ? 00:00:00 ora_pmon_source
[oracle@btrfs data]$ df /btrfs1Filesystem 1K-blocks Used Available Use% Mounted on/dev/sdb 12582912 5792420 4808972 55% /btrfs1
[oracle@btrfs data]$ du -sk *1346188 source1346188 backup11346828 target
The result :-
Using BTRFS snapshots, we now have two running databases, and one cold backup, with deduplication at filesystem block level. If we had created the backup and clone in a normal filesystem (ext3 etc) we would have used almost 2.7GB of disk, but with deduplication we have only used about 0.1GB. We now have 4.8GB free on the filesystem, so could create many more databases to keep the developers happy without making the accountants unhappy. Disk usage will increase as blocks are updated, but unless 100% of the blocks are updated we will still see an advantage from deduplication.
Obviously this was a small example to demonstrate the use of BTRFS snapshots and deduplication, but the same will apply with large databases, with considerable savings. In the example above we shutdown the source database to take a snap, but if the database is in archivelog mode we could just take a hot backup, or we could run a Dataguard Standby, and defer recovery to take a snapshot. My tests were only run on a small VM which is not suitable for performance testing, so additional testing on physical hardware would be useful if database performance is important.
The benefits of block level deduplication are obvious, with significant cost savings on storage, but we should also consider the downside. There is an overhead to deduplication, so database users are likely to see a degrade in performance. It should also be noted that BTRFS is currently listed as experimental, so a bit more work needs to be done before it is ready for important systems, but it is worth testing to see what is possible. In addition to the deduplication features offered by BTRFS, the filesystem also supports compression which would further reduce disk usage.
linux servers that do not boot up on /dev/sda make me grumpy!
New Dell servers on Oracle Linux 6.2 ... using red hat compatible kernel aka:
2.6.32-220.el6.x86_64
The new dell boxes have an internal raid controller ( Perc H700 ? ) and are connected to EMC direct attached storage using emulex HBA's. All operating system and linux software installed on internal disks ( mirrored ) ... all database stuff going to be on EMC storage.
Our new servers had a very strange set of behaviors when booting up from internal disks. Most of the time they would boot up and see the first internal raid drive as /dev/sda ( so /boot partition is on /dev/sda1 ) ... but at other times they would see /boot on a different device ( for example /boot on /dev/sdi1 ).
The entries in /etc/fstab for 6.x systems now apparently use UUID entries ... ( for example ):
UUID=e6964e7e-62a9-450c-a66e-a411b40a4ed9 / ext4 defaults 1 1
So when the servers came up on a different boot drive they would run ok ... looking strange ... but we ran into a different problem using ( still trying to use ... don't get me started ) a backup linux imaging product ( Acronis ) that just did not understand at all backing up or restoring a system when it was not running from /dev/sda.
Logically it seemed pretty straight forward. Force a way somehow so that first internal drive is always on /dev/sda.
We pay Oracle for linux support so open a ticket with them. We now have a solution but it took a very very long time for oracle linux support to come up with solution. Might be a by product of working with a junior level person ... might be from a strange new problem. Tried all sorts of stuff initially with udev rules ... nope none of this worked at all.
Eventually the solution that is now deployed and working involved removing lpfc ( emulex ? HBA support ? ) modules from the initramfs image that is invoked on first boot up. Of course we run stuff on EMC storage and yes eventually after booting our HBA's are working just fine.
Anyway here is what we had to do to get this working in our 6.2 redhat compatible kernel environment. It is some low level pretty esoteric linux stuff and well beyond what I wanted to have to deal with ... but it is working nicely.
Step 1: get the latest available dracut rpm's and stick them into directory for updating:
dracut]# ls -ltr | more
total 140-rw-r--r-- 1 root root 114884 Jan 11 13:29 dracut-004-284.0.1.el6_3.1.noarch.rpm-rw-r--r-- 1 root root 21524 Jan 11 13:29 dracut-kernel-004-284.0.1.el6_3.1.noarch.rpm
Step2: Update to latest rpm's ... ( not sure why the 100% 50% 100% stuff gone from below )
rpm -Uvh dracut*.rpm | more
warning: dracut-004-284.0.1.el6_3.1.noarch.rpm: Header V3 RSA/SHA256 Signature,YPreparing... ##################################################dracut ##################################################dracut-kernel ##################################################
Step 3: Verify installation of new dracut rpms
# rpm -qa | grep dracut
dracut-kernel-004-284.0.1.el6_3.1.noarchdracut-004-284.0.1.el6_3.1.noarch
Step 4: Now change to the /boot directory and create a new initramfs image file.
Use this command: dracut --omit-drivers lpfc initramfs-$(uname -r)-no-lpfc.img
# dracut --omit-drivers lpfc initramfs-$(uname -r)-no-lpfc.img
Step 5: Check img file created ...
# ls -ltr *.img | more
-rw-r--r-- 1 root root 15875365 Jan 11 13:39 initramfs-2.6.32-220.el6.x86_64-no-lpfc.img
Step 6: Verify that no lpfc moduels are in the new initramfs image file
# zcat *no-lpfc.img | cpio -t | grep lpfc | more
87575 blocks
Agove output is correct ... if you see something like this ... lpfc is still in the img file:
lib/modules/2.6.32-220.el6.x86_64/kernel/drivers/scsi/lpfclib/modules/2.6.32-220.el6.x86_64/kernel/drivers/scsi/lpfc/lpfc.ko
Final step ... create an entry in /etc/grub.conf to point to the new initramfs img file.
Copy the current /etc/grub.conf to something else.
Change the default= value to point to new lines at the end of the /etc/grub.conf file. My change was to change default=1 to default=2.
Add in new lines at the end of grub.conf ... my entries looked like this ( this is just part of my grub.conf file ).
title Oracle Linux Server (2.6.32-220.el6.x86_64) root (hd0,0)kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=UUID=e6964e7e-62a9-450c-a66e-a411b40a4ed9 rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DMinitrd /initramfs-2.6.32-220.el6.x86_64-no-lpfc.img
***
At this point the change should be complete ... start rebooting and test ... do we always come up on /dev/sda?
For me yes this finally fixed the problem.
My guess is that I will have to revisit all of this when doing next OL linux update. Probably going to sit out 6.3 and eventually move from 6.2 up to 6.4 ... probably will have to rebuild new initramfs image and of course test.
I hope this saves some other poor geek time ... it sure took us and oracle support a long time to get this working correctly!


