DBA Blogs

Battling Bigfile Backup Bottlenecks

Don Seiler - Wed, 2014-03-26 13:44
Last Friday I kicked off a database backup to an NFS destination, using the standard "backup as compressed backupset database" syntax. Loyal readers of this blog may recall that I'm the proud custodian of a 25 Tb database, so this backup normally takes a few days, with an expected completion on Monday morning. However it was still running on Wednesday, and reviewing the logs I saw that there was just 1 channel (of the original 8) still running. The backup file that this channel was writing happened to include our largest bigfile datafile, which weighs in at nearly 8 Tb. Reviewing my new backup script I realized that I had neglected to specify a SECTION SIZE parameter. An example of its usage is:

RMAN> backup as compressed backupset
2> section size 64G
3> database;

Without it, RMAN has decided to create a backup piece that bundled my 8 Tb datafile with a few others and then write it out to disk on one channel. Obviously this isn't what we wanted.

I'm not a big fan of bigfile tablespaces, primarily because you lose the benefits of parallelism when huge files can only be handled by a single channel, as with datafile copy operations and backups. In 11g, however, Oracle has introduced multi-section backups via the SECTION SIZE option for RMAN backups. This option tells RMAN to break a large file into sections of the specified size so that the work can be done in parallel. If the specified size is larger than the file, then it is simply ignored for that file.

There is a limitation in that the file can be split into a maximum of 256 sections. So, if you specify a section size that would result in more than 256 sections being created, Oracle RMAN will increase the size so that exactly 256 sections are created. This is still enforced today in Oracle 12c.

Another limitation in Oracle 11g is that multi-section backups cannot be done with image copy backups. Those must still be done as a whole file and so can still be a huge bottleneck. However this is no longer a problem in Oracle 12c, and multi-section image copy backups are possible. I'm looking forward to using this as we also use image copy backups as part of our recovery strategy.

To highlight the parallel benefits, I ran a compressed backup of a 1.5 Tb bigfile tablespace using 8 channels. The first one does NOT use section size and so only goes over one channel:

Starting backup at 2014/03/25 14:35:41
channel c1: backup set complete, elapsed time: 17:06:09
Finished backup at 2014/03/26 07:41:51

The second one uses a section size of 64 Gb (otherwise same command & file):

Starting backup at 2014/03/26 07:41:52
Finished backup at 2014/03/26 09:48:33

You can see the huge impact made by making use of the multi-section backup option. A single channel took over 17 hours to back it up. Using 8 channels with a section size of 64 Gb took only just over 2 hours. Eyeballing the log shows an average of around 6 minutes per section. Definitely much better than waiting for a single channel to do all the work when the rest of the system is waiting.
Categories: DBA Blogs

Why Is My MView Log Not Purging?

Don Seiler - Sun, 2014-03-16 15:50
A few weeks ago we saw one of our tablespaces growing at a rate much higher than the others. Taking a look we saw that the biggest users of space were two materialized view logs, one being 110 Gb and the other 60 Gb. These logs were in place to facilitate the fast refresh of two materialized views, one for each log/table. These materialized views did some aggregations (sum) throughout the day on some important base table data. The fast refreshes were completing successfully many times a day, but the logs were not being purged as expected.

In our case, there was only one mview performing a fast refresh on those base tables, so the mview logs should have been completely purged after each refresh. They certainly shouldn't be growing to over 100+ Gb. Looking at the data in the mview log, all records had a SNAPTIME$$ value of "4000/01/01 00:00:00", which is the default value for records in the mview log that have not been refreshed. Once they are refreshed, the SNAPTIME$$ value gets set to SYSDATE and can then be evaluated for purging.

But why was this value not being updated after refresh?

For those of you unfamiliar with the role of materialized view logs, I'll share this primer from Tim Hall via his excellent Oracle-Base article:

Since a complete refresh involves truncating the materialized view segment and re-populating it using the related query, it can be quite time consuming and involve a considerable amount of network traffic when performed against a remote table. To reduce the replication costs, materialized view logs can be created to capture all changes to the base table since the last refresh. This information allows a fast refresh, which only needs to apply the changes rather than a complete refresh of the materialized view.
Digging deeper led me to MOS DocId 236233.1, which tells us that Oracle compares the MLOG$_<TABLE_NAME>.SNAPTIME$$ value against the SYS.SLOG$SNAPTIME:

Rows in the MView log are unnecessary if their refresh timestamps MLOG$<table_name>.SNAPTIME$$ are older or equal than the oldest entry in SLOG$.SNAPTIME for this log.

Here's where we saw the real problem.

 2 from sys.slog$
 3 where mowner='FOO' and master='BAR';

 no rows selected

If the purge mechanism checks SLOG$.SNAPTIME then of course nothing is going to happen, as the materialized view is NOT registered in SYS.SLOG$!

We re-created the MVIEW from scratch on our development database and had the same results, which indicates it's something systemic in Oracle so we opened an SR. After the standard back-and-forth of trying the same things over and over, Oracle Support said that this was actually expected behavior:
This mview is defined as fast refreshable with aggregates. The mv log is defined with PRIMARY KEY INCLUDING NEW VALUES.

In order to support fast refresh the mv log should include ROWID as well. Please review the Restrictions on Fast Refresh on Materialized Views with Aggregates located here:
http://docs.oracle.com/cd/E11882_01/server.112/e25554/basicmv.htm#DWHSG8203 There are additional restrictions depending on the operations performed. As an example, SEQUENCE should also need to be added to the mv log if direct loads are performed on new_invoice_record.This turned out to be the case. We recreated the mview log with the ROWID specification, then re-created the materialized view and, sure enough, the mview was registered in SYS.SLOG$ and refreshes were purging the log as expected.

I was more than a little frustrated then that Oracle would let us create the MVIEW without any warnings or errors in the first place. The database obviously detected something wrong since it wouldn't register them in SYS.SLOG$. Their last response was that, since the MVIEW itself was refreshing successfully, no error should be reported. This fails to address the question for me, so I'm going to push back a little harder and will share what I find.

For now, though, we need to schedule a maintenance window to recreate these materialized views and their logs and see if we can reclaim some disk space afterward (perhaps a future post!).
Categories: DBA Blogs

What to Expect When You're Changing the DB_UNIQUE_NAME

Don Seiler - Wed, 2014-03-12 18:31
I recently had to change the db_unique_name of a database to make it jive with our typical database/DataGuard naming policy of appending the datacenter location. For the sake of this post let's say it was changed from ORCL to ORCL_NYC, since this database is in our fictional New York City datacenter.

I did a quick set of tests and thought I'd share the findings to save anyone any unpleasant surprises. Here are the things to expect when changing DB_UNIQUE_NAME.

Change the Parameter Value
First we obviously have to change the value. How we do so is important.
The command:
alter system set db_unique_name=orcl_nyc scope=spfile;

will result in a db_unique_name of ORCL_NYC, in all uppercase, which is used for the path changes we'll discuss later. However using quotes instead:
alter system set db_unique_name='orcl_nyc' scope=spfile;

will result in a lowercase orcl_nyc value used in the parameter and some paths. In either case, the fun begins when you next restart the instance!

ADR Location (i.e. alert log)
The ADR location appends the DB_UNIQUE_NAME to the location specified by the DIAGNOSTIC_DEST initialization parameter (defaulting to the $ORACLE_BASE environment variable value). When you restart after setting the DB_UNIQUE_NAME, your ADR location will be in a new location using the new DB_UNIQUE_NAME. Probably of most interest to you is that this means your alert log location will move, so any tools or scripts that referenced that file directly (e.g. error-scraping scripts or log-rotation jobs) will need to be updated. 

Regardless of quotes or not, the ADR path always used a lowercase string in the path.


SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCL
SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /oracle/app/diag/rdbms/orcl/or


SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCL_NYC
SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /oracle/app/diag/rdbms/orcl_ny

Datafile and Online Redo Log OMF Location
If you have the db_create_file_dest (and optionally db_create_online_log_dest_N) parameter set, then Oracle will use the DB_UNIQUE_NAME value in the OMF location for any new datafiles and redo logs created after the change, assuming a full path isn't specified.

SQL> alter tablespace users add datafile size 10m;

Tablespace altered.

SQL> select file_name from dba_data_files
  2  where tablespace_name='USERS';


In this case, Oracle will use an uppercase string regardless of whether or not the DB_UNIQUE_NAME is in upper or lower case. Note that existing files won't be affected, this will only apply to new files.

If this database is part of a DataGuard configuration, you'll want to be sure to update your db_file_name_convert and log_file_name_convert parameters to point to the new location.

FRA (Backups, Archivelogs, Flashback Logs)
In the same spirit of OMF, the FRA will also change locations. Similar to the previous case, the uppercase value of the DB_UNIQUE_NAME is used in the path, regardless of the original case. So, after a change and a couple of log switches, you would see something like this:

SQL> select name from v$archived_log;






Again, this will only affect newly created files. Existing backups, archivelogs and flashback logs will not be affected, they remain cataloged in their present locations and will be accessed just fine. RMAN will delete them when needed (or commanded) and then you could choose to delete the empty directories.

Oracle Wallet
I admit this one I didn't think of. If you use an Oracle Wallet, and do not specify a location in the sqlnet.ora file, Oracle looks in the default location of $ORACLE_BASE/admin/<DB_UNIQUE_NAME>/wallet/. Even more interesting, the DB_UNIQUE_NAME value is case sensitive. So you'll need to be aware of this when moving your wallet files to the new location. Here is a quick look at my findings on this matter:

-- db_unique_name set with no quotes
SQL> select wrl_parameter from v$encryption_wallet;


-- db_unique_name set with quotes, lowercase
SQL> select wrl_parameter from v$encryption_wallet;


I can't say if this makes a difference when running on Windows, which I believe is case-insensitive. But on Linux it makes a difference.

That wraps this one up. I hope it helps a few people out and saves them an hour or two of head-scratching or worse. If there's anything I forgot, please let me know and I'll update this post.
Categories: DBA Blogs

Oracle Direct NFS and Infiniband: A Less-Than-Perfect Match

Don Seiler - Wed, 2014-03-05 19:54
Readers of an earlier post on this blog will know about my latest forays into the world of Direct NFS. Part of that means stumbling over configuration hiccups or slamming into brick walls when you find new bugs.

To quickly re-set the table, my organization purchased the Oracle ZFS Storage Appliance (ZFSSA) 7420. Oracle sold us on the Infiniband connectivity as a way to make a possible future transition to Exadata easier. However the pre-sales POC testing was done over 10gb Ethernet (10gigE). So it was that everything (including their Infiniband switches and cables) arrived at the datacenter and was installed and connected by the Oracle technicians. There were a few initial hiccups and frustrating inconsistencies with their installation and configuration, but those are outside the scope of this post.

We decided to put a copy of our standby database on the ZFSSA and have it run as a second standby. The performance problems were quick to appear, and they weren't pretty.

Configuring SharesWe configured the ZFS project shares by the common Oracle best practices in terms ZFS recordsize and write bias. For example, datafile shares were set to an 8k recordsize (to match the db_block_size) and throughput write bias, where as redo log shares were set to 128k recordsize and latency bias. Note that with Oracle Database 12c, Direct NFS over NFSv4, and the more recent ZFSSA firmware, you gain the benefit of Oracle Intelligent Storage Protocol (OISP), which will determine the recordsize and write bias automatically based on the type of file it recognizes.

Copying the DatabaseTo start out we needed to get a copy of the database onto the ZFSSA shares. This was easily done with RMAN's backup as copy database command, specifying the ZFSSA mount as the format destination. We were fairly impressed with the Direct NFS transfer speed during the copy and so we were optimistic about how it would stand up with our production load.

Starting Recovery!
Once everything was set, we started managed recovery on the standby. Our earlier excitement gave way to a sort of soul-crushing disappointment as the recovery performance basically ground to a standstill and traffic to the ZFSSA went from hundreds of Mbps to barely a trickle. We could stop recovery and copy a big file with great speed, but something in managed recovery was not playing nicely.

We found that we could disable Direct NFS (requires a database restart and software relinking), and managed recovery would actually perform better over the kernel NFS, although still not nearly as well as we would need.

This started a blizzard of SR creations, including SRs being spawned from other SRs. We had SRs open for the ZFSSA team, the Direct NFS team, the Data Guard team, and even the Oracle Linux and Solaris teams, even though we were not on Oracle Linux or Solaris (we use RHEL). It came to a point where I had to tell our account manager to have support stop creating new SRs, since every new SR meant I had to explain the situation to a new technician all over again.

At this point we were having twice-daily conference calls with our account manager and technical leads from the various departments. Their minions were working hard on their end to replicate the problem and find a solution, but we were running into a 4th week of this craziness.

The Infiniband BanditAfter many frustrating weeks of changing configurations, cables, cards, and just generally grasping at straws, it was finally narrowed down to the Infiniband. Or rather, a bug in the open fabric (OFA) linux kernel module that dealt with Infiniband that was triggered when Direct NFS would fire off a whole lot of connections, like when DataGuard managed recover would fire up 80 parallel slaves. We tested out the 10gigE channel we had for the management UI and performance was like night and day with just the one channel.

Oracle Support suggested it might be related to bug 15824316, which also deals with dramatic performance loss with Direct NFS over Infiniband. The bug in the OFA kernel module was fixed in recent versions of Oracle Enterprise Linux (OEL) (specifically the UEK kernel), but Oracle is not sharing this fix with Red Hat (or anyone else, presumably). Since we're on RHEL, we had little choice but to send all the Infiniband networking hardware back and order up some 10gigE replacements.

We're still in the process of getting the 10gigE switches and cables all in place for the final production setup. If you're curious, it's 4 10gigE cards per server, bonded to a single IP to a 10gigE switch into the ZFSSA heads. This 10gigE network is dedicated exclusively to ZFSSA traffic.

So, in the end, if you're on (a recent version of) of OEL/UEK, you should have nothing to worry about. But if you're on RHEL and planning to use Direct NFS, you're going to want to use 10gigE and NOT Infiniband.

Update - 6 Mar 2014Some of have asked, and I want to re-iterate: Oracle have claimed that the OFA module was entirely re-written, and their fix is specific to OEL and is not covered by GPL or any similar license. We were told that they have no plans to share their code with RHEL. Also there is no MOS bug number for the OFA issue, it was apparently re-written from scratch with no bug to track the issue. If this all sounds rather dubious to you, join the club. But it's what our account manager told us at the end of last year.

Another Update - 6 Mar 2014Bjoern Rost and I discussed this privately and after quite a bit of research and analysis he shared this conclusion:

Oracle Support suggested that this issue would not occur with the OFA module used in Oracle Linux with the UEK kernel. RedHat changed their support in RHEL6 from shipping the whole openfabrics stack to just including the drivers that were also present in the upstream mainline kernel. This is RedHat’s policy to ensure stability in the version of the kernel they ship. Oracle offers an OFA package with some additional patches (all GPL/BSD license) for the UEKr1 and UEKr2 kernels. Unfortunately, these two different approaches make it very hard to pinpoint specific patches or create backports for the RedHat kernel version.
Categories: DBA Blogs

The Danger of Moving Incrementally Updated Datafile Copies

Don Seiler - Fri, 2014-02-28 11:20
When I sat down at my desk yesterday morning I was greeted with some disturbing email alerts notifying me that one of the NFS mounts on my standby database host was full. This was the NFS mount that held an image copy of my database that is updated daily from an incremental backup. The concept and an example can be found in the documentation. With a 25Tb database, waiting to restore from backups is not as appealing as simply switching to the copies and getting back to business.

We quickly saw that the reason that this mount was full was that RMAN had tried to make another set of image copies in the latest backup run rather than take an incremental backup for recovery. It does this when it finds no valid copy of the datafiles to increment, and the logs confirmed this to be the reason:

Starting backup at 2014/02/26 13:30:16
no parent backup or copy of datafile 792 found
no parent backup or copy of datafile 513 found
no parent backup or copy of datafile 490 found
no parent backup or copy of datafile 399 found

... and so on, for every datafile. However I knew that the copies that had been there (and been updated) every day were still there. So what was different?

It was then that I remembered my work from the day before. Doing a bit of re-organization, I renamed the directory where the datafile copies lived. However I made sure to re-catalog them and double-checked to make sure the backup tag was still there, which it was. I also crosschecked the copies to make the old entries as expired and then deleted them from the catalog. This turned out to be the cause of the problem.

When the original datafilecopy entries were removed from the catalog, RMAN didn't want to recognize the new entries as the right copies, even though they were literally the same file, with the same backup tag. And so RMAN printed the message you see above and dutifully began making new image copies until it filled up the mountpoint, which didn't have another spare 25 Tb handy.

Today I was able to duplicate the scenario on a (much smaller) sandbox with various sequences. Every time, once I crosschecked the original copies and deleted them as expired, RMAN would create a new copy on the next run. The sequence was basically this:
  1. Run backup-for-recovery and recover commands. First time will create datafile copies as expected.
  2. Run it again, this time it will create an incremental backup and then apply it to the copies made in the previous step.
  3. Move or rename the directory holding the copies.
  5. CATALOG START WITH '/path/to/new/location/';
  6. LIST DATAFILECOPY ALL; to verify that the copies are registered under the new location and the TAG is right.
  7. Run backup-for-recovery and recover commands (be sure to update the location). I would expect the same results as step 2, but instead new copies are created.
One thing that was very interesting was that if I just cataloged the new location, but did not crosscheck or delete the old entries (i.e. skipped step 4), then I could run the script and it would take an incremental backup as planned and recover the copies in the new location. But then if I later did the crosscheck and delete, it would not accept those copies and create new copies. And all this time I can "list datafilecopy all;" and see both copies with the same tags. Changing the order of steps 4 and 5 made no difference.

I'd be interesting to know what anyone else thinks about it. Personally it seems like a bug to me, so I've opened an SR. So far Oracle Support have confirmed what I've experienced, although have said there is no bug on file. They suggested I use Doc ID 1578701.1 to make another copy of the datafile with a new tag and use that new tag. However if I wanted to do that I would just create a new database copy and keep using the original tag, which is exactly what I've done.

I will be sure to update this post with anything I find. Until then, I wanted to share this experience for anyone else that might need or want to move their datafile copies if they are part of an incrementally-updated-backup strategy.
Categories: DBA Blogs

SQL Developer’s PL/SQL Debugger and Oracle 12c

Galo Balda's Blog - Mon, 2014-02-17 17:54

The PL/SQL Debugger works pretty much out of the box when used with a previous Oracle version. These are the things we needed in place before we could start debugging PL/SQL:

  1. A grant of the DEBUG CONNECT SESSION privilege.
  2. EXECUTE privilege on DBMS_DEBUG_JDWP.
  3. EXECUTE privilege on the stored procedure you want to debug.
  4. Make sure the stored procedure is “Compiled for Debug”.

Jeff Smith talks about it in this post.

But what happens when you use Oracle 12c? You still need all the stuff that I mentioned but that’s not enough. See the error I got when I was trying to debug a procedure.

ORA-24247: network access denied by access control list (ACL)
ORA-06512: at “SYS.DBMS_DEBUG_JDWP”, line 68
ORA-06512: at line 1
Process exited.
Disconnecting from the database SCOTT – ORA12CPDB1

Starting with Oracle 12c, if you want to debug PL/SQL stored procedures in the database through a Java Debug Wire Protocol (JDWP)-based debugger, such as SQL Developer or JDeveloper, then you must be granted the jdwp ACL privilege to connect your database session to the debugger at a particular host.

This is one way you can configure network access for JDWP operations:

 host => '',
 lower_port => null,
 upper_port => null,
 ace => xs$ace_type(privilege_list => xs$name_list('jdwp'),
 principal_name => 'scott',
 principal_type => xs_acl.ptype_db)
Host can can be a host name, domain name, IP address, or subnet.

Principal name in the access control entry (ACE) section is the schema that holds the stored procedures you want to debug.

Now the error goes away and you can start your debugging task.

Connecting to the database SCOTT – ORA12CPDB1.
Debugger accepted connection from database on port 49428.
Inside the Procedure
Process exited.
Disconnecting from the database SCOTT – ORA12CPDB1.
Debugger disconnected from database.

I hope this post saves you some time when you migrate to 12c.

Filed under: 12C, SQL Developer Tagged: 12C, SQL Developer
Categories: DBA Blogs

Kafka or Flume?

Chen Shapira - Tue, 2014-02-11 16:25

A question that keeps popping up is “Should we use Kafka or Flume to load data to Hadoop clusters?”

This question implies that Kafka and Flume are interchangeable components. It makes as much sense to me as “Should we use cars or umbrellas?”. Sure, you can hide from the rain in your car and you can use your umbrella when moving from place to place. But in general, these are different tools intended for different use-cases.

Flume’s main use-case is to ingest data into Hadoop. It is tightly integrated with Hadoop’s monitoring system, file system, file formats, and utilities such a Morphlines. A lot of the Flume development effort goes into maintaining compatibility with Hadoop. Sure, Flume’s design of sources, sinks and channels mean that it can be used to move data between other systems flexibly, but the important feature is its Hadoop integration.

Kafka’s main use-case is a distributed publish-subscribe messaging system. Most of the development effort is involved with allowing subscribers to read exactly the messages they are interested in, and in making sure the distributed system is scalable and reliable under many different conditions. It was not written to stream data specifically for Hadoop, and using it to read and write data to Hadoop is significantly more challenging than it is in Flume.

To summarize:
Use Flume if you have an non-relational data sources such as log files that you want to stream into Hadoop.
Use Kafka if you need a highly reliable and scalable enterprise messaging system to connect many multiple systems, one of which is Hadoop.

Categories: DBA Blogs

RMOUG Training Days 2014

Galo Balda's Blog - Sun, 2014-02-02 11:29


I’ll be presenting on February 6th. Here are the details of my session:

SQL Pattern Matching in Oracle 12c

Room 501, 11:15 – 12:15

Last year I had a great time, so I can’t wait to be back in Denver.

I hope to see you there!

Updated on 02/17/2014: The presentation is available on Slideshare

Filed under: 12C, RMOUG, SQL Tagged: 12C, RMOUG, SQL
Categories: DBA Blogs

My Latin American Tour in March

Oracle NZ - Thu, 2014-01-23 19:00

At March I will be presenting my very successful seminar “Mastering Backup and Recovery” in some countries of Latin America for the very first time. Thank you Ecuador,Panama, Chile and Brazil OUGs for inviting me to your amazing countries!


Please, use the following links for registration and also to find more information about the seminar:

  • Quito, Ecuador – March 13-15 (2014) Click Here
  • Panama City, Panama – March 17-18 (2014) Click Here
  • Porto Alegre, Brazil – March 20-21 (2014) Click Here
  • Sao Paulo, Brazil – March 24-25 (2014) Click Here
  • Santiago, Chile – March 27-28 (2014) Click Here



Note: the next 10 people on each country to register before February 15 will receive a free signed copy of my book “Oracle Database 12c Backup and Recovery Survival Guide”




Francisco Munoz Alvarez



Copyright © OracleNZ by Francisco Munoz Alvarez [My Latin American Tour in March], All Right Reserved. 2016.
Categories: DBA Blogs

Join me at the next OTN Virtual Developer Day

Oracle NZ - Thu, 2014-01-23 18:18



Hi All,

On February 4, 2014 at 9:30 am PT I will be talking on the next OTN Virtual Developer Day about Oracle VM and Oracle Database. Come and discover the answers for the following questions:

  • Does an Oracle Database perform well on a virtualized environment?
  • What virtualization technology is more stable and allows an Oracle database to perform faster?
  • What is the performance difference between using a bare metal and a virtualized guest?
  • Is it safe to run a production database in a virtualized environment?

Come and join me on this fantastic event. Registrations and more information here.



Francisco Munoz Alvarez



Copyright © OracleNZ by Francisco Munoz Alvarez [Join me at the next OTN Virtual Developer Day], All Right Reserved. 2016.
Categories: DBA Blogs

Problems when doing a V2V or P2V of a windows server to Oracle VM

Oracle NZ - Thu, 2014-01-23 18:06

Some time ago, I received the request to migrate some Windows 2008 Servers to my Oracle VM farm and after complete the P2V migration the newly create VM would not start. It would crash on boot with blue screen and error:
STOP: 0x0000007B (0xXXXXXXXX,0xXXXXXXXX,0×00000000,0×00000000)

This issue is related to the storage drivers needed for the guest VM, and to solve this issue you should do the following on your Windows Server before start the P2V or V2V migration:

  1. Extract the Atapi.sys, Intelide.sys, Pciide.sys, and Pciidex.sys files from the %SystemRoot%\Driver Cache\I386\Driver.cab file, copy the files to the %SystemRoot%\System32\Drivers folder.
  2. Merge this register file (NOTE: You would need access to My Oracle Support to be able to download it)
  3. Turn off the virtual machine, and then try import it to OracleVM again.

You can also find more information regarding this problem on the following Microsoft link:


Hope this would help you if having the same issue I had.




Francisco Munoz Alvarez



Copyright © OracleNZ by Francisco Munoz Alvarez [Problems when doing a V2V or P2V of a windows server to Oracle VM], All Right Reserved. 2016.
Categories: DBA Blogs

Quick Tip on Using Sqoop Action in Oozie

Chen Shapira - Wed, 2014-01-08 18:59

Another Oozie tip blog post.

If you try to use Sqoop action in Oozie, you know you can use the “command” format, with the entire Sqoop configuration in a single line:

<pre><workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    <action name="myfirsthivejob">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <command>import  --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>

This is convenient, but can be difficult to read and maintain. I prefer using the “arg” syntax, with each argument in its own line:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    <action name="myfirsthivejob">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>

As you can see, each argument here is in its own “arg” tag. Even two arguments that belong together like “–table” and “TT” go in two separate tags.
If you’ll try to put them together for readability, as I did, Sqoop will throw its entire user manual at you. It took me a while to figure out why this is an issue.

When you call Oozie from the command line, all the arguments you pass are sent as a String[] array, and the spaces separate the arguments into array elements. So if you call Sqoop with “–table TT” it will be two elements, “–table” and “TT”.
When using “arg” tags in Oozie, you are basically generating the same array in XML. Oozie will turn the XML argument list into an array and pass it to Sqoop just the way it would in the command line. Then Sqoop parses it in exactly the same way.
So every item separated with space in the command line must be in separate tags in Oozie.

Its simple and logical once you figure out why:)
If you want to dig a bit more into how Sqoop parses its arguments, it is using Apache Commons CLI with GnuParser. You can read all about it.

Categories: DBA Blogs

Why Oozie?

Chen Shapira - Mon, 2013-12-09 13:04

Thats a really frequently asked question. Oozie is a workflow manager and scheduler. Most companies already have a workflow schedulers – Activebatch, Autosys, UC4, HP Orchestration. These workflow schedulers run jobs on all their existing databases – Oracle, Netezza, MySQL. Why does Hadoop need its own special workflow scheduler?

As usual, it depends. In general, you can keep using any workflow scheduler that works for you. No need to change, really.
However, Oozie does have some benefits that are worth considering:

  1. Oozie is designed to scale in a Hadoop cluster. Each job will be launched from a different datanode. This means that the workflow load will be balanced and no single machine will become overburdened by launching workflows. This also means that the capacity to launch workflows will grow as the cluster grows.
  2. Oozie is well integrated with Hadoop security. This is especially important in a kerberized cluster. Oozie knows which user submitted the job and will launch all actions as that user, with the proper privileges. It will handle all the authentication details for the user as well.
  3. Oozie is the only workflow manager with built-in Hadoop actions, making workflow development, maintenance and troubleshooting easier.
  4. Oozie UI makes it easier to drill down to specific errors in the data nodes. Other systems would require significantly more work to correlate jobtracker jobs with the workflow actions.
  5. Oozie is proven to scale in some of the world’s largest clusters. The white paper discusses a deployment at Yahoo! that can handle 1250 job submissions a minute.
  6. Oozie gets callbacks from MapReduce jobs so it knows when they finish and whether they hang without expensive polling. No other workflow manager can do this.
  7. Oozie Coordinator allows triggering actions when files arrive at HDFS. This will be challenging to implement anywhere else.
  8. Oozie is supported by Hadoop vendors. If there is ever an issue with how the workflow manager integrates with Hadoop – you can turn to the people who wrote the code for answers.

So, should you use Oozie? If you find these benefits compelling, then yes. Step out of your comfort zone and learn another new tool. It will be worth it.

Categories: DBA Blogs

Parameterizing Hive Actions in Oozie Workflows

Chen Shapira - Fri, 2013-12-06 12:30

Very common request I get from my customers is to parameterize the query executed by a Hive action in their Oozie workflow.
For example, the dates used in the query depend on a result of a previous action. Or maybe they depend on something completely external to the system – the operator just decides to run the workflow on specific dates.

There are many ways to do this, including using EL expressions, capturing output from shell action or java action.
Here’s an example of how to pass the parameters through the command line. This assumes that whoever triggers the workflow (Human or an external system) has the correct value and just needs to pass it to the workflow so it will be used by the query.

Here’s what the query looks like:

insert into test select * from test2 where dt=${MYDATE}

MYDATE is the parameter that allows me to run the query on a different date each time. When running this query in hive, I’d use something like “set MYDATE=’2011-10-10′” before running the query. But when I run it from Oozie, I need to pass the parameter to the Hive action.

Lets assume I saved the query in a file hive1.hql. Here’s what the Oozie workflow would look like:

<workflow-app name="cmd-param-demo" xmlns="uri:oozie:workflow:0.4">
	<start to="hive-demo"/>
	<action name="hive-demo">
		<hive xmlns="uri:oozie:hive-action:0.2">
		<ok to="end"/>
		<error to="kill"/>
	<kill name="kill">
		<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
	<end name="end"/>

The important line is “MYDATE=${MYDATE}”. Here I translate an Oozie parameter to a parameter that will be used by the Hive script. Don’t forget to copy hive-site.xml and hive1.hql to HDFS! Oozie actions can run on any datanode and will not read files on local file system.

And here’s how you call Oozie with the commandline parameter:
oozie job -oozie http://myserver:11000/oozie -config ~/workflow/job.properties -run -verbose -DMYDATE=’2013-11-15′

Thats it!

Categories: DBA Blogs

Joint Webinar with Oracle – Top 5 Key Reasons Why Oracle VM is Best for Oracle Database

Oracle NZ - Wed, 2013-12-04 14:50

Top 5 Key Reasons Why Oracle VM is Best for Oracle Database


When: Tuesday December 17th , 10am PST

Register Here:


Oracle VM application-driven architecture is designed for rapid deployment and ease of lifecycle management providing IT a highly scalable and cost effective virtualization solution for Oracle Database and business-critical applications.

Join us in this webcast featuring Francisco Munoz Alvarez, a seasoned expert, and Oracle ACE Director as he shares his performance benchmark findings highlighting key reasons why Oracle VM is the best virtualization technology for Oracle Databases.



Copyright © OracleNZ by Francisco Munoz Alvarez [Joint Webinar with Oracle - Top 5 Key Reasons Why Oracle VM is Best for Oracle Database], All Right Reserved. 2016.
Categories: DBA Blogs

Using Oozie in Kerberized Cluster

Chen Shapira - Fri, 2013-11-22 09:14

In general, most Hadoop ecosystem tools work rather transparently in a kerberized cluster. Most of the time things “just work”. This includes Oozie. Still, when things don’t “just work”, they tend to fail with slightly alarming and highly ambiguous error messages. Here are few tips for using Oozie when your Hadoop cluster is kerberized. Note that this is a client/user guide. I assume you already followed the documentation on how to configure the Oozie server in the kerberized cluster (or you are using Cloudera Manager, which magically configures it for you).

    1. As always, use “kinit” to authenticate with Kerberos and get your tgt before trying to run oozie commands. Verify with klist. Failure to do this will result in “No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)”
    2. I always enable security debug messages. This helps me troubleshoot, and also helps when I need to ask support/mailing list for help.
      export HADOOP_ROOT_LOGGER=TRACE,console;
      export HADOOP_JAAS_DEBUG=true;
      export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
    3. Your Oozie command typically contains a URL. Something like “oozie -url http://myserver:11000/oozie -conf job.properties -run” The server name in the URL must match an existing principal name in Kerberos. If your principals are actually “myserver.mydomain.com” make sure you use that in the URL.
    4. If you decide to use CURL to connect to your Oozie server, either for troubleshooting or for using the REST API, don’t forget to use “–negotiate -u foo:bar”. The actual username and password don’t matter (you are authenticating with your Kerberos ticket), but CURL throws a fit if they don’t exist.
    5. If you have Hive action in your Oozie workflow, you need to define and use credentials. Here’s an example:
      <workflow-app xmlns="uri:oozie:workflow:0.2.5" name="example-wf">
                      <credential name='hive_credentials' type='hcat'>
      <start to="hive-example"/>
      <action name="hive-example" cred="hive_credentials">
              <hive xmlns="uri:oozie:hive-action:0.2">
              <ok to="end"/>
              <error to="fail"/>
      <kill name="fail">
              <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      <end name="end"/>
    6. To make step #5 actually work (i.e. allow Oozie to run Hive actions), you will also need to do the following:
      In CM:
    7. go to “HDFS Service->Configuration->Service-Wide->Advanced->Cluster-wide Configuration Safety Valve for core-site.xml” and add:


      – go to “Oozie service->Configuration->Oozie Server(default)->Advanced-> Oozie Server Configuration Safety Valve for oozie-site.xml” and add:


      – Deploy client configuration and restart Hive service and Oozie service.

    8. Oozie doesn’t kinit a user for you on the node its launching the action on, and it doesn’t move principles and tickets around. Instead it uses delegation tokens. If you want to authenticate to Hadoop inside a shell or java action, you’ll need to use the same tokens.

      In a shell action, it will be something like:

      	hive -e "select x from test" -S
      	hive -e "SET mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION; select x from test" -S

      In Java it will be:

      if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
                  jobConf.set("mapreduce.job.credentials.binary", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));

    Hope this helps! Feel free to comment with questions, especially if you ran into errors that I did not address. I’ll be happy to add more tips to the list.

Categories: DBA Blogs

When error say nothing about real issue

Virag Sharma - Thu, 2013-10-24 22:18
We want restore backup from source. So we took backup from source and copied it to /dba/share/MYDB on target. My collouge sent me email saying  restore failed with following error

rman target /

Recovery Manager: Release - Production on Wed Sep 18 20:47:38 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)

RMAN> startup nomount

Oracle instance started

Total System Global Area    8551575552 bytes

Fixed Size                     2245480 bytes
Variable Size                503319704 bytes
Database Buffers            8036286464 bytes
Redo Buffers                   9723904 bytes

RMAN> set DBID=2387922???;

executing command: SET DBID

RMAN> restore controlfile from '/dba/share/MYDB/c-2387922???-20130910-00.ctl';

Starting restore at 18-SEP-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=391 device type=DISK

channel ORA_DISK_1: no AUTOBACKUP in 7 days found
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/18/2013 20:48:04
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

We searched MOS and forums. Most of the search , pointing it is old backup ( more then 7 day ),  some doc say backup corrupted etc etc. Instead going for whole story that how reached to conclusion, let we tell you the finding.

We found the mount option for /dba/share/MYDB ( where we copied backup from source ) not correct
...that's why getting error. We changed it to

           rw,bg,intr,hard,timeo=600,wsize=32768,rsize=32768,nfsvers=3,tcp,noacl 0 0

After this change we able to restore control file and rest of the backup successfully

Categories: DBA Blogs

Three Things To Do Before Starting Hadoop Project

Chen Shapira - Sun, 2013-10-20 11:45

I spent the last 6 month helping customers design, implement and deploy successful Hadoop systems. Over time, you start seeing patterns. Certain things that if the customer gets right before the project event starts increase the probability that the project will finish successfully and on-time.

Lets assume you already took the most important step – you have an actual use-case or a problem to solve, and you think  Hadoop could be the right technology to use in this case. What now?

  1. Learn how to automate production deployments: 
    Yes, Cloudera Manager would do a lot for you. But occasionally you’ll need to run the same command on 20+ servers. Thats what large clusters are all about. So get used to the idea from the beginning. Learn how to write loops in bash, how to ssh into remote machines to run commands, how to distribute files and restart services.
    You can build your own tools (I had for years), but these days you can just pick your favorite automation tool and use that instead. My favorite is Ansible. It works exactly the way I would have written a cluster automation tool, so learning it never felt like an effort and its usage is never surprising or unintuitive to me. Others prefer Puppet, Chef or Cfengine. It doesn’t matter what you use, but when I show up at your office as your Cloudera Solutions Architect and ask you to update sysctl.conf on your 50 node cluster, I don’t want you too look surprised, alarmed or tell me it will take few hours.
  2. Don’t try to boil the ocean:
    Hadoop implementation is often the first chance the dev/ops teams get to do something completely new. There is a blank slate, white sheet of paper, and you can design the perfect system. Fixing every problem the old system had and building functionality you always dreamed of.Better security!  Machine learning! Open source!
    I say – Using Hadoop successfully is a large and challenging project. Changing organizational processes and culture toward better security and processes is a large and challenging project. Creating a data driven organization is a huge project. Mixing them doesn’t give you three projects for the price of one. It isn’t even just three times more challenging than one project, I’d say the risk is an order of magnitude higher, and the risk of just implementing Hadoop is high enough. Especially in the early stages. Which brings us to… 
  3. Do a POC:
     Pay Cloudera or do it yourself. Either way, you need a POC. 
    If you start with a 12 month project, you will have to do a lot of design upfront. At a time when you have too little information. At the beginning of a large project, you won’t know for sure how the system will be used and you probably won’t know enough about Hadoop. Sure, you can call Cloudera Services and discuss the design with us, but even we can sometimes (rarely!) get things wrong. With 12 month projects you will be very deep in the project before you’ll find that limitation we completely forgot to mention.
    Be agile (really agile, 6 month project with daily scrum doesn’t count): Solve the smallest useful problem first. Implement just a single workflow, single statistical analysis, parse and search data from one source. Whatever is useful for your users – do it first. Learn in the process and build from there. This will allow you to build experience and iron-out issues at the system’s usefulness, load and importance grow.
  4. (Bonus tip) Get the most out of the POC: 
    Not all POCs are created equal. Sometimes the customer hires us to “prove that Hadoop can do X”. We get very specific requirements and very short time-frame, and we need to build a system that does X. I can see why customers need proof that their vendor can deliver. But this approach is of limited value. Because at the end of the POC you are left with a non-production system and you don’t know more Hadoop than you knew before. I love teaching, but when the attitude is “prove us this works” and the requirements are inflexible, there isn’t much time for discussions and casual learning. Most of the knowledge transfer will only happen in the delivery document, which is not the same as lively discussions.
    Better POC happens when a customer brings in Cloudera to help them build their first Hadoop project. There are still time and scope constraints, but now the POC is not about “Prove us this works” but rather “Help us make it work”. We work together as a team. We will brainstorm design possibilities with you and share best practices. We will teach you how to build the system, how to configure it and troubleshoot it. You get a chance to learn all our little tricks of development and deployment that makes life easier. At the end of the POC, your team will have real Hadoop expertise, relevant to your specific system, problems, culture, data and requirements. I see this as the best investment you can make in Hadoop for your organization. 
    But I may be a bit biased.


Categories: DBA Blogs

Control Charts

Chen Shapira - Sun, 2013-10-13 22:43

Last week, while working on customer engagement, I learned a new method of quantifying behavior of time-series data. The method is called “Control Chart” and credit to Josh Wills, our director of data science, for pointing it out. I thought I’ll share it with my readers as its easy to understand, easy to implement, flexible and very useful in many situations.

The problem is ages old – you collect measurements over time and want to know when your measurements indicate abnormal behavior. “Abnormal” is not well defined, and thats on purpose – we want our method to be flexible enough to match what you define as an issue.

For example, lets say Facebook are interested in tracking usage trend for each user, catching those with decreasing use

There are few steps to the Control Chart method:

  1. Collect all relevant data points. In our case, number of minutes of Facebook use per day for each user.
  2. Calculate a baseline – this can be average use for each user or average use for similar demographics.  Even adaptive average of the type calculated by Oracle Enterprise Manager, to take into account decreased Facebook use over the weekend.
  3. Calculate “zones” one, two and three standard deviations around the baseline

Those zones can be used to define rules for normal and abnormal behaviors of the system. These rules are what makes the system valuable.
Examples of rules that define abnormal behavior can be:

  1. Any point 3 standard deviations above the baseline. This will indicate extreme sudden increase.
  2. 7 consecutive measurements more than one standard deviation over the baseline. This indicates a sustained increase.
  3. 9 consecutive measurement each higher than previous one. This indicates steady upward trend.
  4. 6 consecutive measurements each more than two standard deviations away from baseline, each one of different side of the baseline than the previous measurement. This indicates instability of the system.

There are even sets of standard rules used in various industries, best practices of sorts. Western Electric rules and Nelson rules are particularly well know.

Note how flexible the method is – you can use any combination of rules that will highlight abnormalities you are interested in highlighting.
Also note that while the traditional use indeed involves charts, the values and rules are very easy to calculate programmatically and visualization can be useful but not mandatory.
If you measure CPU utilization on few servers, visualizing the chart and actually seeing the server behavior can be useful. If you are Facebook and monitor user behavior, visualizing a time series of every one of their millions of users is hopeless. Calculating baselines, standard deviations and rules for each use is trivial.

Also note how this problem is “embarrassingly parallel“. To calculate behavior for each user, you only need to look at data for that particular user. Parallel, share-nothing platform like Hadoop can be used to scale the calculation indefinitely simply by throwing increasing number of servers on the problem. The only limit is the time it takes to calculate the rules for a single user.

Naturally, I didn’t dive into some of the complexities in using Control Charts.  Such as how to select a good baseline, how to calculate standard deviation (or whether to use another statistic to define zones) and how many measurements should be examined before a behavior signals a trend. If you think this tool is useful for you, I encourage you to investigate more deeply.


Categories: DBA Blogs


Subscribe to Oracle FAQ aggregator - DBA Blogs