Don Seiler


Battling Bigfile Backup Bottlenecks

Wed, 2014-03-26 13:44
Last Friday I kicked off a database backup to an NFS destination, using the standard "backup as compressed backupset database" syntax. Loyal readers of this blog may recall that I'm the proud custodian of a 25 Tb database, so this backup normally takes a few days, with an expected completion on Monday morning. However, it was still running on Wednesday, and reviewing the logs I saw that just one channel (of the original eight) was still running. The backup piece that this channel was writing happened to include our largest bigfile datafile, which weighs in at nearly 8 Tb. Reviewing my new backup script, I realized that I had neglected to specify a SECTION SIZE parameter. An example of its usage is:

RMAN> backup as compressed backupset
2> section size 64G
3> database;

Without it, RMAN decided to create a backup piece that bundled my 8 Tb datafile with a few others and write it all out to disk on a single channel. Obviously this isn't what we wanted.


I'm not a big fan of bigfile tablespaces, primarily because you lose the benefits of parallelism when huge files can only be handled by a single channel, as with datafile copy operations and backups. In 11g, however, Oracle introduced multi-section backups via the SECTION SIZE option for RMAN backups. This option tells RMAN to break a large file into sections of the specified size so that the work can be done in parallel across channels. If the specified size is larger than the file, it is simply ignored for that file.

There is a limitation in that the file can be split into a maximum of 256 sections. So, if you specify a section size that would result in more than 256 sections being created, RMAN will silently increase the size so that exactly 256 sections are created. For example, splitting my nearly-8 Tb datafile at a 16 Gb section size would call for around 500 sections, so RMAN would quietly raise the section size to roughly 32 Gb. This limit is still enforced in Oracle 12c.

Another limitation in Oracle 11g is that multi-section backups cannot be done with image copy backups. Those must still be done as a whole file and so can still be a huge bottleneck. However this is no longer a problem in Oracle 12c, and multi-section image copy backups are possible. I'm looking forward to using this as we also use image copy backups as part of our recovery strategy.
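Based on the 12c documentation, it should just be a matter of adding SECTION SIZE to the image copy command, something like this (a sketch; I haven't run it in production yet):

RMAN> backup as copy
2> section size 64G
3> database;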

To highlight the parallel benefits, I ran a compressed backup of a 1.5 Tb bigfile tablespace using 8 channels. The first one does NOT use section size and so only goes over one channel:

Starting backup at 2014/03/25 14:35:41
...
channel c1: backup set complete, elapsed time: 17:06:09
Finished backup at 2014/03/26 07:41:51

The second one uses a section size of 64 Gb (otherwise same command & file):

Starting backup at 2014/03/26 07:41:52
...
Finished backup at 2014/03/26 09:48:33

You can see the huge impact of the multi-section backup option. A single channel took over 17 hours to back up the file, while 8 channels with a 64 Gb section size took just over 2 hours, roughly the 8x improvement you'd hope for. Eyeballing the log shows an average of around 6 minutes per section. Definitely much better than waiting for a single channel to do all the work while the other channels sit idle.
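If you want to double-check that a backup really was broken into sections, the MULTI_SECTION column of V$BACKUP_SET should tell you (a quick sketch, not output from the run above):

SQL> select recid, multi_section, pieces
  2  from v$backup_set
  3  order by recid;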
Categories: DBA Blogs

Why Is My MView Log Not Purging?

Sun, 2014-03-16 15:50
A few weeks ago we saw one of our tablespaces growing at a rate much higher than the others. Taking a look we saw that the biggest users of space were two materialized view logs, one being 110 Gb and the other 60 Gb. These logs were in place to facilitate the fast refresh of two materialized views, one for each log/table. These materialized views did some aggregations (sum) throughout the day on some important base table data. The fast refreshes were completing successfully many times a day, but the logs were not being purged as expected.

In our case, there was only one mview performing a fast refresh on those base tables, so the mview logs should have been completely purged after each refresh. They certainly shouldn't have been growing past 100 Gb. Looking at the data in the mview log, all records had a SNAPTIME$$ value of "4000/01/01 00:00:00", which is the default value for records in the mview log that have not been refreshed. Once they are refreshed, the SNAPTIME$$ value is set to SYSDATE and can then be evaluated for purging.
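A quick way to see the backlog is to count the unrefreshed rows in the log (a sketch; MLOG$_BAR stands in for the actual log table name):

SQL> select count(*) from mlog$_bar
  2  where snaptime$$ = to_date('4000/01/01', 'YYYY/MM/DD');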

But why was this value not being updated after refresh?

 
For those of you unfamiliar with the role of materialized view logs, I'll share this primer from Tim Hall via his excellent Oracle-Base article:

Since a complete refresh involves truncating the materialized view segment and re-populating it using the related query, it can be quite time consuming and involve a considerable amount of network traffic when performed against a remote table. To reduce the replication costs, materialized view logs can be created to capture all changes to the base table since the last refresh. This information allows a fast refresh, which only needs to apply the changes rather than a complete refresh of the materialized view.
Digging deeper led me to MOS Doc ID 236233.1, which tells us that Oracle compares the MLOG$_<TABLE_NAME>.SNAPTIME$$ value against SYS.SLOG$.SNAPTIME:

Rows in the MView log are unnecessary if their refresh timestamps MLOG$<table_name>.SNAPTIME$$ are older or equal than the oldest entry in SLOG$.SNAPTIME for this log.

MLOG$<table_name>.SNAPTIME$$ <= MIN (SLOG$.SNAPTIME)
Here's where we saw the real problem.

SQL> select MOWNER,MASTER,SNAPSHOT,SNAPID,SNAPTIME,SSCN,USER#
 2 from sys.slog$
 3 where mowner='FOO' and master='BAR';

 no rows selected

If the purge mechanism checks SLOG$.SNAPTIME then of course nothing is going to happen, as the materialized view is NOT registered in SYS.SLOG$!

We re-created the MVIEW from scratch on our development database and got the same results, which indicated it was something systemic in Oracle, so we opened an SR. After the standard back-and-forth of trying the same things over and over, Oracle Support said that this was actually expected behavior:
This mview is defined as fast refreshable with aggregates. The mv log is defined with PRIMARY KEY INCLUDING NEW VALUES.

In order to support fast refresh the mv log should include ROWID as well. Please review the Restrictions on Fast Refresh on Materialized Views with Aggregates located here: http://docs.oracle.com/cd/E11882_01/server.112/e25554/basicmv.htm#DWHSG8203 There are additional restrictions depending on the operations performed. As an example, SEQUENCE should also need to be added to the mv log if direct loads are performed on new_invoice_record.

This turned out to be the case. We recreated the mview log with the ROWID specification, then re-created the materialized view and, sure enough, the mview was registered in SYS.SLOG$ and refreshes were purging the log as expected.
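The fix amounted to recreating the log with ROWID included. A sketch of what that looks like (FOO.BAR stands in for the actual owner and table; adjust the options to your mview's needs):

SQL> drop materialized view log on foo.bar;
SQL> create materialized view log on foo.bar
  2  with rowid, primary key
  3  including new values;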

I was more than a little frustrated then that Oracle would let us create the MVIEW without any warnings or errors in the first place. The database obviously detected something wrong since it wouldn't register them in SYS.SLOG$. Their last response was that, since the MVIEW itself was refreshing successfully, no error should be reported. This fails to address the question for me, so I'm going to push back a little harder and will share what I find.

For now, though, we need to schedule a maintenance window to recreate these materialized views and their logs and see if we can reclaim some disk space afterward (perhaps a future post!).
Categories: DBA Blogs

What to Expect When You're Changing the DB_UNIQUE_NAME

Wed, 2014-03-12 18:31
I recently had to change the db_unique_name of a database to make it jibe with our typical database/DataGuard naming policy of appending the datacenter location. For the sake of this post let's say it was changed from ORCL to ORCL_NYC, since this database is in our fictional New York City datacenter.

I did a quick set of tests and thought I'd share the findings to save anyone any unpleasant surprises. Here are the things to expect when changing DB_UNIQUE_NAME.

Change the Parameter Value
First we obviously have to change the value. How we do so is important.
The command:
alter system set db_unique_name=orcl_nyc scope=spfile;

will result in a db_unique_name of ORCL_NYC, in all uppercase, which is used for the path changes we'll discuss later. However using quotes instead:
alter system set db_unique_name='orcl_nyc' scope=spfile;

will result in a lowercase orcl_nyc value used in the parameter and some paths. In either case, the fun begins when you next restart the instance!



ADR Location (i.e. alert log)
The ADR location appends the DB_UNIQUE_NAME to the location specified by the DIAGNOSTIC_DEST initialization parameter (defaulting to the $ORACLE_BASE environment variable value). When you restart after setting the DB_UNIQUE_NAME, your ADR location will be in a new location using the new DB_UNIQUE_NAME. Probably of most interest to you is that this means your alert log location will move, so any tools or scripts that referenced that file directly (e.g. error-scraping scripts or log-rotation jobs) will need to be updated. 

Whether or not you use quotes, the ADR path always uses a lowercase string.

Before:

SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCL
SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /oracle/app/diag/rdbms/orcl/or
                                                 cl/trace

After:

SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCL_NYC
SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /oracle/app/diag/rdbms/orcl_ny
                                                 c/orcl/trace
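If you'd rather query the new locations directly than rely on show parameter, V$DIAG_INFO has them (a quick sketch):

SQL> select name, value from v$diag_info
  2  where name in ('ADR Home', 'Diag Trace');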


Datafile and Online Redo Log OMF Location
If you have the db_create_file_dest (and optionally db_create_online_log_dest_N) parameter set, then Oracle will use the DB_UNIQUE_NAME value in the OMF location for any new datafiles and redo logs created after the change, assuming a full path isn't specified.

SQL> alter tablespace users add datafile size 10m;

Tablespace altered.

SQL> select file_name from dba_data_files
  2  where tablespace_name='USERS';

FILE_NAME
--------------------------------------------------------------------------------
/oracle/app/oradata/ORCL/datafile/data_D-ORCL_I-3995253326_TS-USERS_FNO-4_1r
p30nup
/oracle/app/oradata/ORCL_NYC/datafile/o1_mf_users_9l1r8dx9_.dbf

In this case, Oracle uses an uppercase string in the path regardless of whether the DB_UNIQUE_NAME was set in upper or lower case. Note that existing files won't be affected; this only applies to new files.

If this database is part of a DataGuard configuration, you'll want to be sure to update your db_file_name_convert and log_file_name_convert parameters to point to the new location.
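On the standby, that update might look something like this (a sketch using my example paths; adjust the from/to pairs to your primary and standby locations, and note both parameters are static, so they go in the spfile and take effect at the next restart):

SQL> alter system set db_file_name_convert='/oracle/app/oradata/ORCL/','/oracle/app/oradata/ORCL_NYC/' scope=spfile;
SQL> alter system set log_file_name_convert='/oracle/app/oradata/ORCL/','/oracle/app/oradata/ORCL_NYC/' scope=spfile;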

FRA (Backups, Archivelogs, Flashback Logs)
In the same spirit as OMF, the FRA will also change locations. As in the previous case, the uppercase value of the DB_UNIQUE_NAME is used in the path, regardless of the original case. So, after the change and a couple of log switches, you would see something like this:

SQL> select name from v$archived_log;

NAME
--------------------------------------------------------------------------------

/oracle/app/fast_recovery_area/ORCL/archivelog/2014_03_12/o1_mf_1_13_9l1sgt20_.a
rc

/oracle/app/fast_recovery_area/ORCL/archivelog/2014_03_12/o1_mf_1_14_9l1shfjo_.a
rc

/oracle/app/fast_recovery_area/ORCL_NYC/archivelog/2014_03_12/o1_mf_1_15_9l1sj59
6_.arc

/oracle/app/fast_recovery_area/ORCL_NYC/archivelog/2014_03_12/o1_mf_1_16_9l1sjw4
k_.arc

Again, this will only affect newly created files. Existing backups, archivelogs and flashback logs will not be affected; they remain cataloged in their present locations and will be accessed just fine. RMAN will delete them when needed (or commanded), and then you can choose to delete the empty directories.

Oracle Wallet
I admit I didn't think of this one. If you use an Oracle Wallet and do not specify a location in the sqlnet.ora file, Oracle looks in the default location of $ORACLE_BASE/admin/<DB_UNIQUE_NAME>/wallet/. Even more interesting, the DB_UNIQUE_NAME value in that path is case-sensitive, so you'll need to be aware of this when moving your wallet files to the new location. Here is a quick look at my findings:

-- db_unique_name set with no quotes
SQL> select wrl_parameter from v$encryption_wallet;

WRL_PARAMETER
--------------------------------------------------------------------------------
/oracle/app/admin/ORCL_NYC/wallet

-- db_unique_name set with quotes, lowercase
SQL> select wrl_parameter from v$encryption_wallet;

WRL_PARAMETER
--------------------------------------------------------------------------------
/oracle/app/admin/orcl_nyc/wallet

I can't say if this makes a difference when running on Windows, which I believe is case-insensitive. But on Linux it makes a difference.
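One way to sidestep the case sensitivity entirely is to spell out the wallet location in sqlnet.ora (a sketch; the directory is just my example path):

ENCRYPTION_WALLET_LOCATION =
  (SOURCE = (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = /oracle/app/admin/orcl_nyc/wallet)))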

Conclusion
That wraps this one up. I hope it helps a few people out and saves them an hour or two of head-scratching or worse. If there's anything I forgot, please let me know and I'll update this post.
Categories: DBA Blogs

Oracle Direct NFS and Infiniband: A Less-Than-Perfect Match

Wed, 2014-03-05 19:54
Readers of an earlier post on this blog will know about my latest forays into the world of Direct NFS. Part of that means stumbling over configuration hiccups or slamming into brick walls when you find new bugs.

To quickly re-set the table, my organization purchased the Oracle ZFS Storage Appliance (ZFSSA) 7420. Oracle sold us on the Infiniband connectivity as a way to make a possible future transition to Exadata easier. However the pre-sales POC testing was done over 10gb Ethernet (10gigE). So it was that everything (including their Infiniband switches and cables) arrived at the datacenter and was installed and connected by the Oracle technicians. There were a few initial hiccups and frustrating inconsistencies with their installation and configuration, but those are outside the scope of this post.

We decided to put a copy of our standby database on the ZFSSA and have it run as a second standby. The performance problems were quick to appear, and they weren't pretty.



Configuring Shares
We configured the ZFS project shares following the common Oracle best practices for ZFS recordsize and write bias. For example, datafile shares were set to an 8k recordsize (to match the db_block_size) and throughput write bias, whereas redo log shares were set to a 128k recordsize and latency write bias. Note that with Oracle Database 12c, Direct NFS over NFSv4, and the more recent ZFSSA firmware, you gain the benefit of Oracle Intelligent Storage Protocol (OISP), which determines the recordsize and write bias automatically based on the type of file it recognizes.
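As a sanity check that Direct NFS was actually serving each mount, v$dnfs_servers is handy (a quick sketch):

SQL> select svrname, dirname, mntport, nfsport
  2  from v$dnfs_servers;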

Copying the Database
To start out we needed to get a copy of the database onto the ZFSSA shares. This was easily done with RMAN's backup as copy database command, specifying the ZFSSA mount as the format destination. We were fairly impressed with the Direct NFS transfer speed during the copy, so we were optimistic about how it would hold up under our production load.
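The copy itself was nothing fancy, something along these lines (the mount path here is just an example, not our actual path):

RMAN> backup as copy database
2> format '/zfssa/oradata/%U';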

Starting Recovery!
Once everything was set, we started managed recovery on the standby. Our earlier excitement gave way to a sort of soul-crushing disappointment as the recovery performance basically ground to a standstill and traffic to the ZFSSA went from hundreds of Mbps to barely a trickle. We could stop recovery and copy a big file with great speed, but something in managed recovery was not playing nicely.

We found that we could disable Direct NFS (requires a database restart and software relinking), and managed recovery would actually perform better over the kernel NFS, although still not nearly as well as we would need.
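For reference, toggling Direct NFS is done by relinking the Oracle binary with the instance shut down (a sketch of the standard 11gR2 method):

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk dnfs_off   # disable Direct NFS
$ make -f ins_rdbms.mk dnfs_on    # re-enable it later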

This started a blizzard of SR creations, including SRs being spawned from other SRs. We had SRs open for the ZFSSA team, the Direct NFS team, the Data Guard team, and even the Oracle Linux and Solaris teams, even though we were not on Oracle Linux or Solaris (we use RHEL). It came to a point where I had to tell our account manager to have support stop creating new SRs, since every new SR meant I had to explain the situation to a new technician all over again.

At this point we were having twice-daily conference calls with our account manager and technical leads from the various departments. Their minions were working hard on their end to replicate the problem and find a solution, but we were running into a 4th week of this craziness.

The Infiniband Bandit
After many frustrating weeks of changing configurations, cables, and cards, and just generally grasping at straws, the problem was finally narrowed down to the Infiniband. Or rather, to a bug in the OpenFabrics (OFA) Linux kernel module handling Infiniband, triggered when Direct NFS fires off a whole lot of connections, like when DataGuard managed recovery fires up 80 parallel slaves. We tested the 10gigE channel we had for the management UI and performance was like night and day with just the one channel.

Oracle Support suggested it might be related to bug 15824316, which also deals with dramatic performance loss with Direct NFS over Infiniband. The bug in the OFA kernel module was fixed in recent versions of Oracle Enterprise Linux (OEL) (specifically the UEK kernel), but Oracle is not sharing this fix with Red Hat (or anyone else, presumably). Since we're on RHEL, we had little choice but to send all the Infiniband networking hardware back and order up some 10gigE replacements.

We're still in the process of getting the 10gigE switches and cables in place for the final production setup. If you're curious, it's four 10gigE cards per server, bonded to a single IP, running through a 10gigE switch into the ZFSSA heads. This 10gigE network is dedicated exclusively to ZFSSA traffic.

So, in the end, if you're on (a recent version of) OEL/UEK, you should have nothing to worry about. But if you're on RHEL and planning to use Direct NFS, you're going to want to use 10gigE and NOT Infiniband.

Update - 6 Mar 2014
Some have asked, and I want to re-iterate: Oracle have claimed that the OFA module was entirely re-written, and their fix is specific to OEL and is not covered by the GPL or any similar license. We were told that they have no plans to share their code with Red Hat. Also, there is no MOS bug number for the OFA issue; it was apparently re-written from scratch with no bug to track the issue. If this all sounds rather dubious to you, join the club. But it's what our account manager told us at the end of last year.

Another Update - 6 Mar 2014
Bjoern Rost and I discussed this privately and, after quite a bit of research and analysis, he shared this conclusion:

Oracle Support suggested that this issue would not occur with the OFA module used in Oracle Linux with the UEK kernel. RedHat changed their support in RHEL6 from shipping the whole openfabrics stack to just including the drivers that were also present in the upstream mainline kernel. This is RedHat’s policy to ensure stability in the version of the kernel they ship. Oracle offers an OFA package with some additional patches (all GPL/BSD license) for the UEKr1 and UEKr2 kernels. Unfortunately, these two different approaches make it very hard to pinpoint specific patches or create backports for the RedHat kernel version.
Categories: DBA Blogs

The Danger of Moving Incrementally Updated Datafile Copies

Fri, 2014-02-28 11:20
When I sat down at my desk yesterday morning I was greeted with some disturbing email alerts notifying me that one of the NFS mounts on my standby database host was full. This was the NFS mount holding an image copy of my database that is updated daily from an incremental backup. The concept and an example can be found in the documentation. With a 25 Tb database, waiting to restore from backups is not as appealing as simply switching to the copies and getting back to business.

We quickly saw that the mount was full because RMAN had tried to make a whole new set of image copies in the latest backup run rather than taking an incremental backup for recovery. It does this when it finds no valid copy of the datafiles to increment, and the logs confirmed this to be the reason:

Starting backup at 2014/02/26 13:30:16
no parent backup or copy of datafile 792 found
no parent backup or copy of datafile 513 found
no parent backup or copy of datafile 490 found
no parent backup or copy of datafile 399 found

... and so on, for every datafile. However I knew that the copies that had been there (and been updated) every day were still there. So what was different?

It was then that I remembered my work from the day before. Doing a bit of re-organization, I had renamed the directory where the datafile copies lived. However, I made sure to re-catalog them and double-checked that the backup tag was still there, which it was. I also crosschecked the copies to mark the old entries as expired and then deleted them from the catalog. This turned out to be the cause of the problem.

When the original datafilecopy entries were removed from the catalog, RMAN didn't want to recognize the new entries as the right copies, even though they were literally the same file, with the same backup tag. And so RMAN printed the message you see above and dutifully began making new image copies until it filled up the mountpoint, which didn't have another spare 25 Tb handy.

Today I was able to duplicate the scenario on a (much smaller) sandbox with various sequences. Every time, once I crosschecked the original copies and deleted them as expired, RMAN would create a new copy on the next run. The sequence was basically this:
  1. Run the backup-for-recovery and recover commands (a sketch of the pattern follows this list); the first run will create the datafile copies as expected.
  2. Run it again, this time it will create an incremental backup and then apply it to the copies made in the previous step.
  3. Move or rename the directory holding the copies.
  4. CROSSCHECK COPY; & DELETE EXPIRED COPY;
  5. CATALOG START WITH '/path/to/new/location/';
  6. LIST DATAFILECOPY ALL; to verify that the copies are registered under the new location and the TAG is right.
  7. Run backup-for-recovery and recover commands (be sure to update the location). I would expect the same results as step 2, but instead new copies are created.
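For reference, the backup-for-recovery and recover commands in steps 1, 2 and 7 follow the standard incrementally-updated-copy pattern, roughly like this (a sketch; the tag name is just an example):

RMAN> run {
2> recover copy of database with tag 'incr_update';
3> backup incremental level 1 for recover of copy with tag 'incr_update' database;
4> }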
One thing that was very interesting: if I just cataloged the new location but did not crosscheck or delete the old entries (i.e. skipped step 4), then I could run the script and it would take an incremental backup as planned and recover the copies in the new location. But if I later did the crosscheck and delete, it would not accept those copies and would create new copies instead. And all this time I could run "list datafilecopy all;" and see both copies with the same tags. Changing the order of steps 4 and 5 made no difference.

I'd be interested to hear what anyone else thinks about this. Personally it seems like a bug to me, so I've opened an SR. So far Oracle Support have confirmed what I've experienced, although they have said there is no bug on file. They suggested I use Doc ID 1578701.1 to make another copy of the datafile with a new tag and use that new tag. However, if I wanted to do that I would just create a new database copy and keep using the original tag, which is exactly what I've done.

I will be sure to update this post with anything I find. Until then, I wanted to share this experience for anyone else that might need or want to move their datafile copies if they are part of an incrementally-updated-backup strategy.
Categories: DBA Blogs
