Don Seiler

IT/Databases/PostgreSQL Stuff Mostly. Maybe Oracle From Time-to-Time.

RMAN Redundancy is not a Viable Retention Policy

Tue, 2014-04-15 14:07

Originally posted by me on the Pythian blog. This is an older post that I somehow forgot to post on my own blog, but another recent redundancy foul-up reminded me of it.

The story you are about to read is based on actual events. Names and paths have been changed to protect the innocent. I call this scenario “The Perfect Storm” because it took just the right combination of events and configurations. Sadly, this doesn’t make it an unlikely occurrence, so I’m posting it here in hopes that you’ll be able to save yourselves before it’s too late.

I have always had a preternatural dislike for using REDUNDANCY as a retention policy for Oracle RMAN, greatly preferring RECOVERY WINDOW instead, simply because REDUNDANCY doesn’t really guarantee anything valuable to me, whereas RECOVERY WINDOW guarantees that I’ll be able to do a point-in-time recovery to anytime within the past x days. Plus, I had already been burned once by a different client using REDUNDANCY. With the story I’m about to tell, this dislike has turned into violent hatred. I’m going to be light on the technical details, but I hope you’ll still feel the full pain.

First some table setting:
  • Standalone instance (no RAC, no DataGuard/Standby)
  • RMAN retention policy set to REDUNDANCY 2
  • Backups stored in the Flash Recovery Area (FRA)
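Both parts of that setup can be checked directly from RMAN and SQL*Plus; a quick sketch (output will of course vary by environment):

```
RMAN> show retention policy;

SQL> select file_type, percent_space_used, percent_space_reclaimable
  2  from v$recovery_area_usage;
```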
A few months ago, we had a datafile corruption on this relatively new instance (data had been migrated from an old server about a week prior). The on-call DBA followed up the page by checking for corruptions in the datafile with this command:

RMAN> backup check logical datafile '/path/to/foobar_data.dbf';

This, my friends, led to the major fall, though we did not know it for many hours. You see, the FRA was already almost full. This causes the FRA to automatically delete obsolete files to free up space. That last backup command, while only intended to check for logical corruption, did actually perform a backup of the file, and rendered the earliest backup of the file obsolete since there were two newer copies. That earliest file happened to be from the level 0 backup from which we would later want to restore.
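In hindsight, the corruption check could have been done without creating any backup piece at all: RMAN's VALIDATE option reads and checks the blocks but writes nothing, so it cannot trigger the FRA's obsolete-file cleanup. A sketch, using the same illustrative path:

```
RMAN> backup validate check logical datafile '/path/to/foobar_data.dbf';
```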

Of course, at first we didn’t know why the file was missing. Logs showed that it was on disk no less than two hours before the problem started. Later, scanning the alert log for the missing backup filename yielded this:

Deleted Oracle managed file /path/to/flash_recovery_area/FOO_DB/backupset/2008_12_01/o1_xxxx.bkp

Oracle deleted the one backup file that we needed!

Even worse, it wasn’t until this time on a Monday night that we realized that the level 0 taken the previous weekend had failed to push the backup files to tape because of a failure on the NetBackup server. The problem was reported as part of Monday morning’s routine log checks, but the missing files had not yet been pushed to tape.

In the end, we were able to drop and restore the tablespace to a previous point in time on a test instance from another backup file and exp/imp data back over. It was ugly, but it got things back online. Many DBAs better than myself gave their all on this mission.

To summarize, the ingredients:
  • Oracle RMAN
  • Flash Recovery Area near full, obediently deleting obsolete files.
  • Tape backup failure
Add in an innocent backup command and . . . BOOM! Failure Surprise.

The two biggest points to take away are:

  • Tape backup failures are still serious backup failures and should be treated as such, even if you back up to disk first.
  • REDUNDANCY is not a viable retention policy. In my house, it is configuration non grata.
Categories: DBA Blogs

Migrating (and Upgrading!) Your EM12c Repository Database

Mon, 2014-04-07 08:00
This week I migrated our EM12c repository database to a new server as part of its promotion to production status. Just to make it a little more exciting, the migration also involved an in-flight database version upgrade. Much of this post is directly inspired by Martin Bach's post on the same subject. I ran into a few other snags that weren't mentioned there, so I thought it would be worthwhile to document the experience here for your benefit.

I'm assuming you have all the software installed (and patched to the latest PSU, right?). Alright then, let's begin!

Stop OMS
We want to make sure there are no more changes coming, and nothing needs to access the repository database, so be sure to stop all OMS instances:

$ emctl stop oms -all

Backup PFILE
We need to get the pfile for the current repo and copy it into place on the new host:

SQL> create pfile='/mnt/emrepo_backup/initemrepo.ora' from spfile;

I use /mnt/emrepo_backup here because that is the directory that I'll back the database up to and copy to the new host afterward. If you create your pfile somewhere else, be sure to copy it to the new host under $ORACLE_HOME/dbs/.
Backup Repo Database
Next we backup the repo database. Here's a snippet from my ksh script that I used:

# Variables below are illustrative; adjust to your environment
BACKUPDIR=/mnt/emrepo_backup
LOGFILE=$BACKUPDIR/backup_emrepo.log

mkdir -p $BACKUPDIR

rman log=$LOGFILE <<EOF
connect target /
set echo on

run {

        allocate channel c1 device type disk format '$BACKUPDIR/%U';
        allocate channel c2 device type disk format '$BACKUPDIR/%U';
        allocate channel c3 device type disk format '$BACKUPDIR/%U';
        allocate channel c4 device type disk format '$BACKUPDIR/%U';

        backup as compressed backupset database
                include current controlfile
                plus archivelog;
}
EOF

When the backup is finished, review the RMAN log and make note of which backup piece contains the controlfile backup. We'll need to refer to it by name as part of the restore process.

If your backup directory is an NFS mount, then you can simply unmount it from here and mount it to the new server. Otherwise, be sure to copy the files there after the backup is complete, for example:

$ scp -r /mnt/emrepo_backup newhost:/path/to/emrepo_backup

After this, it should be safe to shutdown the old repository database.

$ sqlplus / as sysdba
SQL> shutdown immediate

If you use Oracle Restart:

$ srvctl stop database -d emrepo
$ srvctl disable database -d emrepo

Prepare New Host for Repo DB
Now we need to set things up on the new host for the emrepo DB.

Create oratab Entry
First let's create an entry in /etc/oratab for this DB under the new home. For example:
emrepo:/path/to/new_oracle_home:N

Edit PFILE and Create SPFILE
Then let's copy that parameter file into place.

$ . oraenv
ORACLE_SID = [oracle] ? emrepo
The Oracle base has been set to /oracle/app
$ cd $ORACLE_HOME/dbs/
$ cp /mnt/emrepo_backup/initemrepo.ora .

Now edit that file and make sure you update the parameters that require updating. In my case, I'm using Oracle Managed Files (OMF) so I set db_create_file_dest and db_create_online_log_dest_1. I also set db_recovery_file_dest for the FRA. I then set the control_files parameter to specify where I want the control file(s) restored to from the backup when I get to that point.
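For illustration, the edited entries in my pfile looked something like this (all paths and the FRA size here are made up; substitute your own):

```
*.db_create_file_dest='/oradata'
*.db_create_online_log_dest_1='/oradata'
*.db_recovery_file_dest='/fra'
*.db_recovery_file_dest_size=50g
*.control_files='/oradata/EMREPO/controlfile/control01.ctl'
```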

Now, Martin Bach noted in his blog post that he did not have to specify a db_file_name_convert or log_file_name_convert. I was having some difficulty during the restore phase, and added these parameters out of pure speculation. They didn't help the problem, but I left them in for the duration of my process. I only mention this as an FYI if you end up comparing your settings to mine.

Once you have all your parameters set as desired, create the SPFILE:

$ sqlplus / as sysdba
SQL> create spfile from pfile;

Now, let us restore ourselves the database.
Restore Repo DB on New Host
The restore was done largely as part of a ksh script, which I'll reference snippets of here. Let's start by defining some variables:
# Variables below are illustrative; adjust to your environment
BACKUPDIR=/mnt/emrepo_backup
LOGFILE=/home/oracle/emrepo_restore.log
DESTDIR=/oradata/EMREPO

Restore Controlfile and Mount Database
From the script, we call RMAN to start the instance in nomount mode, restore the controlfile from the specified backuppiece and mount the database:

rman log=$LOGFILE <<EOF
connect target /
set echo on
startup force nomount;
restore controlfile from '$BACKUPDIR/1abcd123_1_1';
alter database mount;

catalog start with '$BACKUPDIR' noprompt;
EOF

We end by cataloging the backup files, as you can see.

Generate SET NEWNAME Script
Here I dip into sqlplus to generate a script for RMAN that calls SET NEWNAME for each of the datafiles. Without this, RMAN would try to restore the datafiles to their old paths from the original host. Here I set them to the path that OMF will use:

sqlplus -s /nolog <<EOF
connect / as sysdba
set head off pages 0 feed off echo off verify off
set lines 200
spool rename_datafiles.rman
select 'set newname for datafile ' || FILE# || ' to ''' || '$DESTDIR/datafile/' || substr(name,instr(name,'/',-1)+1) || ''';' from v\$datafile;
select 'set newname for tempfile ' || FILE# || ' to ''' || '$DESTDIR/tempfile/' || substr(name,instr(name,'/',-1)+1) || ''';' from v\$tempfile;
spool off
exit
EOF
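Outside the database, the same basename-plus-new-destination transformation can be sketched in plain shell; everything here (paths, file numbers) is illustrative:

```shell
#!/bin/sh
# Build SET NEWNAME commands from a list of old datafile paths,
# keeping only the basename under the new OMF destination.
DESTDIR=/oradata/EMREPO   # mirrors db_create_file_dest; made up
file_no=1
cmds=""
for f in /old/path/emrepo/system01.dbf /old/path/emrepo/sysaux01.dbf; do
  # ${f##*/} strips everything up to the last slash, like the substr/instr above
  cmd="set newname for datafile $file_no to '$DESTDIR/datafile/${f##*/}';"
  echo "$cmd"
  cmds="$cmds$cmd
"
  file_no=$((file_no + 1))
done
```

This prints one SET NEWNAME line per file, e.g. `set newname for datafile 1 to '/oradata/EMREPO/datafile/system01.dbf';`.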

Restore & Recover Database
Now we're ready to restore the database and perform recovery. Again, we call RMAN and run this:

run {
  allocate channel c1 device type disk;
  allocate channel c2 device type disk;
  allocate channel c3 device type disk;
  allocate channel c4 device type disk;
  restore database;
  switch datafile all;
  switch tempfile all;
  recover database;
}

At this point we're done with the restore and recovery. Normally I would OPEN RESETLOGS, but remember that we're restoring this into a newer-version home, so we still need to UPGRADE the database!

Open and Upgrade Database
First we still call OPEN RESETLOGS, but with the UPGRADE option. This replaces the "STARTUP UPGRADE" command you would find in the manual upgrade instructions.

$ sqlplus / as sysdba
SQL> alter database open resetlogs upgrade;

Now we follow the rest of the manual upgrade instructions. I'll just post the commands here, but you should definitely review the documentation:

$ cd $ORACLE_HOME/rdbms/admin
$ sqlplus / as sysdba
SQL> spool upgrade.log
SQL> @catupgrd.sql

-- Start database again
SQL> startup;

-- Check status of components, some will be fixed by utlrp.sql
SQL> @utlu112s.sql

-- Rebuild everything
SQL> @catuppst.sql
SQL> @utlrp.sql

-- Confirm everything is OK now
SQL> SELECT count(*) FROM dba_objects WHERE status = 'INVALID';
SQL> SELECT DISTINCT object_name FROM dba_objects WHERE status = 'INVALID';
SQL> @utlu112s.sql

The utlu112s.sql should now report all components as VALID. If not, you'll want to refer to the upgrade documentation for troubleshooting.

At this point the database is upgraded and open. Make sure you have a listener running and that the new database is registered with it. The only thing left is to tell your OMS servers to look for the repository database in its new location.
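If the database hasn't registered with the listener yet, the standard check and nudge apply:

```
$ lsnrctl status
SQL> alter system register;
```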

Update OMS Repository Settings
First we need to start just the administration server:

$ emctl start oms -admin_only

This is necessary if you used the "-all" option when stopping OMS earlier. If you did not use "-all" then the admin server should still be running.

Now, update the store_repos_details setting in the OMS configuration:

$ emctl config oms -store_repos_details -repos_port 1521 \
  -repos_sid emrepo -repos_host \
  -repos_user sysman -repos_pwd xxx

Repeat this step for all your OMS servers (emctl should remind you to do so when changing the config). Then on each, completely shutdown and restart OMS:

$ emctl stop oms -all
$ emctl start oms

And that should be it! Don't forget to drop/delete the database from the original server when you're comfortable doing so.
Categories: DBA Blogs

ORA-00600 [kkzlpllg:5] When Dropping MView Log

Sun, 2014-03-30 18:43
This week a co-worker and I have been doing some rapid-fire testing to improve fast-refresh performance on an old materialized view, which does some summary aggregations on a 1.9 billion row (and growing) master table. One of the things we tested was using the new-in-11gR2 COMMIT SCN feature. There is a great blog post describing the benefits of this feature by Alberto Dell'Era. To quickly summarize and over-simplify, it provides a much faster way to update rows in the materialized view log that are eligible for refresh and purging. This definitely sounds like something we'd want, so let's roll!

Well we quickly hit a snag when testing our creation script the second time around, when it wouldn't let us drop the materialized view log:
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkzlpllg:5], [], [], [], [], [],[], [], [], [], [], []

We found that we also could no longer perform fast refreshes of the materialized view, getting the same ORA-00600 error. Our initial MOS search turned up Doc ID 14158012.8, which indicates Bug 14158012 (Orphan rows in SNAP_LOGDEP$ causes ORA-600 [kkzlpllg:5]). The bug description is:
With this bug, when there are orphan rows in SNAP_LOGDEP$ with RSCN=NULL,
a CREATE MATERIALIZED VIEW or DROP MATERIALIZED VIEW statement will report ORA-600 [kkzlpllg:5].

I verified that we did have such NULL RSCN values in SNAP_LOGDEP$ related to the master table here.

SQL> select d.tableobj#, o.name
  2  from sys.snap_logdep$ d, sys.obj$ o
  3  where d.tableobj# = o.obj#
  4  and o.name = 'BAR'
  5  and d.rscn is null;

 TABLEOBJ# NAME
---------- ------------------------------
    605668 BAR
    605668 BAR

Unfortunately, that doc also said there was no workaround other than to apply a one-off patch (otherwise fixed in later patchsets and 12c)! Not exactly the quick fix we were hoping for.

However, we did some more searching and found Doc 1310296.1: Internal Error ORA-00600 [kkzlpllg:5] When Executing Drop Materialized View Log

Pretty much the same thing, only this gives us a workaround:

Since the information stored in the Materialized View log is bogus in any case, the Materialized View Log is not useable anymore. The only option is to drop this object and re-create it. To make this possible, a dummy non-NULL value needs to be written into the RSCN column for that table.

So we update those rows to set RSCN to a dummy value (9999):

SQL> UPDATE snap_logdep$
  2  SET RSCN = 9999
  3  WHERE tableobj# = 605668;


And we were able to drop the materialized view log and resume testing afterwards.

Hopefully this article saves someone from hitting the same initial roadblock that I did. Especially nice to see the Oracle support docs contradicting themselves!

We also made another interesting find with this particular materialized view that I'll be blogging about later. Definitely an eye-opener, face-palmer and forehead-smacker all in one.

UPDATE - 8 April 2014

Nothing mind blowing, but a little time saver if you know the object name. This doesn't take the owner into account, so be careful out there.

update snap_logdep$
set rscn = 9999
where tableobj# = (
        select distinct d.tableobj#
        from sys.snap_logdep$ d, sys.obj$ o
        where d.tableobj# = o.obj#
        and o.name = '&object_name'
        and d.rscn is null
)
and rscn is null;


Categories: DBA Blogs

Battling Bigfile Backup Bottlenecks

Wed, 2014-03-26 13:44
Last Friday I kicked off a database backup to an NFS destination, using the standard "backup as compressed backupset database" syntax. Loyal readers of this blog may recall that I'm the proud custodian of a 25 Tb database, so this backup normally takes a few days, with an expected completion on Monday morning. However it was still running on Wednesday, and reviewing the logs I saw that there was just 1 channel (of the original 8) still running. The backup file that this channel was writing happened to include our largest bigfile datafile, which weighs in at nearly 8 Tb. Reviewing my new backup script I realized that I had neglected to specify a SECTION SIZE parameter. An example of its usage is:

RMAN> backup as compressed backupset
2> section size 64G
3> database;

Without it, RMAN decided to create a backup piece bundling my 8 Tb datafile with a few others and write it out to disk on a single channel. Obviously this isn't what we wanted.

I'm not a big fan of bigfile tablespaces, primarily because you lose the benefits of parallelism when huge files can only be handled by a single channel, as with datafile copy operations and backups. In 11g, however, Oracle has introduced multi-section backups via the SECTION SIZE option for RMAN backups. This option tells RMAN to break a large file into sections of the specified size so that the work can be done in parallel. If the specified size is larger than the file, then it is simply ignored for that file.

There is a limitation in that the file can be split into a maximum of 256 sections. So, if you specify a section size that would result in more than 256 sections being created, Oracle RMAN will increase the size so that exactly 256 sections are created. This is still enforced today in Oracle 12c.
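As a back-of-the-envelope sketch of that rule in shell (my numbers: the ~8 Tb bigfile, with a hypothetical 16G SECTION SIZE request):

```shell
#!/bin/sh
# If file_size/section_size would exceed 256 sections, RMAN raises the
# section size so that at most 256 sections are produced.
file_gb=8192      # ~8 Tb datafile, in Gb
section_gb=16     # requested SECTION SIZE (hypothetical)
sections=$(( (file_gb + section_gb - 1) / section_gb ))
if [ "$sections" -gt 256 ]; then
  section_gb=$(( (file_gb + 255) / 256 ))   # round up to the 256-section cap
  sections=$(( (file_gb + section_gb - 1) / section_gb ))
fi
echo "effective section size: ${section_gb}G in ${sections} sections"
```

With those inputs the 16G request would imply 512 sections, so RMAN would effectively use 32G sections instead.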

Another limitation in Oracle 11g is that multi-section backups cannot be done with image copy backups. Those must still be done as a whole file and so can still be a huge bottleneck. However this is no longer a problem in Oracle 12c, and multi-section image copy backups are possible. I'm looking forward to using this as we also use image copy backups as part of our recovery strategy.

To highlight the parallel benefits, I ran a compressed backup of a 1.5 Tb bigfile tablespace using 8 channels. The first one does NOT use section size and so only goes over one channel:

Starting backup at 2014/03/25 14:35:41
channel c1: backup set complete, elapsed time: 17:06:09
Finished backup at 2014/03/26 07:41:51

The second one uses a section size of 64 Gb (otherwise same command & file):

Starting backup at 2014/03/26 07:41:52
Finished backup at 2014/03/26 09:48:33

You can see the huge impact made by making use of the multi-section backup option. A single channel took over 17 hours to back it up. Using 8 channels with a section size of 64 Gb took only just over 2 hours. Eyeballing the log shows an average of around 6 minutes per section. Definitely much better than waiting for a single channel to do all the work when the rest of the system is waiting.
Categories: DBA Blogs

Why Is My MView Log Not Purging?

Sun, 2014-03-16 15:50
A few weeks ago we saw one of our tablespaces growing at a rate much higher than the others. Taking a look we saw that the biggest users of space were two materialized view logs, one being 110 Gb and the other 60 Gb. These logs were in place to facilitate the fast refresh of two materialized views, one for each log/table. These materialized views did some aggregations (sum) throughout the day on some important base table data. The fast refreshes were completing successfully many times a day, but the logs were not being purged as expected.

In our case, there was only one mview performing a fast refresh on those base tables, so the mview logs should have been completely purged after each refresh. They certainly shouldn't be growing to over 100+ Gb. Looking at the data in the mview log, all records had a SNAPTIME$$ value of "4000/01/01 00:00:00", which is the default value for records in the mview log that have not been refreshed. Once they are refreshed, the SNAPTIME$$ value gets set to SYSDATE and can then be evaluated for purging.
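To see the symptom for yourself, you can count the log's rows by refresh timestamp; a sketch, with MLOG$_BAR standing in for your actual MLOG$_<TABLE_NAME> table:

```
SQL> select snaptime$$, count(*)
  2  from mlog$_bar
  3  group by snaptime$$;
```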

But why was this value not being updated after refresh?

For those of you unfamiliar with the role of materialized view logs, I'll share this primer from Tim Hall via his excellent Oracle-Base article:

Since a complete refresh involves truncating the materialized view segment and re-populating it using the related query, it can be quite time consuming and involve a considerable amount of network traffic when performed against a remote table. To reduce the replication costs, materialized view logs can be created to capture all changes to the base table since the last refresh. This information allows a fast refresh, which only needs to apply the changes rather than a complete refresh of the materialized view.
Digging deeper led me to MOS DocId 236233.1, which tells us that Oracle compares the MLOG$_<TABLE_NAME>.SNAPTIME$$ value against SYS.SLOG$.SNAPTIME:

Rows in the MView log are unnecessary if their refresh timestamps MLOG$_<table_name>.SNAPTIME$$ are older than or equal to the oldest entry in SLOG$.SNAPTIME for this log.

Here's where we saw the real problem.

SQL> select snaptime
 2 from sys.slog$
 3 where mowner='FOO' and master='BAR';

 no rows selected

If the purge mechanism checks SLOG$.SNAPTIME then of course nothing is going to happen, as the materialized view is NOT registered in SYS.SLOG$!

We re-created the MVIEW from scratch on our development database and had the same results, which indicated it was something systemic in Oracle, so we opened an SR. After the standard back-and-forth of trying the same things over and over, Oracle Support said that this was actually expected behavior:
This mview is defined as fast refreshable with aggregates. The mv log is defined with PRIMARY KEY INCLUDING NEW VALUES.

In order to support fast refresh the mv log should include ROWID as well. Please review the Restrictions on Fast Refresh on Materialized Views with Aggregates in the documentation. There are additional restrictions depending on the operations performed. As an example, SEQUENCE should also need to be added to the mv log if direct loads are performed on new_invoice_record.

This turned out to be the case. We recreated the mview log with the ROWID specification, then re-created the materialized view and, sure enough, the mview was registered in SYS.SLOG$ and refreshes were purging the log as expected.
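For reference, the corrected log definition looked along these lines (FOO/BAR are the same placeholders used above; SEQUENCE added per the support note's advice about direct loads):

```
SQL> create materialized view log on foo.bar
  2  with primary key, rowid, sequence
  3  including new values;
```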

I was more than a little frustrated then that Oracle would let us create the MVIEW without any warnings or errors in the first place. The database obviously detected something wrong since it wouldn't register them in SYS.SLOG$. Their last response was that, since the MVIEW itself was refreshing successfully, no error should be reported. This fails to address the question for me, so I'm going to push back a little harder and will share what I find.

For now, though, we need to schedule a maintenance window to recreate these materialized views and their logs and see if we can reclaim some disk space afterward (perhaps a future post!).
Categories: DBA Blogs

What to Expect When You're Changing the DB_UNIQUE_NAME

Wed, 2014-03-12 18:31
I recently had to change the db_unique_name of a database to make it jive with our typical database/DataGuard naming policy of appending the datacenter location. For the sake of this post let's say it was changed from ORCL to ORCL_NYC, since this database is in our fictional New York City datacenter.

I did a quick set of tests and thought I'd share the findings to save anyone any unpleasant surprises. Here are the things to expect when changing DB_UNIQUE_NAME.

Change the Parameter Value
First we obviously have to change the value. How we do so is important.
The command:
alter system set db_unique_name=orcl_nyc scope=spfile;

will result in a db_unique_name of ORCL_NYC, in all uppercase, which is used for the path changes we'll discuss later. However using quotes instead:
alter system set db_unique_name='orcl_nyc' scope=spfile;

will result in a lowercase orcl_nyc value used in the parameter and some paths. In either case, the fun begins when you next restart the instance!

ADR Location (i.e. alert log)
The ADR location appends the DB_UNIQUE_NAME to the location specified by the DIAGNOSTIC_DEST initialization parameter (defaulting to the $ORACLE_BASE environment variable value). When you restart after setting the DB_UNIQUE_NAME, your ADR location will be in a new location using the new DB_UNIQUE_NAME. Probably of most interest to you is that this means your alert log location will move, so any tools or scripts that referenced that file directly (e.g. error-scraping scripts or log-rotation jobs) will need to be updated. 

Regardless of whether quotes were used, the ADR path always used a lowercase string.

Before the change:
SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCL
SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /oracle/app/diag/rdbms/orcl/or

After restarting with the new value:
SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCL_NYC
SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest                 string      /oracle/app/diag/rdbms/orcl_ny
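You can confirm the new ADR home from the command line with adrci; the path below shows the lowercase form (instance name assumed):

```
$ adrci exec="show homes"
ADR Homes:
diag/rdbms/orcl_nyc/orcl
```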

Datafile and Online Redo Log OMF Location
If you have the db_create_file_dest (and optionally db_create_online_log_dest_N) parameter set, then Oracle will use the DB_UNIQUE_NAME value in the OMF location for any new datafiles and redo logs created after the change, assuming a full path isn't specified.

SQL> alter tablespace users add datafile size 10m;

Tablespace altered.

SQL> select file_name from dba_data_files
  2  where tablespace_name='USERS';


In this case, Oracle will use an uppercase string regardless of whether the DB_UNIQUE_NAME is in upper or lower case. Note that existing files won't be affected; this only applies to new files.

If this database is part of a DataGuard configuration, you'll want to be sure to update your db_file_name_convert and log_file_name_convert parameters to point to the new location.

FRA (Backups, Archivelogs, Flashback Logs)
In the same spirit of OMF, the FRA will also change locations. Similar to the previous case, the uppercase value of the DB_UNIQUE_NAME is used in the path, regardless of the original case. So, after a change and a couple of log switches, you would see something like this:

SQL> select name from v$archived_log;






Again, this will only affect newly created files. Existing backups, archivelogs and flashback logs will not be affected, they remain cataloged in their present locations and will be accessed just fine. RMAN will delete them when needed (or commanded) and then you could choose to delete the empty directories.

Oracle Wallet
I admit this one I didn't think of. If you use an Oracle Wallet, and do not specify a location in the sqlnet.ora file, Oracle looks in the default location of $ORACLE_BASE/admin/<DB_UNIQUE_NAME>/wallet/. Even more interesting, the DB_UNIQUE_NAME value is case sensitive. So you'll need to be aware of this when moving your wallet files to the new location. Here is a quick look at my findings on this matter:

-- db_unique_name set with no quotes
SQL> select wrl_parameter from v$encryption_wallet;


-- db_unique_name set with quotes, lowercase
SQL> select wrl_parameter from v$encryption_wallet;


I can't say if this makes a difference when running on Windows, which I believe is case-insensitive. But on Linux it makes a difference.
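One way to sidestep the default-path behavior entirely is to pin the wallet location in sqlnet.ora (directory path illustrative):

```
ENCRYPTION_WALLET_LOCATION =
  (SOURCE = (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = /oracle/app/admin/wallet)))
```

With an explicit location set, changing DB_UNIQUE_NAME no longer moves the wallet out from under you.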

That wraps this one up. I hope it helps a few people out and saves them an hour or two of head-scratching or worse. If there's anything I forgot, please let me know and I'll update this post.
Categories: DBA Blogs

Oracle Direct NFS and Infiniband: A Less-Than-Perfect Match

Wed, 2014-03-05 19:54
Readers of an earlier post on this blog will know about my latest forays into the world of Direct NFS. Part of that means stumbling over configuration hiccups or slamming into brick walls when you find new bugs.

To quickly re-set the table, my organization purchased the Oracle ZFS Storage Appliance (ZFSSA) 7420. Oracle sold us on the Infiniband connectivity as a way to make a possible future transition to Exadata easier. However the pre-sales POC testing was done over 10gb Ethernet (10gigE). So it was that everything (including their Infiniband switches and cables) arrived at the datacenter and was installed and connected by the Oracle technicians. There were a few initial hiccups and frustrating inconsistencies with their installation and configuration, but those are outside the scope of this post.

We decided to put a copy of our standby database on the ZFSSA and have it run as a second standby. The performance problems were quick to appear, and they weren't pretty.

Configuring Shares
We configured the ZFS project shares per the common Oracle best practices in terms of ZFS recordsize and write bias. For example, datafile shares were set to an 8k recordsize (to match the db_block_size) and throughput write bias, whereas redo log shares were set to a 128k recordsize and latency bias. Note that with Oracle Database 12c, Direct NFS over NFSv4, and the more recent ZFSSA firmware, you gain the benefit of Oracle Intelligent Storage Protocol (OISP), which determines the recordsize and write bias automatically based on the type of file it recognizes.

Copying the Database
To start out we needed to get a copy of the database onto the ZFSSA shares. This was easily done with RMAN's backup as copy database command, specifying the ZFSSA mount as the format destination. We were fairly impressed with the Direct NFS transfer speed during the copy and so we were optimistic about how it would stand up with our production load.

Starting Recovery!
Once everything was set, we started managed recovery on the standby. Our earlier excitement gave way to a sort of soul-crushing disappointment as the recovery performance basically ground to a standstill and traffic to the ZFSSA went from hundreds of Mbps to barely a trickle. We could stop recovery and copy a big file with great speed, but something in managed recovery was not playing nicely.

We found that we could disable Direct NFS (requires a database restart and software relinking), and managed recovery would actually perform better over the kernel NFS, although still not nearly as well as we would need.

This started a blizzard of SR creations, including SRs being spawned from other SRs. We had SRs open for the ZFSSA team, the Direct NFS team, the Data Guard team, and even the Oracle Linux and Solaris teams, even though we were not on Oracle Linux or Solaris (we use RHEL). It came to a point where I had to tell our account manager to have support stop creating new SRs, since every new SR meant I had to explain the situation to a new technician all over again.

At this point we were having twice-daily conference calls with our account manager and technical leads from the various departments. Their minions were working hard on their end to replicate the problem and find a solution, but we were running into a 4th week of this craziness.

The Infiniband Bandit
After many frustrating weeks of changing configurations, cables, cards, and just generally grasping at straws, the problem was finally narrowed down to the Infiniband. Or rather, to a bug in the OpenFabrics (OFA) Linux kernel module that dealt with Infiniband, triggered when Direct NFS would fire off a whole lot of connections, like when Data Guard managed recovery would fire up 80 parallel slaves. We tested out the 10gigE channel we had for the management UI and performance was like night and day with just the one channel.

Oracle Support suggested it might be related to bug 15824316, which also deals with dramatic performance loss with Direct NFS over Infiniband. The bug in the OFA kernel module was fixed in recent versions of Oracle Enterprise Linux (OEL) (specifically the UEK kernel), but Oracle is not sharing this fix with Red Hat (or anyone else, presumably). Since we're on RHEL, we had little choice but to send all the Infiniband networking hardware back and order up some 10gigE replacements.

We're still in the process of getting the 10gigE switches and cables all in place for the final production setup. If you're curious, it's 4 10gigE cards per server, bonded to a single IP to a 10gigE switch into the ZFSSA heads. This 10gigE network is dedicated exclusively to ZFSSA traffic.

So, in the end, if you're on a recent version of OEL/UEK, you should have nothing to worry about. But if you're on RHEL and planning to use Direct NFS, you're going to want to use 10gigE and NOT Infiniband.

Update - 6 Mar 2014

Some have asked, and I want to re-iterate: Oracle have claimed that the OFA module was entirely re-written, and that their fix is specific to OEL and is not covered by the GPL or any similar license. We were told that they have no plans to share their code with Red Hat. Also, there is no MOS bug number for the OFA issue; it was apparently re-written from scratch, with no bug to track the work. If this all sounds rather dubious to you, join the club. But it's what our account manager told us at the end of last year.

Another Update - 6 Mar 2014

Bjoern Rost and I discussed this privately, and after quite a bit of research and analysis he shared this conclusion:

Oracle Support suggested that this issue would not occur with the OFA module used in Oracle Linux with the UEK kernel. RedHat changed their support in RHEL6 from shipping the whole openfabrics stack to just including the drivers that were also present in the upstream mainline kernel. This is RedHat’s policy to ensure stability in the version of the kernel they ship. Oracle offers an OFA package with some additional patches (all GPL/BSD license) for the UEKr1 and UEKr2 kernels. Unfortunately, these two different approaches make it very hard to pinpoint specific patches or create backports for the RedHat kernel version.
Categories: DBA Blogs

The Danger of Moving Incrementally Updated Datafile Copies

Fri, 2014-02-28 11:20
When I sat down at my desk yesterday morning I was greeted with some disturbing email alerts notifying me that one of the NFS mounts on my standby database host was full. This was the NFS mount that held an image copy of my database, updated daily from an incremental backup. The concept and an example can be found in the documentation. With a 25 TB database, waiting to restore from backups is not nearly as appealing as simply switching to the copies and getting back to business.
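The strategy itself boils down to a nightly pair of RMAN commands along these lines (a minimal sketch; the tag name here is a placeholder, not the actual tag from this environment):

RMAN> RUN {
RMAN>   BACKUP INCREMENTAL LEVEL 1
RMAN>     FOR RECOVER OF COPY WITH TAG 'incr_update'
RMAN>     DATABASE;
RMAN>   RECOVER COPY OF DATABASE WITH TAG 'incr_update';
RMAN> }

On the first run there is no copy to recover, so RMAN creates the image copies; on each subsequent run it takes a level 1 incremental and rolls it forward into those copies.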

We quickly saw that the reason that this mount was full was that RMAN had tried to make another set of image copies in the latest backup run rather than take an incremental backup for recovery. It does this when it finds no valid copy of the datafiles to increment, and the logs confirmed this to be the reason:

Starting backup at 2014/02/26 13:30:16
no parent backup or copy of datafile 792 found
no parent backup or copy of datafile 513 found
no parent backup or copy of datafile 490 found
no parent backup or copy of datafile 399 found

... and so on, for every datafile. However, I knew that the copies that had been there (and been updated) every day were still there. So what was different?

It was then that I remembered my work from the day before. Doing a bit of re-organization, I had renamed the directory where the datafile copies lived. However, I made sure to re-catalog them and double-checked that the backup tag was still there, which it was. I also crosschecked the copies to mark the old entries as expired, and then deleted them from the catalog. This turned out to be the cause of the problem.

When the original datafilecopy entries were removed from the catalog, RMAN didn't want to recognize the new entries as the right copies, even though they were literally the same file, with the same backup tag. And so RMAN printed the message you see above and dutifully began making new image copies until it filled up the mountpoint, which didn't have another spare 25 Tb handy.
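The catalog maintenance sequence that set this trap looked roughly like this (the path is illustrative, not the real one):

RMAN> CATALOG START WITH '/path/to/new/location/';
RMAN> CROSSCHECK DATAFILECOPY ALL;
RMAN> DELETE EXPIRED DATAFILECOPY;

After the DELETE EXPIRED, only the re-cataloged entries remain, and it is those entries that RMAN refuses to treat as a valid parent for the next incremental.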

Today I was able to duplicate the scenario on a (much smaller) sandbox with various sequences. Every time, once I crosschecked the original copies and deleted them as expired, RMAN would create a new copy on the next run. The sequence was basically this:
  1. Run the backup-for-recovery and recover commands. The first time, this creates the datafile copies as expected.
  2. Run it again. This time it creates an incremental backup and then applies it to the copies made in the previous step.
  3. Move or rename the directory holding the copies.
  4. CROSSCHECK DATAFILECOPY ALL; to mark the old entries as expired, then DELETE EXPIRED DATAFILECOPY; to remove them from the catalog.
  5. CATALOG START WITH '/path/to/new/location/';
  6. LIST DATAFILECOPY ALL; to verify that the copies are registered under the new location and the TAG is right.
  7. Run the backup-for-recovery and recover commands again (being sure to update the location). I would expect the same results as step 2, but instead new copies are created.
One thing that was very interesting: if I just cataloged the new location but did not crosscheck or delete the old entries (i.e. skipped step 4), I could run the script and it would take an incremental backup as planned and recover the copies in the new location. But if I later did the crosscheck and delete, RMAN would not accept those copies and would create new ones. And all this time I could run LIST DATAFILECOPY ALL; and see both copies with the same tags. Changing the order of steps 4 and 5 made no difference.

I'd be interested to hear what anyone else thinks about it. Personally, it seems like a bug to me, so I've opened an SR. So far Oracle Support have confirmed what I've experienced, although they say there is no bug on file. They suggested I use Doc ID 1578701.1 to make another copy of the datafile with a new tag and use that new tag going forward. However, if I were going to do that, I would just create a new database copy and keep using the original tag, which is exactly what I've done.
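For reference, making a fresh database copy under a new tag, which is essentially what that workaround amounts to, would look something like this (the tag and format string are made-up placeholders):

RMAN> BACKUP AS COPY DATABASE FORMAT '/path/to/new/location/%U' TAG 'incr_update_v2';

Future backup-for-recovery runs would then have to reference the new tag instead of the original one.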

I will be sure to update this post with anything I find. Until then, I wanted to share this experience for anyone else that might need or want to move their datafile copies if they are part of an incrementally-updated-backup strategy.
Categories: DBA Blogs