Reconstruct Standby DB

After a recent power outage at our DR site, I discovered that a standby database there had stopped applying logs. Apparently one of the archived redo logs contained a change that grew a datafile, but the disk at the standby site did not have enough free space for the datafile to extend. So the standby terminated managed recovery, as it should.

We normally keep the archived redo logs for 7 days. Unfortunately, by the time I discovered this situation, 15 days had passed and the archived redo logs the standby needed were long gone. With no archived redo logs to apply, the only solution was to rebuild the standby from scratch. This database is approximately 7 TB in size, so rebuilding it from scratch is no trivial affair.

The primary is a three-node 11.2.0.2 RAC database running on Linux. The standby is a two-node RAC database running the same Oracle and OS versions.

Here is how we accomplished rebuilding the standby:

  1. We put the primary in hot backup mode and took a disk-based snapshot of the database (the backup-mode commands are sketched after this list).
  2. The snapshot was copied to external media. Note: shipping across the WAN was too time-consuming.
  3. The external media was hand carried to the DR site.
  4. The LOG_ARCHIVE_DEST_STATE_n parameter for the standby destination was set to DEFER on the primary (both the DEFER and the later ENABLE are sketched after this list).
  5. The standby database was dropped from the DG Broker configuration (in DGMGRL): REMOVE DATABASE standby PRESERVE DESTINATIONS;
  6. The standby database’s mount points were erased. After all, the database was essentially useless at this point.
  7. New mount points were created and the snapshot was written to the disk at the DR site.
  8. After the file transfers were complete (about 5 days), we told our storage to update the snapshot at the DR site from a more current snapshot of the primary. This was performed over the WAN, since only the changes were sent, a volume of data much, much smaller than the full database.
  9. A standby controlfile was created on the primary: ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/dir/path';
  10. To keep things simple, we wanted to run the standby as a single instance until we got it up and running. So we created a PFILE from the standby's RAC SPFILE and used a text editor to remove the RAC-aware parameters: we removed CLUSTER_DATABASE, collapsed the instance-specific UNDO_TABLESPACE entries into a single "*." entry, removed the THREAD parameters, and so on (a sketch of the edited parameter file follows this list). Our normal standby database has two instances, STANDBY1 and STANDBY2. On node 1, we put the pfile in $ORACLE_HOME/dbs/initstandby.ora instead of initstandby1.ora so the single-instance database could find its parameter file. We did something similar for the password file.
  11. We copied the standby controlfile from step 9 over the controlfiles in the database snapshot.
  12. With the pfile and password file in place for a single-instance database, we did STARTUP MOUNT.
  13. We created any standby redo logs we would need. In our case, the primary also has standby redo logs to facilitate switchover operations, but those files were not part of the snapshot, so we had to drop the SRLs that did not make the trip before recreating them (sketched after this list).
  14. In the primary, we set LOG_ARCHIVE_DEST_STATE_n back to ENABLE.
  15. In the primary instances, we performed ALTER SYSTEM SWITCH LOGFILE;
  16. We verified in both the primary's and the standby's alert logs that the standby was receiving logs, i.e. that log transport was working.
  17. We turned on managed recovery: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
  18. We verified in the standby's alert log that the logs were being applied, i.e. that log apply was now working.

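For step 1, the hot backup commands are simple; since 10g this can be done at the database level rather than tablespace by tablespace. A minimal sketch, run on one primary instance:

-- Checkpoint and freeze the datafile headers before the snapshot
ALTER DATABASE BEGIN BACKUP;
-- ... take the storage-level snapshot here ...
-- Take the database out of backup mode once the snapshot completes
ALTER DATABASE END BACKUP;
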
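Steps 4 and 14 toggle the archive destination state on the primary. A sketch, assuming the standby is log archive destination 2 (the destination number will vary per configuration):

-- Step 4: stop shipping redo to the standby, on all primary instances
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DEFER SID='*';
-- Step 14: resume shipping once the standby is mounted
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE SID='*';
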
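For step 10, the edited parameter file looked roughly like this. This is a sketch; the undo tablespace name and the removed entries shown are just illustrative of the kind of RAC-aware parameters that have to go:

# $ORACLE_HOME/dbs/initstandby.ora (single-instance standby)
# removed: *.cluster_database=true
# removed: standby1.thread=1 and standby2.thread=2
# removed: standby1.instance_number=1 and standby2.instance_number=2
# instance-specific undo entries collapsed into one "*." entry:
*.undo_tablespace='UNDOTBS1'
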
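And for step 13, dropping the standby redo log groups whose files did not make the trip and recreating them went something like this (the group number, thread, and size here are illustrative; SRLs should be the same size as the online redo logs):

-- See which SRL groups the standby controlfile knows about
SELECT group#, thread#, status FROM v$standby_log;
-- Drop a group whose underlying file is missing
ALTER DATABASE DROP STANDBY LOGFILE GROUP 10;
-- Recreate it
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 10 SIZE 512M;
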
At this point, we had a standby database back up and running. We created a simple table in the primary, inserted one row of data into it, performed the log switches again, and then opened the standby in READ ONLY mode to verify that the transaction had been replayed on the standby as it should be (a sketch of this smoke test follows the srvctl command below). Once we were satisfied that the standby database was working, we needed to make it a RAC database again. Everything was already in place for this because it once was a RAC database. To finish the job, we just shut down the single-instance standby database (SHUTDOWN ABORT in SQL*Plus) and then used srvctl to start the standby as a RAC database:

srvctl start database -d standby -o mount
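
For the record, the smoke test mentioned above was nothing fancy. A sketch, with an illustrative table name:

-- On the primary
CREATE TABLE dbtest (x NUMBER);
INSERT INTO dbtest VALUES (1);
COMMIT;
ALTER SYSTEM SWITCH LOGFILE;

-- On the standby: pause apply, open read only, confirm the row arrived
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
SELECT COUNT(*) FROM dbtest;
-- then restart managed recovery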

The only thing that remained at this point was to add the standby back to the DG Broker configuration (in DGMGRL): ADD DATABASE standby
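
That is the abbreviated form; the full 11.2 DGMGRL command also names a connect identifier and is followed by enabling the database. A sketch, assuming a TNS alias named standby:

ADD DATABASE standby AS CONNECT IDENTIFIER IS standby MAINTAINED AS PHYSICAL;
ENABLE DATABASE standby;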

When this first happened, I was nervous about how it would go with such a large database. But none of the operations above are size-dependent other than copying the files to and from the external media, and it all went well.

To ensure we do not run into this situation in the future, we added alerting to our Oracle Enterprise Manager Grid Control. I will now receive a WARNING alert when log transport or log apply falls 12 hours behind, and a CRITICAL alert at 24 hours behind. That should give us plenty of time to fix any issues before the archived redo logs are automatically removed after 7 days, or at the very least, to change the process to hold more days' worth of archived redo logs until we rectify the situation.
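
As a side note, you can also eyeball the lag by hand on the standby through V$DATAGUARD_STATS rather than waiting on Grid Control; a quick sketch:

-- On the standby: how far behind are redo transport and apply?
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');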