May 15

ISS-Earth Flyover at Night

This video is really cool of the International Space Station flying over the Earth at night. This has nothing to do with databases, but it is still worth a look.

May 14

Recovery Requirements Before Backups

All too often, I see people asking questions about backup strategies they should employ for their databases. It never seems to fail, every question of this sort that I have come across in a variety of forums never once includes their recovery requirements. I have often stumped for considering your recovery requirements before designing your backup strategy. When pressed for requirements, I will often get backup requirements for example:

  • Backups must not introduce any downtime
  • Need to be able to backup archived redo logs
  • Backups must be written to tape

These requirements are well and good, but in my opinion, you should never design your backup strategy without first documenting your recovery requirements and obtaining management occurrence.

So here are some questions that I ask myself when designing a backup strategy. Notice that these questions are all focused on the recovery side of things.

  1. How much data loss can I afford when I restore the database? Zero data loss? Is one hour of data loss acceptable after recovery of the database?
  2. Do I need to roll forward any transactions, i.e. perform a point-in-time restore?
  3. Will I need to restore the contents of one schema while leaving the other schemas intact?
  4. How long do I have to get the database up and running after a failure?
  5. What kind of failures am I required to recover from? Obviously, I need to be able to restore from a complete server failure or loss of a disk. But do I need to be able to recover from human failures like someone who accidentally deleted a table?
  6. Will I be required to restore the backup to other servers as part of refreshing development or testing databases from a copy of production?

If you ask most people in the Oracle community these days, they will tell you to use RMAN to backup your database. RMAN is a great product and for many things, is better than the old-style User Managed hot or cold backups. Some Oracle DBA’s will tell you to use RMAN to perform hot backups and run your production database in archive log mode. By doing so, you will cover many of the recovery scenarios that you are likely to face. But what if your answer to question 4 is that you have 1 hour to get back up and running and your database is 10TB in size. Good luck in trying to do a complete restore of a 10TB database with RMAN. And RMAN will not be able to help with question 3 as RMAN does not backup on the schema-level.

There are many tools at the DBA’s disposal for backing up and recovering data in the database. Those tools include, but are not limited too:

  • Oracle’s Recovery Manager (RMAN)
  • Oracle User-managed backups
  • Oracle export/import or Data Pump
  • Disk based snapshots
  • Disk based replication
  • Oracle’s Data Guard

So which do you use? Well each has its pros and cons. Once you know your recovery requirements, the tools to backup your database start to become clearer. And you may need to employ more than one backup tool to meet all of your recovery requirements. If you use, as some suggestion, RMAN with Archive Log mode and nothing else, and your manager comes to you and says “you need to get this 10TB database back up and running in 1 hour!” your job may well be on the line.

Which leads to the next point, document your recovery requirements and get them into your Service Level Agreement (SLA). When writing and vetting the SLA, your management may say they want zero data loss. It is at this time that you can bring up the pros and cons to implementing a zero data loss solution…and also mention the costs! Many companies balk at the high cost of a zero data loss solution but for other companies, the cost is small when compared to the financial burden of losing any transaction. This is where the haggling and bartering come into play. If management insists on a zero data loss solution, then they have to come up with the funds to support it because RMAN (free) is not going to provide it. Once you have your recovery requirements documented in the SLA, then it will be difficult for management to ask you to recover something that your backup strategy is not designed to support. If the SLA is in place and they do ask you to recover every single transaction and your backup strategy won’t allow it, then you have a document which says that zero data loss was never required and this can help save your job.

That being said, once you have your recovery requirements documented in the SLA, make sure that your backup strategy will allow you to perform every recovery scenario that is documented in the SLA. Your job may depend on it. If the SLA says zero data loss and you do not implement Data Guard even though management was willing to fund it, then they might terminate you because you were not following through on your job.

Finally, no backup/recovery strategy is complete unless it is thoroughly tested. You should test every required recovery strategy to ensure that you can meet all requirements stated in the SLA. Testing should be performed no less than once a year for two reasons…one, ensures that changes to the system do not negatively impact your ability to perform a required recovery and two, keeps you up-to-date on how to perform recovery so that if you have to do it for real, you are not fumbling for the procedure. In testing, you may find that your backup methodology will support recovery scenarios that are required, but are nice to have if you need them.

And I cannot say it enough….Test, test, and test!

Apr 30

Error 404 Not Found with EM 12c

I have recently installed Oracle’s Enterprise Manager 12c for a new environment. Everything was working just fine when I left work last week but when I tried to connect to EM today, I got the following:

Error 404--Not Found
From RFC 2068 Hypertext Transfer Protocol -- HTTP/1.1:
10.4.5 404 Not Found

Huh?!?!? What do you mean the initial page isn’t found? It should be there.

So I attempted to restart EM when I got the following:

Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
Starting WebTier...
WebTier Successfully Started
Starting Oracle Management Server...
Oracle Management Server is Down

Notice that the OMS would not start.

I realized that the problem was due to the repository database being down. When I started the repository database, I was able to get OMS to start correctly.

Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.
Starting WebTier...
WebTier Successfully Started
Starting Oracle Management Server...
Oracle Management Server Successfully Started
Oracle Management Server is Up

This is a change for EM 12c. With EM11g, you could start OMS without the repository database up and running. But 12c must require the repository database to be up before OMS will start correctly.

Apr 30

_gc_fusion_compression

I found this very cool blog post which discusses how Oracle decides how much of a datablock to transmit across a RAC cluster interconnect:

http://orainternals.wordpress.com/2012/04/29/_gc_fusion_compression/

Apr 26

Blackouts with EMCLI

In my previous post, I talked about how to set a Enterprise Manager blackout with the Agent’s emctl utility on the command line. Everything was working fine until I tried to set a blackout with a clustered database. When trying to set a blackout on a clustered database target you will get an error similar to the following:

./emctl start blackout testblackout orcl
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
"orcl" is a cluster target, hence skipping it. Please use EM console.

From Metalink Note 419740.1, we find that we cannot use the Agent’s emctl utility to set blackouts on RAC databases. You do have the option of scheduling blackouts in Enterprise Manager, but I still wanted to do this via the command-line. The trick is to use EM’s emcli (Command Line Interface) utility on the OMS server.

export JAVA_HOME=/u01/app/oracle/product/middleware/oms11g
export PATH=$JAVA_HOME/jdk/bin:$PATH
cd /u01/app/oracle/product/middleware/oms11g/bin
./emcli login -username=bpeasland
./emcli create_blackout -name="Standby Snapshot" -add_targets="resp:cluster_database" -description="Standby taken down to perform cold snapshot" -schedule="duration::10" -reason="backup"
./emcli logout

Unfortunately, this option requires me to login and if I want to put this in a shell script, I would be required to code the password.

Apr 26

Blackouts with EMCTL

I have a need to take down a database to perform a cold snapshot of the database’s files. This database is monitored by Enterprise Manager and I do not want to receive an alert when the database is down. So I need to figure out how to start a blackout in Enterprise Manager from the command line which will be included in our backup script.

On the database server, do the following:

cd $AGENT_HOME/bin
./emctl start blackout TestBlackoutEMCTL fgodba.acme.com
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Blackout TestBlackoutEMCTL added successfully
EMD reload completed successfully

We can verify the blackout is in effect in Enterprise Manager. Sign on to EM and click on Setup -> Blackouts. You should see something similar to the following:

Select the blackout radio button and click the View button.

We can verify that the blackout name contains the string that I denoted when I started the blackout with the date and time appended to it.
We can also query the status of the blackout on the command line:
./emctl status blackout
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Blackoutname = TestBlackoutEMCTL
Targets = (fgodba.acme.com:oracle_database,)
Time = ({2012-04-26|13:27:32,|} )
Expired = False

It is now time to end the blackout.

./emctl stop blackout TestBlackoutEMCTL
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
Blackout TestBlackoutEMCTL stopped successfully
EMD reload completed successfully

This process is documented here:

http://docs.oracle.com/html/B12013_03/emctl.htm#i1030612

Apr 24

Installing Agent11g on Windows

Normally, to install the 11g version of the Oracle agent for Grid Control, one would push the agent from Grid Control the database server. However, when pushing from Grid Control to a Windows server, we receive an error telling us that Windows does not have SSH running on port 22. Windows servers normally do not have SSH running at all and rather than try to jump through hoops to get SSH running on Windows, we can perform a silent install of the Agent on the server and have the agent configuration contact Grid Control for final setup. This post details how to install Agent11g on Windows servers supporting Oracle databases.

To install Agent11g on Windows servers, follow these steps:
1. Download the Agent11g installer from http://www.oracle.com/technetwork/oem/grid-control/downloads/agentsoft-090381.html
2. Create a response file to be used for silent installation. You can find a sample response file in the \win32\response\additional_agent.rsp file included in the download.
3. Update the response file with specific config items in the following list:
a. OMS_HOST=”my-oms.acme.com”
b. OMS_PORT=1159
c. AGENT_REGISTRATION_PASSWORD=”password”
d. BASEDIR=”C:\oracle\”
e. SECURITY_UPDATES_VIA_MYORACLESUPPORT=false
f. DECLINE_SECURITY_UPDATES=true
g. ORACLE_AGENT_HOME_LOCATION=”C:\oracle\”
h. FROM_LOCATION=”C:\product_dir\win32\agent\stage\products.xml”
4. Run the installer with the above response file:
a. cd \product_dir\win32\agent
b. setup.exe –silent –responseFile C:\dir\agent_install.rsp
5. Verify the agent will start up on reboot.
a. Go to Control Panel –> Administrative Tools –> Services
b. Verify the Oracleagent11g1Agent is set to auto start and that it is currently running.

6. Verify the agent is currently running. In a DOS window:
a. cd \oracle\agent11g\bin
b. emctl status agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved.
—————————————————————
Agent Version : 11.1.0.1.0
OMS Version : 11.1.0.1.0
Protocol Version : 11.1.0.0.0
Agent Home : C:\oracle\agent11g\agent11g
Agent binaries : C:\oracle\agent11g\agent11g
Agent Process ID : 3708
Agent URL : https:……..
Repository URL : https:……..
Started at : 2011-07-14 13:19:18
Started by user : SYSTEM
Last Reload : 2011-07-14 13:19:18
Last successful upload : 2011-07-14 13:32:31
Total Megabytes of XML files uploaded so far : 16.18
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0.00
Available disk space on upload filesystem : 53.42%
Last successful heartbeat to OMS : 2011-07-14 13:32:55
—————————————————————
Agent is Running and Ready
7. Complete agent config:
a. On the database server, sign on to the database and verify the DBSNMP account is unlocked:
i. SELECT account_status FROM dba_users WHERE username=’DBSNMP’;
ii. Unlock the account if necessary
1. ALTER USER dbsnmp ACCOUNT UNLOCK;
iii. Change the password if necessary
1. ALTER USER dbsnmp IDENTIFIED BY new_password;
b. Sign on to Grid Control
i. Click on the Hosts tab.
1. Verify the database server is listed and is up and running.
ii. Click on the Databases tab.
iii. Find the newly added database.
1. Select the radio button next to this database and press Configure
2. Enter the DBSNMP password in the appropriate box and press Next
3. Press Submit
iv. Verify the db shows as ‘up’ in the Databases tab.

Apr 23

Patch Policy

About a year and a half ago, I moved to a new company and started working as their DBA. The company did not previously apply any patches to any Oracle databases. Since I have been here, I have seen IT system security become more of a focus point and undergo a higher level of scrutiny that previously seen. Rather than wait for a directive to start implementing regular security patches for our Oracle databases, I decided to be proactive. The day will come when we are required to start patching our Oracle databases on a regular basis and I would like to say that we already have it implemented.

The Apr2012 CPU was just released last week and this is the first CPU that we will apply to our Oracle databases. Before I applied the first CPU, a little thought went into how to implement this change in our corporate environment. I decided to share a few of those changes in case anyone else gets to a similar situation in the future.

1. The 3 D’s: Before any patching began, a patch policy was Documented, Disseminated to IT staff and management, and Discussed. This document discussed at a high level, the need for regular security patches, when the security patches would come out, and how they would be applied to our systems to as to reduce risk to the database, the application, and the end users.
2. Patch Timeline – I have the luxury of having a clone of our production database just for the DBA’s use and no one else. My timeline starts there. Within one week of the CPU’s release, I am to apply the CPU to my DBA database and resolve any issues. With two weeks of the CPU release, I am to apply the patch to our development databases. Within one month of the CPU release, I am to apply the patch to Test and Stage database. And finally, within 6 weeks of the CPU release, I am to have applied the patch to production. This is just my timeline and what works in our environment. Your timeline may be different. But it is important that everyone understands the timeline and that the timeline does two seemingly contradictory things – 1) applies the patch slowly so that any database or application issues are sorted out before proceeding to the next step in the timeline. Once the patch hits production, there should be no surprises and confidence the patch will not break anything. 2) applies the patch fast enough so that security holes are plugged in a reasonable time. In my environment, six weeks to production is slow enough to catch issues but about as fast as we feel comfortable in going. Your environment may have other timelines.
3. Log It – I feel strongly that patches should be documented in some sort of change log. With the log, you should be able to go back and see exactly when each patch was applied to each database. This can go a long way in diagnosing if a patch was responsible for an issue. If I get a ticket that a procedure receives errors and the problem was first noted on May 1st, then I can look at the change log. If I applied the patch on April 30th, then the patch may have introduced the problem. But if I applied the patch on May 2nd and the problem existed a day earlier, then the patch is not the cause of the problem. Some organizations already have a Change Control mechanism in place and the Oracle patch log should fit within that structure.
4. Test/Test/Test – As a DBA, we have a duty to ensure that changes introduced into production have a high degree of confidence that the application will not break. It is vitally important to test your changes and patches are no different. If you do not go through your patch timeline, you will not have adequate time to test and if there is a problem that the patch has introduced to your environment, it would be career suicide if you did not adequately test before hitting production.
5. Backups – One must backup the database and the Oracle home directory being patched before applying the patch. You never know if you will have to go back to a previous point before the patch in one or both of those areas. One should occasionally test the restore methodology or backing out patches well before production. This testing may not necessarily need to be done once a quarter, but should be at least once a year.

I think those about cover the major thoughts I had on the subject. If you have questions/comments, let me know.

Apr 02

One Exabyte Per Day

I ran across this article which discusses the Square Kilometer Array (SKA). As a DBA, the thing that struck me was that this system will generate 1 exabyte of data per day that will have to be stored somewhere. That is challenge #1, getting the data into a data store of some type in a fast, efficient manner. Challenge #2 will be getting the data out of that data store in a fast, efficient manner so as to facilitate number crunching to provide meaningful analysis, ie glean knowledge from the data.

In my previous job, I often worked with large amounts of satellite data, on the order of 1 or 2 terabytes per day which would need to be ingested into an Oracle database. But the SKA project takes this to a whole new level!

Mar 29

OEL6 and Oracle 11g

I’ve never quite understood why Oracle released their own OS, Oracle Enterprise Linux 6 and then waited over a year for their flagship product, Oracle RDBMS 11g, to be certified on that OS platform. But the wait is over. You can now safely run Oracle 11g on Oracle Linux 6.

But wait…no so fast…While you can run the database software on this OS platform, good luck managing that database if you use Enterprise Manager Grid Control 11g or Cloud Control 12c. It seems that the Oracle Agent is not yet certified on Oracle Linux 6 and no word on when it will be certified. Many Oracle DBAs manage multiple Oracle databases and centralized management of those systems with Enterprise Manager is a must. If you fit into this category, then you will still have to wait before you can run Oracle RDBMS on Oracle Linux 6.

This reminds me of my conundrum with upgrading Enterprise Manager. I currently use Grid Control 11g and I would like to upgrade to Cloud Control 12c. But alas, I cannot if I want to be able to manage all of my database systems. You see, through the acquisition of another company, I inherited Oracle on Windows for some of the database systems that I have to manage. Oracle will not release Agent12c until Q4FY2012 for both 32-bit and 64-bit Windows platforms. Q4FY2012 for Oracle Corp is between March and May 2012 so it should come out before this summer. But since you must run Agent12c to be able to manage the database with Cloud Control 12c, I guess I will wait for a while longer before I upgrade my Enterprise Manager environments.

Older posts «