Aug 27

How to Backup RAC VM’s

Many of us now have Oracle RAC installed in virtual environments on our laptops and workstations. This blog post shows how I did just that on my MacBook Pro. Recently, I was re-installing a new RAC testbed on my laptop when I ran into a few issues that took me awhile to sort out. After I had everything up and running, I figured the proper course of action would be to take a good backup of my work. The video below shows how to backup RAC testbeds that use Virtual Box.

 

https://youtu.be/TK8kosRfup0

Aug 26

Can’t Compile GI 12.1.0.2 and Segmentation Fault

So I’ve been having a tough time of it lately on things I should know better. We’ve all been there before.

I’m trying to recreate a 2-node RAC testbed on my laptop. This will be Oracle 12.1.0.2 on Oracle Linux 6.5 with VirtualBox 5.0. Should be pretty simple, right? I even wrote instructions on how I did this back in April of this year. I’m following the same exact steps in my document on the same exact laptop but yet I still have problems. The only two things that have changed is I am going directly to 12.1.0.2 (my doc was 12.1.0.1 I believe) and I now have VB 5.0 on my laptop.

I have my 2 virtual nodes created and ready to go. I fire up the OUI to start my installation of Grid Infrastructure. When the OUI gets to the linking phase, I get an error message.

make_failure

 

 

 

 

 

 

Error in invoking target ‘irman ioracle’ of makefile. 

We’ve all been at a crossroads in life. You have a choice to make. Do I go left or do I go right? Unfortunately for me, I turned the wrong direction and wasted a few weeks of my spare time. At this point, I had a decision to make. Do I do exactly as the error said or do I rely on my experience? I blindly ignored the error and relied on my experience. Silly me.

Just two months ago I had the exact same error when I couldn’t get GI 12.1.0.2 to compile on a testbed. That testbed was running on VMWare ESX hosts. I filed a SR with Oracle Support and they let me know that my compile issue was because I did not have enough RAM devoted to each virtual machine. Being a virtual environment, this was easy enough to fix. I had my SysAdmin bump up the RAM and swap space on the virtual machines and GI 12.1.0.2 compiled exactly as promised. So I naturally assumed that I was running into the same issue here. On my laptop, I kept bumping up RAM. I expanded swap space. I even went so far as to rebuild the nodes from scratch. I spent the last two weeks down the road of experience and I found it to be bumpy, scratchy, dusty and very unpleasant.

The road I should have taken was to do explicitly as the pop-up box said…read the log file for more details. When I finally got past my stubbornness in thinking I knew the answer, I read the log file. I found the following messages towards the end.

 

INFO: - Linking Oracle
INFO: rm -f /u01/app/crs12.1.0.2/rdbms/lib/oracle
INFO: /usr/bin/ld: cannot find -ljavavm12
collect2: ld returned 1 exit status
INFO: make: *** [/u01/app/crs12.1.0.2/rdbms/lib/oracle] Error 1

 

Well its completely obvious now! A library file is missing. A quick check on the Internet easily led me to the solution. I’m not alone in this issue, but for some reason the OUI is not copying libjavavm12.a to $GRID_HOME/lib as it should be. With that pop-up box still sitting there, I issue the following on the node.

[oracle@host01 ~]$ export GRID_HOME=/u01/app/crs12.1.0.2
[oracle@host01 ~]$ cp $GRID_HOME/javavm/jdk/jdk7/lib/libjavavm12.a $GRID_HOME/lib

 

I then went back to the pop-up box and pressed the RETRY button. Sure enough, the damn thing compiled and the OUI finished up its work.

UPDATE: I had the same issue when installing the RDBMS 12.1.0.2 software on the cluster. I used the same exact fix to get oracle to compile correctly for the database software as well.

But I was not done. As anyone who has installed Grid Infrastructure knows, they need to run the $GRID_HOME/root.sh script on all nodes later on in the process. When I attempted this on the first node, I received a “segmentation fault” error. Turns out, there is a problem (and again, I’m not alone here) with perl in GI 12.1.0.2 installations. Even the following will receive a segmentation fault:

cd $GRID_HOME/perl/bin
./perl -v

The solution is to re-install Perl in $GRID_HOME. I found a blog entry by Laurent Leturgez which describes exactly how this is done. I never would have figured this out on my own. After re-installing Perl on all my nodes, the root.sh script ran just fine.

UPDATE: I had the same issue when installing the RDBMS 12.1.0.2 software on the cluster. I used the same exact fix to get perl to run without a segmentation fault.

Like all of us, I rely on my experience to save me lots of time. Things I learned a month or a year ago are applied today. Sometimes, experience gets in our way and takes us down a road we wished remained untravelled.

Aug 18

Video: Oracle 12c IDENTITY Column Performance on RAC

I’m trying to branch out a bit and craft a few videos. This one is my first attempt and I discuss the new 12c IDENTITY column performance, specifically for Oracle RAC databases.

Link to video on YouTube: https://youtu.be/aW_K_rEFkrI

Note: I don’t focus my energies on the planning side much. So these videos are not rehearsed. I will prepare a few scripts so that I don’t have to type as much, but otherwise, these videos will just be me being me.

Aug 13

ORA-27038 on 12.1.0.2 RAC Upgrade

I am upgrading a 2-node Oracle RAC testbed from Oracle 11.2.0.4 to Oracle 12.1.0.2. This is a simple testbed and the database is virtually empty. I’m using the DBUA to perform the upgrade. Should be simple, right?

After the database is upgraded, the DBUA is trying to move the RAC configuration to the new home. Early on, I get a “file exists error”. I check the $ORACLE_BASE/cfgtoollogs/dbua/upgrade1 directory and in the PostUpgrade.log file, I get this curious error:

[oracle@nau-rac01 upgrade4]$ cat PostUpgrade.log

create spfile=’/u01/app/oracle/product/11.2.0.4/dbs/spfileresp1.ora’ FROM pfile=’/u01/app/oracle/product/12.1.0.2/dbs/initresp.ora’

*

ERROR at line 1:

ORA-27038: created file already exists

Linux-x86_64 Error: 17: File exists

 

I’m not sure why 12.1.0.2 is trying to create a SPFILE in the old home for the upgrade. But for me, this is a simple fix. Remove file from this directory and re-run the upgrade. This isn’t an issue because this is an Oracle RAC database and the true SPFILE location is on shared disk, not in this unshared home, and is denoted in the OCR. Removing the file the DBUA was trying to create resolved my problem.

Jul 29

Memory Pressure Analysis Risk State

I have a Test database that is a 2-node RAC system. I am working towards the goal of getting the production database to Oracle 12.1.0.2 in about a month’s timeframe. This of course means that I have to get Grid Infrastructure upgraded prior to the db upgrade. I have upgraded GI on my standby cluster and on my Test database as well. The primary GI upgrade is scheduled for this evening.

Ever since I upgraded GI in Test a few weeks ago, I’ve been getting alerts from EM12c similar to the following:

Host=host01
Target type=Cluster
Target name=test-scan
Categories=Business
Message=Server is under elevated memory pressure and services on all instances on this server will be stopped
Severity=Warning
Event reported time=Jul 29, 2015 1:05:13 PM CDT
Operating System=Linux
Platform=x86_64
Event Type=Metric Alert
Event name=wlm_event:wlm_qosm_mpa_risk_state
Metric Group=QoS Events
Metric=Memory Pressure Analysis Risk State
Metric value=RED

Some of the alert details were removed for brevity.

So where is this coming from? Why does it mean to me?

This error is coming from Oracle’s Quality of Service (QoS) in Grid Infrastructure. It relies on Cluster Health Monitor (CHM) information. More specifically, this alert is coming from Memory Guard. For some information on Memory Guard, see this PDF, specifically the end of the second page.

Memory Guard is trying to save me from myself, and as we will see, it is doing a poor job of it. The idea is that when the server has memory pressure, Memory Guard will take any services on that node out-of-service. Allowing more connections would consume even more memory and could make the situation worse. New connection requests must go to another node in the cluster running that service. This is exactly what the Message value in the alert is telling me.

According to this EM 12c document, section 4.3.2, Memory Pressure Analysis Risk State, the alert text is supposed to contain the server name. Yet the message text above does not tell me which server is having the problem. Luckily for me, it’s only a 2-node RAC cluster, so I don’t have too many to examine.

When I do look at the CPU utilization, everything is fine. Swap usage is practically zero on both nodes. Free memory is more than 25% on both nodes. Curious…why the alert in the first place?

Everytime I get this alert, I can another email that says the condition is cleared up within a few minutes. So the issue is short lived. Yet the alerts keep coming.

It turns out, after some investigation, that Oracle made a change to Memory Guard in Grid Infrastructure 12.1.0.2. In earlier versions, Memory Guard only looked after policy-managed databases. In GI 12.1.0.2, Memory Guard started looking after admin-managed databases as well. And my RAC databases are typically admin-managed, which is one reason why I’m seeing this now.

To further add to the issue, apparently, GI 12.1.0.2 has known Bug 1582630 where the amount of free memory if calculated incorrectly. Note 1929994.1 lists a workaround and there is a patch as well. I applied the workaround and it resolved my problem. I’ll get the patch applied to Test before I proceed to production in the not-too-distant future.

Thankfully, I discovered this before my production GI upgrade later tonight. Otherwise I would have had upset end users that may have experienced issues connecting to the database. This is just one more example of why I have a good test platform with which to discover and resolve the issues before the change is made in production.

Jul 27

Remember RAC Instances in Perf Tools

I was using Lighty to examine performance of a new Oracle RAC database that went into service today. You can see something similar in Oracle Enterprise Manager 12c Cloud Control if you choose. Anyway, here is the performance graph of the RAC database in question.

Lighty_SatIO_01

 

 

 

 

It is common wisdom that if the chart is below the green line representing the CPU Limit, that we don’t have major contention. But this is an Oracle RAC database. The chart above is for all instances (this is a 2-node RAC db) as I selected both instances in Lighty.

Lighty_SatIO_02

 

 

 

But if I switch to one instance, I see a different story.

Lighty_SatIO_03

 

 

 

 

Notice that during the same time period, I have user I/O saturation.

The moral of the story is that when looking at Oracle RAC databases with performance tuning tools, you sometimes have to look at and individual instance. In my example, looking at the overall database (all instances), things looked fine but when I zoomed into a specific instance, the same performance tool painted a different picture.

Jul 27

The Data Guardian

This morning, I was reading this article about employees lack of understanding how critical it is to protect business data. The article got me to think about a few different points.

As a DBA, I’ve always tried to be my company’s Data Guardian. To me, the data inside the databases I manage is a valuable corporate asset. After all, the database exists for one reason, to provide efficient and timely access to the data. That data has value otherwise the company wouldn’t pay all the money it does to store it, to manage it, to make it available to end users. So in that vain, I’ve always strived to be the one that says we can’t just let people have unfettered access to the data. We must have proper security controls in place in our database. We must be vigilant in our security mechanism by making sure the database is patched regularly, is on current database versions, that we implement the concept of “least privilege”, etc.

Because being a DBA is my business and I’ve always been the Data Guardian, I’ve been involved in numerous conversations with people inside my company about protecting that valuable corporate asset, the data. Admittedly, most of these conversations are with people already in the company’s IT department. It is rare for me to have these types of conversations with people outside of IT. Yet as this article points out, 43% of respondants to a poll “admitted that they were either somewhat or completely ignorant about the value of corporate data.” I probably need to get out of the IT department once in awhile and have these same conversations with non-IT people.

Then the article discussed something that I hadn’t really considered before, at least not consciously, even is my role as the company’s data guardian. The article talks about how employees “were more worried about losing their personal data, than leaking out business data which could be far more damaging.”

Like many people in the 21st century, I too have been victim of my private, personal data being breached, data that was held by some other party. I was one of the individuals that was victim to the Target data breach in 2013. My bank quickly issued a new card since my card swiped at Target was at risk. Target even offered the victims some form of identity protection. More recently, I know that my private data has been breached in the US OPM systems hack. Me and 4.2 million other individuals. Since this data breach is more recent and does apply to me directly, I’ve been giving thought about data breaches in general and how they affect me.

But as this article points out, have you given thought to how much a data breach at our company could affect you personally? I certainly give thought to how much a data breach affects myself and my family when Target and OPM were hacked. But how does a data breach affect me and my family if the databases I’m in charge of at my job get hacked? The databases in my control contain very little of my Personally Identifiable Information (PII). But they do contain PII of our customers and clients, which I am very well aware of. If their PII gets out, how does it affect me?

Well certainly, if I am found at fault, my company could fire me. I could lose my job. I think that’s the most obvious affect of the data breach. But what else? I know of a data breach at a company that I used to work for in the past. That breach occurred well over a decade after I left that company so it had nothing to do with me. But I do personally know some of the individuals who were fired over the incident, people that I did work with and built relationships with. The company was sued over the data breach.  And some of my ex-coworkers were named in the lawsuit. Even if it is ultimately decided that these individuals are not at fault, they still have the nightmare of needing to defend themselves and their reputation. What other ways are there for me to be personally affected by my company’s data breaches?

Like many employees, I feel that I have a vested interest in how well the company performs. Many companies try to instill this mindset into their employees by offering stock options and bonuses tied to company performance. Can one metric of company performance can be the number of data breaches exposing valuable corporate data to hackers? If nothing else, our customers may lose confidence in our ability to keep their PII safe. Lost confidence means that our customers may quickly become some other company’s new customers.

So even if I’m only thinking about my personal, self-serving viewpoint, it is incumbent upon me to be more vigilant as the corporate Data Guardian. The more the company succeeds, the better it will be for me personally in the long run. After reading this article, I’m not sure that others in the company have the same values. Maybe I need to do more to spread the word. Most likely, I need to do more as the Data Guardian than sit in my office only talking to other IT staff about data security.

 

 

Jul 23

Lighty In A Nutshell

As you may know from a previous post on this blog, or if you follow me on Twitter (BPeaslandDBA), I like Lighty for Oracle, a product by Orachrome. I love this product and how easy it is to obtain performance diagnostic information. I’m very familiar with Oracle’s Enterprise Manager and I use EM12c all the time when I get calls about poor database performance. But I am now hoping that Lighty becomes another tool in my arsenal.

I don’t expect that Lighty will replace Enterprise Manager. EM12c does so much more than Lighty can, i.e. monitoring and alerting, and various database administration tasks. So I’ll be keeping EM12c thank you very much. But I will be relying on Lightly more for the one part of my job that Lighty does really well, provide information to aid in performance tuning efforts. Lighty is only for performance tuning and because it has a singular focus, it does the job very well.

I’ll try to show some highlights of why I think Lighty is a great tool to use. I’ll contrast with EM12c. For starters, in EM12c, I go to Performance –> Performance Home and I see a screen similar to the following.

Lighty01

Right away I can see that I have CPU resource contention. The main screen in Lighty looks like the following.

Lighty02

So we have very similar information. And that’s where the differences end for me…right at the start. In EM12c, I am forced to make a decision, do I drill down into the CPU utilization, or one of the wait classes like User I/O? Since the above shows CPU contention, in EM12c, I click on the CPU portion of the chart to drill down. At this point in EM12c, I am only considering CPU usage for my analysis. As we’ll see later, Lighty doesn’t require me to make that choice if I don’t want to. Below is my chart from EM12c.

Lighty03

The chart above is very common for me in my Oracle RAC database. The chart above is showing the CPU utilization for three instances of my clustered database. Which one is dominating the CPU utilization? Well they all look fairly equal. Yet EM12c forces me to choose one instance for analysis. I can’t examine all 3 instances in one screen. A lot of my work is with Oracle RAC performance tuning and EM12c’s limitations bother me here. I have to choose a wait class or CPU and then choose an instance before I can obtain information on the top SQL statements or top sessions. If I drill down far enough in EM12c, I can get a screen like the following.

Lighty04

I’ve finally gotten somewhere at long last! I can see the top SQL statements and the top sessions. Notice the highlighted part. This is detail for a 5 minute window. The five minute window is represented by the shaded box in the CPU utilization chart below for the instance.

Lighty05

In EM12c, I can drag the shaded box to a different time and the top SQL and top sessions information will change to match. But I can’t change this to a 10-minute interval or some other time period. I’m stuck at 5 minutes.

Now let’s contrast this to Lighty. We’ve already seen the performance chart just like EM12 has. Below that performance chart in Lighty is a detail pane on the top SQL statements. I didn’t have to click on anything to get the top SQL statements.

Lighty06

Right away, I see one difference between Lighty and EM12c. My top SQL statements are not solely restricted to CPU or a specific wait classIf I do want to restrict the information, I can choose from a menu box above the performance graph.

Lighty07

I typically have All selected, but I can choose to see just CPU if I choose. Lighty doesn’t limit me like EM12c does, unless I make the choice to subset the information.

Notice in the top SQL that I am presented a timeline of the events for that SQL. For example, we can see the top SQL statement in the list is using lots of CPU. Other SQL statements in the list start and stop their CPU usage, shown in light green. User I/O is simultaneously shown here in blue. If I expand a SQL statement, I can get a breakdown of that SQL statement.

Lighty08

Just by click on the Plus sign next to the SQL statement, I can see that this statement has two different execution plans and one of them is consuming 93.92% of the total CPU utilization. If I expand that plan, I can see where it is spending its time, broken down by wait event.

So let’s recap this. I started Lighty and was immediately presented with my top SQL statements and in a single click, I know that the top one has a CPU-consuming execution plan. That’s super quick and easy in my opinion. EM12c makes it harder to get to this information and EM12c will not show me when a SQL statement started consuming CPU and when it stopped like the graphs Lighty provides.

If I click on a SQL statement, much like I would in EM12c, Lighty will show me stats on the specific SQL. If I click on the Usage tab, I can see all of the sessions that ran this statement. Notice that the chart shows a timeline of when that session was consuming that resource for this SQL statement. EM12c won’t show me that level of detail.

Lighty09

With the above, I can easily see that it is multiple sessions executing the same statement at different times.

Remember that EM12c made me choose an instance for this Oracle RAC database? In Lighty, I don’t have to make that choice. But I can quite easily. Lighty is RAC-aware and detected the multiple instances. By default, I have all of the instances chosen. A simple pulldown menu lets me select one of them and the information on the screen automatically adjusts to just that instance.

Lighty10

Remember in EM12c where that shaded window represented a 5-minute period of time. In Lighty, the SQL and session details are for the entire graph. If you want a 5 minute window, you can choose the 5 minute option above the graph. By default, it has a 60 minute window chosen.

Lighty11

And I can choose other options as well, including a Custom date/time range. In EM12c, the graph is for 1 hour and I can’t change it.

EM12c has its ASH Analytics for letting me look at performance in the past. But I’ve always had problems with it. With ASH Analytics (which lets me view the Active Session History), I can change that gray window to be something other than 5 minutes. But for me, EM12c simply never returns the data. Maybe I have something wrong with my EM12c environment. But Lightly works out of the box with historical ASH data. Let’s say I got a call about a performance issue that occurred between 8:00 and 9:00 this morning. I simply choose a Custom time range.

Lighty12

In EM12c, I’m forced to go to a different web page, Performance –> ASH Analytics.

 

So far, you’ve probably latched onto the idea that I think that Lighty is a quick and easy way to obtain much of the same information I can get from EM12c. And you’d be right! Here is another area that I like about Lighty. If I go to Window –>Multiple Database Monitoring then I can see graphs of many different performance metrics for multiple databases. Here is an example.Lighty13At a glance, I can monitor multiple databases on one screen. EM12c won’t let me do that. I can see that the database on the far right is pretty idle while the database in the middle is getting hit pretty hard. Scrolling down the page, I get charts for many metrics:

  • Parses
  • User Calls
  • Transactions
  • Reads and Writes
  • Disk latency
  • Disk I/O throughput
  • Global Cache Transfer rates
  • Redo generation rates
  • Network Traffic

In EM12c, I would have to visit too many pages to obtain the same information that Lighty gives me in one page.

All of my examples are relying on the same Diagnostics and Tuning Packs that I have licensed. If you do not license those, Lighty provides similar functionality with Statspack and their version of L-ASH (Lighty ASH).

All in all, I think Lighty is a great tool for the DBA who is responsible for performance tuning. The cost is very reasonable as well. The more I use the tool, the more I come to appreciate how quickly I can get information. I can turn things on or off with no more than two clicks to be able to easily drill down to the root cause of my performance issues.I encourage all Oracle DBA’s to download the trial and see if this product is right for them. Lighty can be found here: http://www.orachrome.com/en/index.html

Jul 16

Lighty for Oracle

Not that long ago, I was pointed in the direction of Lighty for Oracle by orachrome. This is a wonderful performance tuning tool. I’m still using the 30 day trial and I hope to get my company to purchase it soon. As soon as I get time, I’ll try to post something which shows why I think this is a very cool tool!

Jul 16

Web Site Down and Twitter Feed

My web site was down for the last 4 days. My apologies. I had to spend a lot of time going back and forth between my ISP and my domain registrar. But it seems to be sorted out now.

So the good news, depending on your perspective, is that once my site came back up, I felt the need to spend time with it. I decided to do something that I’ve been meaning to get accomplished for quite some time. I added a widget to the right which shows my last 5 tweets. That widget probably only gets a smile out of me and not anyone who reads the blog. :)

Older posts «