Nov 28

Oracle Ace Changes

The Oracle Ace Program is changing the way members have to report their activities. Oracle Aces (which I’m using to generically refer to Aces Associates, Aces, and Ace Directors) have long known that they need to keep contributing to the Oracle community at large. Aces regularly have to let the program know of their outreach activities. How we notify the program is changing and with change comes the inevitable complaints that always proceeds any change. I waited until I reported my activities and read a few comments from other Aces before writing this blog post. These opinions are my own and not all Aces may agree with me.

When I first started hearing some of the grumblings, I was a bit concerned. There are some well known Aces that have decided they have had enough with the program. Its never a positive to see good people leave. They don’t like the new method of reporting activities, or they don’t like that they have to produce more content than they are currently able to do in order to generate enough points (more on that in a bit). For those who do not report their activities or do not contribute enough, they will lose the current Ace standing. Some of these Aces say they will still contribute to the community just like they always have, but they do not see a future in the Ace program for themselves. To me, it is a bit bothersome when an individual you know of and personifies “Oracle Ace” no longer wants to be part of the program simply because the reporting requirements have become too much work, and for no other reason.

That being said, the new reporting mechanism has made things much more transparent. In the past, there were loose guidelines as to what was needed to become and stay and Ace. Now, our contributions are given a point value and you need to have a certain number of points to maintain your standing. Write a blog post, and its X points. Give a presentation at a conference, and its Y points. Add them all up and make sure your over the threshold for your Ace level. Transparency is normally a good thing and the new changes have provided just that. My feeling is that this level of transparency is a good thing and will only help the program going forward.

The program has maintained since they announced these changes that the first go-round will be used to potentially adjust the points for certain activities. It looks to me like points do need to be adjusted in a number of categories. For example, to maintain Ace status (Ace and Ace Director), one would need a minimum of 10 presentations at conferences. Getting an abstract accepted is challenging with the level of competition for the few speaking slots available. Even if you could present two papers at a conference,  you’d need to speak at five different conferences over the course of a year. That’s a lot of travelling and a lot of work to present 10 different papers. I’d like to see the points in this category increased. Another one that could use a bump is book authorship. If you wrote one book, you would not maintain your Ace status. Writing a book requires a VERY LARGE commitment! It can take up to a year writing that book and you may not be able to do much else.

One does not have to produce content on one media type. You can do a combination like write blog posts, author a book or chapters in a book, be a technical editor of a book, create a podcast, write articles, present at a conference, and participate in the Oracle Communities. And there are non-content activities as well such as volunteering for a user group, organizing a meetup event, tweeting or being a cloud trial participant.

I think that overall, these changes are good and I do welcome them. But as with any change, there is going to be bumps in the road.

Oct 19

Slash or No Slash?

That is the question.


A recent post on the OTN forums asked about using semicolons and slashes as statement terminators. I dusted off an article I wrote for our development team over 4 years ago on this topic. This article received good reviews and is available on the OTN forums if desired. I thought I would also post it on my blog. Here is the article:


Slash or No Slash

by Brian Peasland

At our company, SQL scripts that are deployed are executed in Oracle’s SQL*Plus command-line utility, whereas many developers use a GUI tool like PL/SQL Developer or SQL Developer. The forward slash means something to SQL*Plus that is not needed in PL/SQL Developer or SQL Developer. As such, it can be confusing to know if you need to include a forward slash in your SQL scripts or not. Hopefully, this section will shed some light on what the forward slash does, when to use it, and when to not use it. Semi-colon Terminator For most SQL statements, the semi-colon is the statement terminator. For example, consider this simple SQL statement run in SQL*Plus:

SQL> select sysdate from dual;




When SQL*Plus sees the semi-colon, it knows the end of the SQL statement has been reached and it can now execute the command.

SQL*Plus Buffer

You may not be aware that SQL*Plus has a buffer for its commands. If I press the ‘l’ key for ‘list’, then I can see the command currently in my session’s buffer.

SQL> l

1* select sysdate from dual

Not surprisingly, there is the command I just executed. I then executed another SQL statement and here is how my buffer now looks:

SQL> l

1 select sysdate,user

2* from dual

As you can see, I now have two lines in my session’s SQL*Plus buffer.

Slash = Execute Buffer

The first rule to understand about the forward slash is that to SQL*Plus, the forward slash means to execute the contents of the buffer. To illustrate this concept, I will execute a SQL statement, wait a few seconds, and then just execute that same SQL statement again but just executing the buffer.

SQL> select to_char(sysdate,’MM/DD/YYYY HH24:MI:SS’) from dual;



06/18/2012 15:20:40

SQL> /


06/18/2012 15:21:17

SQL> /



06/18/2012 15:21:50

You can see that all I did the second and third time was to just type ‘/’ and hit enter and SQL*Plus executed the contents of its command buffer each time.

PL/SQL Blocks

The semi-colon statement terminator worked just fine by itself until Oracle introduced PL/SQL in Oracle version 7. The problem is that PL/SQL blocks can have multiple semi-colons to terminate the individual statements that make up that block. Consider this very simple PL/SQL block that does nothing:

SQL> begin

2 null;

3 null;

4 end;


Lines 2 and 3 contain perfectly valid statements that are each terminated with the semi-colon. And in line 4, we have the END keyword signifying the end of the PL/SQL block. If we were not allowed nested BEGIN/END pairs, then every time SQL*Plus sees “END;” it would know the end of the PL/SQL block has been reached, but we are allowed nested BEGIN/END pairs so the following is perfectly legal and valid:

SQL> begin

2 begin

3 null;

4 end;

5 null;

6 end;


You can tell from the above that just looking for “END;” is not enough because SQL*Plus would have tried to run the block after line 4. So how did Oracle decide to signify the that the PL/SQL block was ready to execute? The answer is by using the forward slash as you may already be aware. The second rule to understand is that all the forward slash is doing when you use it to end a PL/SQL block is to tell SQL*Plus to run what is in the buffer! This has not changed since before PL/SQL was created for Oracle 7. Consider the following sample:

SQL> begin

2 null;

3 end;

4 /

PL/SQL procedure successfully completed.

SQL> l

1 begin
2 null;

3* end;

On line 4, I typed the forward slash to execute the PL/SQL block. You can see that my block successfully completed. If we go back and look at the contents of my command buffer, you can see it contains everything but the forward slash. The forward slash is not part of the command buffer. So now, I’ll run a different PL/SQL block:

SQL> begin

2 dbms_output.put_line(‘Today is ‘||to_char(sysdate,’MM/DD/YYYY HH24:MI:SS’));

3 end;

4 /

Today is 06/18/2012 15:39:32

PL/SQL procedure successfully completed.

The forward slash said to SQL*Plus to run what is in its buffer, and the results are displayed. Now let’s type just the slash again and we should see our PL/SQL block get executed again.

SQL> /

Today is 06/18/2012 15:40:42

PL/SQL procedure successfully completed.

I did not have to type my PL/SQL block anew as it is currently in the command buffer.

PL/SQL and SQL Developer and PL/SQL Blocks

The biggest problem for most developers is that PL/SQL Developer and SQL Developer do not require you to use the forward slash. Why? Because you can hit Execute (F8) or Run Script (F5) to run your PL/SQL block. PL/SQL Developer knows that the moment you hit F8, you are intending to submit the PL/SQL block to be executed. In this case, F8 in PL/SQL Developer is doing the same job as the forward slash in SQL*Plus. Similarly, for F5 in SQL Developer.

The problem at my company is our team deploying code to production does not deploy code with PL/SQL Developer or SQL Developer. They use SQL*Plus because scripting multiple executions is easier with a command line tool. Many developers make the mistake of not including the forward slash for PL/SQL blocks in scripts because they do not need it, but if you want to deploy that code section in a SQL script, the forward slash is required at the end of each PL/SQL block.

When Not To Use Slash

So we’ve seen when and why we use the forward slash, but when is it bad to use it? The third rule to know is that it is bad to use the forward slash following a single SQL statement (not in a PL/SQL block), especially when that slash immediately follows a DML statement (INSERT, UPDATE, or DELETE). If my script contains the following:

select sysdate from dual;


Then I will get “double output” which is not what I normally intend to do in a script. I really only want one line returned, not two as the above script would do:

SQL> select sysdate from dual;



SQL> /




It is even worse when I use the forward slash following a DML statement because that statement will get executed twice. Consider the following script:

insert into test_tab values (10);


We now know that when I execute the two lines above in a script, SQL*Plus will execute it once due to the semi-colon statement terminator and then execute a second time because the forward slash tells SQL*Plus to run what is in the command buffer. When I execute the two-line script above, I get the following output:

SQL> insert into test_tab values (10);

1 row created.



insert into test_tab values (10) *

ERROR at line 1: ORA-00001: unique constraint (PEASLAND.SYS_C00767176) violated

Oops! The first insert worked (1 row created.) but when the forward slash was entered, SQL*Plus tried to insert the same data and I got caught on a unique constraint violation.


Hopefully, this page shows why the forward slash is needed, what is does, and when not to use it. To recap:

  • Include the forward slash at the end of each PL/SQL block
  • Do not include the forward slash after any SQL statements not in a PL/SQL block.
  • The forward slash after a single SQL statement will cause that SQL command to execute twice.

Sep 29

SQL Server Clustering from an Oracle RAC Perspective

Its no secret that I know Oracle’s database clustering solution pretty well. Recently, I completed a SQL Server clustering, high availability solution that took two years from initial design to final implementation. That process involved documenting requirements, determining the options, mapping requirements to implementation details, budgeting, procurement, installation, configuration, and testing.

Now that my project is complete, I thought I would give a few items about SQL Server’s clustering from the perspective of an Oracle RAC guy. We all know that SQL Server and Oracle are both RDBMS engines and they may have some things in common. But they are also completely different creatures too. So if you’re comfortable with Oracle’s Grid Infrastructure and RAC and Data Guard, and are looking at implementing a SQL Server HA solution, maybe this will provide some good information for you.

Our current production system is a 4-node Oracle RAC primary database. This provides high availability (and high performance) within our primary data center. We use Data Guard to transport redo to a 3-node RAC physical standby database. Even though SQL Server <> Oracle, I wanted to keep our configuration as similar as possible to ease administration. So we deployed a 2-node SQL Server Failover Cluster at our primary site and a 1-node “standby” database at our DR site.

Now on to my observations, in no particular order.

  • SQL Server’s HA clustering solution is Active/Passive. Oracle’s is Active/Active which to me is “better”, and yes…that’s a subjective term. For our Active/Passive implementation, I did not like the idea of two physical servers sitting there with one essentially idle all the time. So we have one physical server which is the ‘preferred’ node and one virtual server. If the physical server fails, clustering will automatically failover the SQL Server instance to the virtual server and we’re operational again. This Active/Passive cluster does nothign to address scalability like Oracle RAC does, but it does give me higher availability in our primary environment.
  • Implementing the clustering is super easy. Turn on clustering at the OS level. Because this is an entirely Microsoft stack, they built clustering into the OS. Its already there for you. You just need to turn it on. Then fire up Administrative Tools –> Failover Cluster Manager and wizards walk you through the setup. Its much easier than installing Grid Infrastructure. But Oracle does have to contend with different OS platforms which makes it harder there. It will be interesting to see how SQL Server 2016 on Linux handles Failover Clustering.
  • Oracle uses a Shared Disk model whereas SQL Server is Shared Nothing. But you do need to use “shared disk” in a way because the disk needs to be available on both nodes. However, MS Failover Clustering (MSFC) mounts the clustered disk on the active node. When SQL Server is moved to the other node, either automatically or manually, MSFC will unmount the disk on one node then mount it on the other. Its kinda strange to have a Windows Explorer window open and see the disk either appear or disappear during this transition.
  • Grid Infrastructure uses the Voting Disk for quorum operations. In MSFC, you can have a Quorum disk, use a file share, or configure with no quorum. If you go with the latter, you hamper your automatic failover capability.
  • I’m used to my primary having its own cluster and the standby its own cluster. With SQL Server, the primary nodes and the standby nodes need to be part of the same cluster. Thankfully, the cluster can cross subnets  which is different than Oracle GI. Adding the standby node was super easy, we just removed its voting rights and we did not configure the quorum disk for the standby node. This was fine with us as we want failover to the standby to be a manual operation.
  • For a standby database, you can use Database Mirroring, Log Shipping or AlwaysOn Availability Groups (AGs). The first two are on their way out so I went with the AGs. AGs require the standby node to be part of the same cluster as the primary. There’s a wizard to walk you through setting up the databases to participate in the AG. This is much easier than setting up an Oracle physical standby.
  • For those of you who hate the Oracle documentation, its time to be thankful. Many times during this process I found the MS documentation to be missing very big pieces. For example, I never did find out how to configure my standby node to have no voting rights. Luckily we were able to click our way through it.


When it was all said and done, getting the SQL Server solution implemented was not that tough. Sometimes I had to rely on my knowledge of clustering. Other times, Microsoft’s terminology got in the way. For example, the rest of the world calls it “split brain” but MS calls it “split cluster”. Sometimes getting over the lexicon differences was the biggest hurdle.

Sep 27

Agent Decomission in EM13c

I recently lost a clustered Oracle RAC system and had to blow away GRID_HOME and RDMBS_HOME. This meant my cluster-aware install of the Enterprise Manager 13c agents was lost as well. So I simply removed AGENT_HOME. I then removed the OraInventory and proceeded to start from scratch. Once I had Grid Infrastructure up and running and my Oracle RAC databases operational, I had to back to Enterprise Manager and get everything right again. EM13c was reporting the agents were unreachable, which I knew. I now need to get this production system back into EM13c. The first part is to remove the agent and all of its targets. That is pretty simple in EM13c. I just went to Targets–>All Targets and selected one of the agents. In the Agent’s home screen, I clicked on Agent –> Target Setup –> Agent Decomission.


It then prompted me to ensure the agent was down, which of course it is. I pressed Continue. The next screen shows me the targets monitored by that agents. I pressed Continue to confirm. Another confirmation asking me if I really want to do this and I pressed OK. After a few minutes, the agent and its targets are removed. Doing it this way is easier than removing the monitored targets first, then removing the agent.

Once all of the agents were removed from EM13c, I went to Targets –> All Targets to confirm I had no other targets missed by this process. I was then able to re-add the agents the usual way.

Sep 09

Archiver Hung due to COMPATIBLE ORA-16484

This morning I woke up to a few alerts from EM about my archiver being hung, similar to the following:

Target type=Database Instance 
Target name=orcl4 
Message=The archiver hung at time/line number: Fri Sep 09 06:07:22 2016/376. 

I used the DG Broker to stop and then restart log transport.

edit database orcl set state=transport-off;
edit database orcl set state=transport-on;

But the archiver would still get hung. So its off to the alert log to get more clues. I found this in the primary’s alert log:

TT00: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16484)
TT00: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Fri Sep 09 08:07:40 2016
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl4/trace/orcl4_tt00_16068.trc:
ORA-16484: compatibility setting is too low

The error message seems self explanatory. I have COMPATIBLE set too low. At this point, I remembered that I changed COMPATIBLE in the primary a month ago. I must have forgotten to also change this in the standby. A quick verfication proved my hypothesis. COMPATIBLE is set to in the primary but 11.2.0 in the standby. So there’s my problem. I changed COMPATIBLE in the standby, bounced it and then resumed log transport. Life was fine and everything was fixed.

If you remember right, I said that I changed COMPATIBLE in the primary a month ago. Why was this a problem today and not back then? To know that, you’d have to know the change history for this database. Last night, we released new code to production. Part of the code release was to include a new table that used Oracle 12c’s new IDENTITY column feature. This was the first 12c-only feature we deployed in our code base. The standby was attempting to create the table with the new feature but that operation could not complete because of the improper parameter setting. I’m still a little confused how this affected log transport. I would have expected only log apply to be broken, but this is how it manifested itself.


Sep 01

N+1 Redundancy and Server Consolidation

In a previous blog post, I talked about designing your RAC implementations with N+1 redundancy. This ensures the loss of a node does not overwhelm the remaining nodes should one node fail. Today, I’m going to discuss the impact of server consolidation, specifically CPU cores, and its potential N+1 impacts.

One of the main systems I work on today is a four node Oracle RAC cluster. Each nodes has 4 cores. I am licensed for all 16 cores (4 cores/node * 4 nodes = 16 total cores) in my environment. I designed this system with N+1 redundancy. If I lose a node, and lose 4 cores, I’m still good because my workload demands only require 12 cores to maintain normal, acceptable levels of performance.

When this system was initially designed, 4-core servers were commonplace. Today’s environment is different and it is getting harder to find 4-core servers. They still exist, but hardware vendors are pushing systems with greater core counts.

In a conversation with my SysAdmin recently, he wanted to order 6-core systems to replace our 3-year old servers. Well we can’t simply do that. My Oracle license is for 16 cores. I could deploy three 6-core systems, but then I’d have a total of 18 cores in the cluster and I’d need to be 2 more cores worth of Oracle licenses. If I deployed two 6-core systems, I’d have a total of 12 cores and have 4 cores worth of licenses sitting unused.

I also informed the SysAdmin about our N+1 design. The impact of moving to 6-core systems can have major impacts to the N+1 design. Remember that I said earlier that our workload demands require 12 cores to maintain normal operational levels. If we deploy 6-core machines, then two of them meet our requirements and another node, the “+1”, would be needed to ensure we can lose a node without major impacts to performance. We’d need to deploy three 6-core machines to keep our N+1 design goal intact. But this means we need to increase our license counts as I stated earlier.

At this point, my SysAdmin thought he had a good idea…we could buy two 8-core servers. That’s still 16 total cores and exactly what we are licensed for today. No increase in license fees. But if we lose a node, we go down to 8 total cores in operation which is less than I need. This brings up a very good point…

…Increasing the core count on your servers can have detrimental impacts on your N+1 redundancy.

Right now, there is no easy answer. We can still buy 4-core servers so that’s what we’ll replace the current ones with next year. But there will come a day when 4-core servers are impossible to find. We’ll need to have a plan in place by that time, keeping in mind our N+1 design goals.

If only I could hard-partition our Linux servers and let cores sit idle and still be in compliance with our Oracle license agreements.

Aug 22

opatch prereq

I was recently applying the July PSU to one of my RAC databases and I kept getting an error from OPatch that the prereq checkSystemSpace failed. It should be no surprise to anyone out there that OPatch goes through a number of pre-requisite checks before applying the patch. OPatch is trying to ensure a good chance of success and not bomb out mid-way through the patch process. 

Now checkSystemSpace is just one of the checks OPatch performs. And from its title, I can easily discern that I am short on space, probably disk space. I know this is on the disk mount where ORACLE_HOME resides as that is the only disk space I am modifying when applying the patch. But what OPatch was not telling me on the screen is how much space was needed. The OPatch log file did not tell me either. I had about 8GB of free space on that drive and the PSU was nowhere near that size. I did clean up some old files but OPatch still failed on this check.

To find out how much disk space was needed, I manually ran OPatch for this specific prerequisite check as follows:

opatch prereq checkSystemSpace -ph .

I am running a prerequisite check. The second parameter to OPatch tells it which check to run. The -ph directive tells OPatch which patch home directory to use and in my case, it is the current directory (denoted by the period). The output was similar to the following trimmed for brevity:

    Space Needed : 8780.128MB
    Space Usable : 8347.293MB
    Required amount of space(8780.128MB) is not available.
    Prereq "checkSystemSpace" failed.

By running the prereq check manually, I was able to obtain the information I needed. OPatch is looking for about 8.7GB of free space and I have 8.3GB available and I’m about 430MB short. So now I have a good idea how much space I need to clean up before this check will pass correctly.

OPatch has a number of other checks with it. To find out all of them, issue the following:

opatch prereq -h

This will not only give the names of the checks, but a short description of each. For example, you might be notified that the check  CheckApplicable has failed. What does that mean? The one paragraph description is:

              Check for the presence of the required components in
              the ORACLE_HOME and check if all the actions of the 
              given patch(es) are applicable.

So this check is ensuring that the Oracle home directory has the components needed to succeed. For example, you can’t apply a patch to fix a partitioning problem if the Partitioning option is not installed. Use OPatch to see the rest of the rereq checks OPatch is doing. Most of them go on in the background without notice so some of these might surprise you.

Next time you have a prereq check fail, try to run it manually to see if you have more information than what was initially provided when you tried to apply the patch.

Jul 21

July 2016 PSU fails to make isqora

When applying the latest PSU, I recieved the following errors from my “opatch apply” session:


Patching component oracle.odbc.ic,
Make failed to invoke "/usr/bin/make -f isqora 
   ORACLE_HOME=/u01/app/oracle/product/"....'/usr/bin/ld: cannot find -lodbcinst
collect2: error: ld returned 1 exit status
make: *** [/u01/app/oracle/product/] Error 1

The following make actions have failed :

Re-link fails on target "isqora".
Composite patch 23054246 successfully applied.
OPatch Session completed with warnings.
Log file location: /u01/app/oracle/product/
OPatch completed with warnings.



The patch was applied successfully, but the relinking did not work correctly. To fix this, I did the following:

cp $ORACLE_HOME/lib/ $ORACLE_HOME/odbc/lib/.
relink all

That’s all there was to it.

I did the copy and relink steps because I was trying to fix the error from OPatch. A better way to handle this is to do the copy first, then run ‘opatch apply’ and you won’t get any errors at all.


I see that Bug 24332805 was posted for this issue, but I’m not privileged enough to see the contents of that bug.

Jul 13

Fog Computing

Unless you’ve been living under a rock, you know that everything is about the Cloud now. At least that’s the hype vendors and the media want us to believe. Earlier this week, I was clued into a new term. Just as people are working towards Cloud Computing comes the newer Fog Computing.

So what is Fog Computing? Basically, its cloud computing that has been pushed to the data sources. In Cloud Computing, your devices and applications push the data into the proverbial Cloud. The cloud is a centralized location. The problem Fog Computing aims to address is with all of this data being pushed into the cloud, that centralized cloud compute power may be inundated with data that it has to process. Latency may be too great. Fog computing pushes processing back to the data source. Data is processed first and then what is hopefully a smaller amount of data is sent up to the cloud. Fog computing is very important when talking about the Internet of Things. Instead of many, many devices sending all that data back to one spot to be processed, only the processed data is forwarded.

When I first read about this concept, my immediate thought was how the more things change, the more they stay the same in our cyclical nature of IT. I’m old enough that in my early days in the IT field, I worked on mainframe systems. I remember this fancy new computing model that was going to take the world by storm called “Client/Server”. Mainframes couldn’t keep up with the ever-increasing demands of the day’s applications. So instead of processing all of the data in a central location, we’ll move the processing to the client side. Scalability is easier because we handle a larger workload by deploying more clients. The Internet and web computing changed things a bit because the browser became a thin client to centralized web/application servers, but we still had scalability by deploying multiple web servers behind a load balancer to make them appear as a singular unit. Cloud computing was going to be the newest version of sliced bread. Just push your processing to the cloud and you’ll sleep much better at night. But now that some in the industry are seeing some of the pitfalls to this approach, we’re pushing processing back to the edges again.

One day, you’ll hear the media and vendors espousing the benefits of fog computing and how those very vendors are there to help you handle your transition to the new promised land. And then along will come another way to do business. It goes in cycles. Mainframe (centralized) to Client/Server (distributed) to Cloud (centralized) to Fog (distributed) to ??????

On another note, I had an editor for some database-related outlet ask me to write some articles about databases in the cloud. The editor said that “cloud is THE hot topic” as we all know. Well I can only write so much about the cloud because I don’t have a ton of experience with it. My company has our internal private cloud implementation, but to be honest, I have to take a lot of liberties with the term “cloud” to call it a private cloud. When I told the editor I wasn’t sure how much I could contribute, they told me they were not finding enough experts to talk about the cloud because  “most people I talk to haven’t moved to the cloud.”  This begs the question, is it just the media and vendors espousing the cloud?

Jun 07

Oracle RAC VIP and ARP Primer

I’ve been running across a number of questions about Virtual IP (VIP) addresses for Oracle RAC lately. I hope this blog post can help shed some light on what a VIP is, how they work, and why Oracle RAC leverages them. Before I go further, I should explain that I am not a Network specialist. In fact, computer networking is probably my weakest point of everything that goes on in IT shops. So do not flame me if I do not get the networking stuff entirely, 100% accurate.  I will explain this in terms that have served me well in my DBA career, especially working with Oracle RAC.

Most people are familiar with connecting any application, SQL*Plus or others, to a single-instance database server. In the end, their connection request is sent to a specific IP address. In our diagram below, the end user wants to connect to to access the database. The network request gets routed to the network switch that that database server is connected to. This switch passes the request to the server that has the requested IP address.



Most Oracle DBAs do not have a problem understanding this concept. Life does get a little more complicated when RAC is deployed as there are multiple machines (nodes) in the cluster.  In the next diagram, we have a two-node RAC cluster, each node having a different IP address.


The end user doesn’t care which node his session is connected to. The end user just wants access to the cluster. Either node will suffice. The end user’s TNSNAMES.ORA configuration may say to try first and if that doesn’t work, try instead. In this way, Oracle RAC is providing a High Availability solution.

Now we come to the entire reason for Virtual IP addresses to be used. What if the end user is trying to access the first node ( but it is unavailable? The node is down for some reason. The end user could easily connect to the node. However, due to the way TCP/IP networks work, it can take up to ten minutes for the network connection to to timeout before will be accessed. The lengthy TCP/IP timeout wait is the sole reason for Oracle RAC to leverage VIPs. We simply want to reduce the time to access another node in the cluster should our requested node be unavailable.

A traditional IP is usually bound to the network card on the server. The IT admin will configure the server to always use that specific IP address and no other devices on the network will use the same IP.  Note: I’m trying to make this simple here and avoiding DHCP and lease registration for those that are familiar with the topics.

A virtual IP address is not bound to the network card. It isn’t even defined in the OS. The VIP is not a real IP address similar to the way a Virtual Machine is not a real system. It just appears to be real to those using it.  So let’s look at our two node cluster, but this time with VIP’s defined for them.


Our servers still have their regular IP addresses, and for NODE1 and NODE2 respectively. These two nodes also have VIPs associated with them. NODE1-VIP and NODE2-VIP are denoted as IP addresses and respectively. Each node in the RAC cluster has its regular IP address and a VIP. It may also be beneficial to know that the host name and the VIP names are often defined in DNS so that we do not have to remember the IP addresses specifically.

Notice that the end user is now requesting to access one of the VIPs. The only people who should be using these traditional IP addresses are IT administrators who need to perform work on the server. End users and any and all applications should connect with the VIP.

Remember that I said earlier that the VIP isn’t even defined in the OS? Well if that’s the case, then how does everything know that the VIP is assigned to that node? This is all handled by Grid Infrastructure (GI). When GI is installed, one of the Oracle Universal Installer (OUI) screens will ask for the names of the nodes in the cluster (the host names) along with the virtual hostname. The screen shot below shows how the 11g GI installation looked when asking for that information (screen shot from Oracle documentation).


The public hostname is configured in the OS by the administrator. The Virtual IP is not configured in the OS but Grid Infrastructure knows about it. To understand how this works, we need to digress a bit and understand Address Resolution Protocol (ARP).

When a server is started up and its networking components are initiated, Address Resolution Protocol is the mechanism that tells the switch in front of the server to route all traffic for its IP address to the MAC address of its network card. The OS, through ARP, tells the switch to go to NODE1 for requests.

When Grid Infrastructure starts, one of its startup tasks is to do something similar. GI, through ARP, tells the switch to go to NODE1 for all NODE1-VIP ( requests. Until GI starts the VIP that VIP address is un-routable.

Now here’s the magic part…when NODE1 goes down, GI on another node in the cluster will detect the outage. GI will then perform a new ARP operation that informs the switch to route the VIP to another node in the cluster. Because the VIP is virtual, it can be re-routed on the fly. In the diagram below, NODE1 has failed. Its traditional IP is no longer available as well. GI has re-ARPed the VIP to the remaining node in the cluster.


The re-ARPing of the VIP can be accomplished in seconds. The end user may experience a brief pause in their network communication between the application and the database instance, but this is much, much less than if we waited for TCP/IP timeouts.

Oracle 11gR2 introduced the SCAN Listeners. An Oracle RAC cluster can have at most three SCAN Listeners. The SCAN name is still in DNS but DNS will round-robin the SCAN name resolution to one of up to three different IP addresses.

In the diagram below, our two-node cluster now has two SCAN listeners. The end user makes a connection request to and DNS resolves the name to either or


As is shown above, those two SCAN VIPs are assigned to different nodes in the cluster. If NODE1 goes down, Grid Infrastructure will relocated both NODE1-VIP and MY-SCAN ( to a surviving node in the cluster, through the same re-ARP operation we talked about earlier. The newer SCAN listeners and their VIPs are handled the same way as the old-style VIPs.

To recap, Virtual IP addresses are used to provide quicker failover of network communications between the application and the nodes in the cluster. The OS uses Address Resolution Protocol to let the network switch know to route connections to host. Grid Infrastructure users the same ARP operations to let the network switch know where to route traffic for the VIP and the SCAN VIP. Should a node go down, GI will re-ARP the VIP and SCAN VIP to another node in the cluster.