SQL Server Clustering from an Oracle RAC Perspective

It's no secret that I know Oracle's database clustering solution pretty well. Recently, I completed a SQL Server clustering and high availability solution that took two years from initial design to final implementation. That process involved documenting requirements, determining the options, mapping requirements to implementation details, budgeting, procurement, installation, configuration, and testing.

Now that my project is complete, I thought I would share a few observations about SQL Server's clustering from the perspective of an Oracle RAC guy. We all know that SQL Server and Oracle are both RDBMS engines and have some things in common, but they are also completely different creatures. So if you're comfortable with Oracle's Grid Infrastructure, RAC, and Data Guard, and are looking at implementing a SQL Server HA solution, maybe this will provide some good information for you.

Our current production system is a 4-node Oracle RAC primary database. This provides high availability (and high performance) within our primary data center. We use Data Guard to transport redo to a 3-node RAC physical standby database. Even though SQL Server <> Oracle, I wanted to keep our configuration as similar as possible to ease administration. So we deployed a 2-node SQL Server Failover Cluster at our primary site and a 1-node “standby” database at our DR site.

Now on to my observations, in no particular order.

  • SQL Server's HA clustering solution is Active/Passive. Oracle's is Active/Active, which to me is "better", and yes…that's a subjective term. For our Active/Passive implementation, I did not like the idea of two physical servers sitting there with one essentially idle all the time. So we have one physical server, which is the 'preferred' node, and one virtual server. If the physical server fails, clustering will automatically fail over the SQL Server instance to the virtual server and we're operational again. This Active/Passive cluster does nothing to address scalability like Oracle RAC does, but it does give me higher availability in our primary environment.
  • Implementing the clustering is super easy. Turn on clustering at the OS level. Because this is an entirely Microsoft stack, they built clustering into the OS. It's already there for you; you just need to turn it on. Then fire up Administrative Tools –> Failover Cluster Manager and wizards walk you through the setup. It's much easier than installing Grid Infrastructure, although Oracle does have to contend with multiple OS platforms, which makes things harder there. It will be interesting to see how SQL Server 2016 on Linux handles Failover Clustering.
  • Oracle uses a Shared Disk model whereas SQL Server is Shared Nothing. But you do need "shared disk" in a sense, because the disk needs to be available to both nodes. However, MS Failover Clustering (MSFC) mounts the clustered disk only on the active node. When SQL Server is moved to the other node, either automatically or manually, MSFC will unmount the disk on one node and then mount it on the other. It's kind of strange to have a Windows Explorer window open and watch the disk appear or disappear during this transition.
  • Grid Infrastructure uses the Voting Disk for quorum operations. In MSFC, you can have a Quorum disk, use a file share, or configure with no quorum. If you go with the latter, you hamper your automatic failover capability.
  • I'm used to my primary having its own cluster and the standby its own cluster. With SQL Server, the primary nodes and the standby nodes need to be part of the same cluster. Thankfully, the cluster can cross subnets, which is different from Oracle GI. Adding the standby node was super easy: we just removed its voting rights and did not configure the quorum disk for the standby node. This was fine with us, as we want failover to the standby to be a manual operation.
  • For a standby database, you can use Database Mirroring, Log Shipping or AlwaysOn Availability Groups (AGs). The first two are on their way out so I went with the AGs. AGs require the standby node to be part of the same cluster as the primary. There’s a wizard to walk you through setting up the databases to participate in the AG. This is much easier than setting up an Oracle physical standby.
  • For those of you who hate the Oracle documentation, it's time to be thankful. Many times during this process I found the MS documentation to be missing very big pieces. For example, I never did find out how to configure my standby node to have no voting rights. Luckily, we were able to click our way through it.

When it was all said and done, getting the SQL Server solution implemented was not that tough. Sometimes I had to rely on my knowledge of clustering. Other times, Microsoft’s terminology got in the way. For example, the rest of the world calls it “split brain” but MS calls it “split cluster”. Sometimes getting over the lexicon differences was the biggest hurdle.

Agent Decommission in EM13c

I recently lost a clustered Oracle RAC system and had to blow away GRID_HOME and RDBMS_HOME. This meant my cluster-aware install of the Enterprise Manager 13c agents was lost as well. So I simply removed AGENT_HOME. I then removed the OraInventory and proceeded to start from scratch. Once I had Grid Infrastructure up and running and my Oracle RAC databases operational, I had to go back to Enterprise Manager and get everything right again. EM13c was reporting the agents were unreachable, which I knew. I needed to get this production system back into EM13c. The first part is to remove the agent and all of its targets. That is pretty simple in EM13c. I just went to Targets –> All Targets and selected one of the agents. In the Agent's home screen, I clicked on Agent –> Target Setup –> Agent Decommission.

agent_decomission

It then prompted me to ensure the agent was down, which of course it was. I pressed Continue. The next screen showed me the targets monitored by that agent. I pressed Continue to confirm. Another confirmation asked me if I really wanted to do this, and I pressed OK. After a few minutes, the agent and its targets were removed. Doing it this way is easier than removing the monitored targets first and then removing the agent.

Once all of the agents were removed from EM13c, I went to Targets –> All Targets to confirm I had no other targets missed by this process. I was then able to re-add the agents the usual way.

Archiver Hung due to COMPATIBLE ORA-16484

This morning I woke up to a few alerts from EM about my archiver being hung, similar to the following:

Target type=Database Instance 
Target name=orcl4 
Categories=Fault 
Message=The archiver hung at time/line number: Fri Sep 09 06:07:22 2016/376. 
Severity=Critical

I used the DG Broker to stop and then restart log transport.

edit database orcl set state=transport-off;
edit database orcl set state=transport-on;
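For context, these commands are issued from a DGMGRL session on the primary; a minimal sketch, assuming a working broker configuration (the connect string is illustrative):

dgmgrl sys@orcl
DGMGRL> edit database orcl set state=transport-off;
DGMGRL> edit database orcl set state=transport-on;
DGMGRL> show configuration;

The SHOW CONFIGURATION output reports the overall status and surfaces any transport errors, which is a quick way to confirm whether the restart actually cleared the hang.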

But the archiver would still get hung. So it's off to the alert log to get more clues. I found this in the primary's alert log:

TT00: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16484)
TT00: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Fri Sep 09 08:07:40 2016
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl4/trace/orcl4_tt00_16068.trc:
ORA-16484: compatibility setting is too low

The error message seems self-explanatory. I have COMPATIBLE set too low. At this point, I remembered that I changed COMPATIBLE in the primary a month ago. I must have forgotten to also change it in the standby. A quick verification proved my hypothesis. COMPATIBLE is set to 12.1.0.2 in the primary but 11.2.0 in the standby. So there's my problem. I changed COMPATIBLE in the standby, bounced it, and then resumed log transport. Life was fine and everything was fixed.
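For anyone who wants the mechanics, this is roughly what the fix looked like from SQL*Plus on the standby; a minimal sketch (COMPATIBLE is not a dynamic parameter, so it requires a bounce, and on a RAC standby you would stop and start the database with srvctl):

show parameter compatible
alter system set compatible='12.1.0.2' scope=spfile sid='*';
shutdown immediate
startup mount

Once the standby was back in MOUNT, re-enabling transport with the broker command shown earlier let redo flow again.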

If you remember right, I said that I changed COMPATIBLE in the primary a month ago. Why was this a problem today and not back then? To know that, you’d have to know the change history for this database. Last night, we released new code to production. Part of the code release was to include a new table that used Oracle 12c’s new IDENTITY column feature. This was the first 12c-only feature we deployed in our code base. The standby was attempting to create the table with the new feature but that operation could not complete because of the improper parameter setting. I’m still a little confused how this affected log transport. I would have expected only log apply to be broken, but this is how it manifested itself.

N+1 Redundancy and Server Consolidation

In a previous blog post, I talked about designing your RAC implementations with N+1 redundancy. This ensures that the loss of a node does not overwhelm the remaining nodes. Today, I'm going to discuss the impact of server consolidation, specifically growing CPU core counts, and its potential impact on N+1.

One of the main systems I work on today is a four-node Oracle RAC cluster. Each node has 4 cores. I am licensed for all 16 cores (4 cores/node * 4 nodes = 16 total cores) in my environment. I designed this system with N+1 redundancy. If I lose a node, and with it 4 cores, I'm still good because my workload demands only require 12 cores to maintain normal, acceptable levels of performance.

When this system was initially designed, 4-core servers were commonplace. Today’s environment is different and it is getting harder to find 4-core servers. They still exist, but hardware vendors are pushing systems with greater core counts.

In a conversation with my SysAdmin recently, he wanted to order 6-core systems to replace our 3-year-old servers. Well, we can't simply do that. My Oracle license is for 16 cores. I could deploy three 6-core systems, but then I'd have a total of 18 cores in the cluster and I'd need to buy 2 more cores' worth of Oracle licenses. If I deployed two 6-core systems, I'd have a total of 12 cores and 4 cores' worth of licenses sitting unused.

I also informed the SysAdmin about our N+1 design. Moving to 6-core systems can have a major impact on that design. Remember that I said earlier that our workload demands require 12 cores to maintain normal operational levels. If we deploy 6-core machines, then two of them meet our requirements and another node, the "+1", would be needed to ensure we can lose a node without major impacts to performance. We'd need to deploy three 6-core machines to keep our N+1 design goal intact. But this means we need to increase our license counts, as I stated earlier.

At this point, my SysAdmin thought he had a good idea…we could buy two 8-core servers. That’s still 16 total cores and exactly what we are licensed for today. No increase in license fees. But if we lose a node, we go down to 8 total cores in operation which is less than I need. This brings up a very good point…

…Increasing the core count on your servers can have detrimental impacts on your N+1 redundancy.
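To put the options side by side, given that the workload needs 12 cores and the license covers 16:

  • Three 6-core nodes = 18 cores: N+1 holds (12 cores survive a node loss), but I would need licenses for 2 more cores.
  • Two 6-core nodes = 12 cores: within the license with 4 cores' worth unused, but losing a node leaves only 6 cores.
  • Two 8-core nodes = 16 cores: exactly the license, but losing a node leaves only 8 cores, below the 12 required.
  • Four 4-core nodes = 16 cores: exactly the license, and losing a node still leaves the 12 cores required.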

Right now, there is no easy answer. We can still buy 4-core servers so that’s what we’ll replace the current ones with next year. But there will come a day when 4-core servers are impossible to find. We’ll need to have a plan in place by that time, keeping in mind our N+1 design goals.

If only I could hard-partition our Linux servers and let cores sit idle and still be in compliance with our Oracle license agreements.

opatch prereq

I was recently applying the July PSU to one of my RAC databases and I kept getting an error from OPatch that the prereq checkSystemSpace failed. It should be no surprise to anyone out there that OPatch goes through a number of pre-requisite checks before applying the patch. OPatch is trying to ensure a good chance of success and not bomb out mid-way through the patch process. 

Now checkSystemSpace is just one of the checks OPatch performs. And from its title, I can easily discern that I am short on space, probably disk space. I know this is on the disk mount where ORACLE_HOME resides as that is the only disk space I am modifying when applying the patch. But what OPatch was not telling me on the screen is how much space was needed. The OPatch log file did not tell me either. I had about 8GB of free space on that drive and the PSU was nowhere near that size. I did clean up some old files but OPatch still failed on this check.
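As a side note, a quick way to see what is actually free on that mount is a plain df against the Oracle home (the path is whatever your ORACLE_HOME resolves to):

df -h $ORACLE_HOME

The Avail column for that filesystem is essentially the number OPatch is comparing against.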

To find out how much disk space was needed, I manually ran OPatch for this specific prerequisite check as follows:

opatch prereq checkSystemSpace -ph .

I am running a prerequisite check. The second parameter to OPatch tells it which check to run. The -ph directive tells OPatch which patch home directory to use and in my case, it is the current directory (denoted by the period). The output was similar to the following trimmed for brevity:

    Space Needed : 8780.128MB
    Space Usable : 8347.293MB
    Required amount of space(8780.128MB) is not available.
    Prereq "checkSystemSpace" failed.

By running the prereq check manually, I was able to obtain the information I needed. OPatch is looking for about 8.7GB of free space, I have 8.3GB available, and I'm about 430MB short. So now I have a good idea how much space I need to clean up before this check will pass correctly.

OPatch has a number of other checks with it. To find out all of them, issue the following:

opatch prereq -h

This will not only give the names of the checks, but a short description of each. For example, you might be notified that the check CheckApplicable has failed. What does that mean? The one-paragraph description is:

       CheckApplicable
              Check for the presence of the required components in
              the ORACLE_HOME and check if all the actions of the 
              given patch(es) are applicable.

So this check is ensuring that the Oracle home directory has the components needed for the patch to succeed. For example, you can't apply a patch to fix a partitioning problem if the Partitioning option is not installed. Use OPatch to see the rest of the prereq checks it performs. Most of them go on in the background without notice, so some of these might surprise you.

Next time you have a prereq check fail, try to run it manually to see if you have more information than what was initially provided when you tried to apply the patch.
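For example, if CheckApplicable were the check that failed, the same approach applies; run it from the unzipped patch directory just as before:

opatch prereq CheckApplicable -ph .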

July 2016 PSU fails to make isqora

When applying the latest PSU, I received the following errors from my "opatch apply" session:

Patching component oracle.odbc.ic, 12.1.0.2.0...
Make failed to invoke "/usr/bin/make -f ins_odbc.mk isqora 
   ORACLE_HOME=/u01/app/oracle/product/12.1.0.2"....'/usr/bin/ld: cannot find -lodbcinst
collect2: error: ld returned 1 exit status
make: *** [/u01/app/oracle/product/12.1.0.2/odbc/lib/libsqora.so.12.1] Error 1

The following make actions have failed :

Re-link fails on target "isqora".
Composite patch 23054246 successfully applied.
OPatch Session completed with warnings.
Log file location: /u01/app/oracle/product/12.1.0.2/cfgtoollogs/opatch/opatch2016-07-20_23-35-27PM_1.log

OPatch completed with warnings.

The patch was applied successfully, but the relinking did not work correctly. To fix this, I did the following:

cp $ORACLE_HOME/lib/libsqora.so.12.1 $ORACLE_HOME/odbc/lib/.
relink all

That’s all there was to it.

I did the copy and relink steps because I was trying to fix the error from OPatch. A better way to handle this is to do the copy first, then run ‘opatch apply’ and you won’t get any errors at all.

I see that Bug 24332805 was posted for this issue, but I’m not privileged enough to see the contents of that bug.

Fog Computing

Unless you’ve been living under a rock, you know that everything is about the Cloud now. At least that’s the hype vendors and the media want us to believe. Earlier this week, I was clued into a new term. Just as people are working towards Cloud Computing comes the newer Fog Computing.

So what is Fog Computing? Basically, it's cloud computing that has been pushed out to the data sources. In Cloud Computing, your devices and applications push the data into the proverbial Cloud, a centralized location. The problem Fog Computing aims to address is that, with all of this data being pushed into the cloud, the centralized cloud compute power may be inundated with data it has to process, and latency may be too great. Fog computing pushes processing back to the data source. Data is processed first, and then what is hopefully a smaller amount of data is sent up to the cloud. Fog computing is very important when talking about the Internet of Things. Instead of many, many devices sending all of that data back to one spot to be processed, only the processed data is forwarded.

When I first read about this concept, my immediate thought was how the more things change, the more they stay the same in our cyclical nature of IT. I’m old enough that in my early days in the IT field, I worked on mainframe systems. I remember this fancy new computing model that was going to take the world by storm called “Client/Server”. Mainframes couldn’t keep up with the ever-increasing demands of the day’s applications. So instead of processing all of the data in a central location, we’ll move the processing to the client side. Scalability is easier because we handle a larger workload by deploying more clients. The Internet and web computing changed things a bit because the browser became a thin client to centralized web/application servers, but we still had scalability by deploying multiple web servers behind a load balancer to make them appear as a singular unit. Cloud computing was going to be the newest version of sliced bread. Just push your processing to the cloud and you’ll sleep much better at night. But now that some in the industry are seeing some of the pitfalls to this approach, we’re pushing processing back to the edges again.
One day, you’ll hear the media and vendors espousing the benefits of fog computing and how those very vendors are there to help you handle your transition to the new promised land. And then along will come another way to do business. It goes in cycles. Mainframe (centralized) to Client/Server (distributed) to Cloud (centralized) to Fog (distributed) to ??????

On another note, I had an editor for some database-related outlet ask me to write some articles about databases in the cloud. The editor said that "cloud is THE hot topic", as we all know. Well, I can only write so much about the cloud because I don't have a ton of experience with it. My company has our internal private cloud implementation, but to be honest, I have to take a lot of liberties with the term "cloud" to call it a private cloud. When I told the editor I wasn't sure how much I could contribute, they told me they were not finding enough experts to talk about the cloud because "most people I talk to haven't moved to the cloud." This begs the question: is it just the media and vendors espousing the cloud?

Oracle RAC VIP and ARP Primer

I've been running across a number of questions about Virtual IP (VIP) addresses for Oracle RAC lately. I hope this blog post can help shed some light on what a VIP is, how they work, and why Oracle RAC leverages them. Before I go further, I should explain that I am not a Network specialist. In fact, computer networking is probably my weakest point of everything that goes on in IT shops. So do not flame me if I do not get the networking stuff entirely 100% accurate. I will explain this in terms that have served me well in my DBA career, especially working with Oracle RAC.

Most people are familiar with connecting any application, SQL*Plus or others, to a single-instance database server. In the end, their connection request is sent to a specific IP address. In our diagram below, the end user wants to connect to 192.168.1.1 to access the database. The network request gets routed to the network switch that that database server is connected to. This switch passes the request to the server that has the requested IP address.

vips_1

Most Oracle DBAs do not have a problem understanding this concept. Life does get a little more complicated when RAC is deployed as there are multiple machines (nodes) in the cluster.  In the next diagram, we have a two-node RAC cluster, each node having a different IP address.

vips_2

The end user doesn’t care which node his session is connected to. The end user just wants access to the cluster. Either node will suffice. The end user’s TNSNAMES.ORA configuration may say to try 192.168.1.1 first and if that doesn’t work, try 192.168.1.2 instead. In this way, Oracle RAC is providing a High Availability solution.
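As a rough sketch, a client-side TNS alias for this pre-VIP picture might look like the following (the alias and service name are made up for illustration):

ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.2)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )

With FAILOVER=ON, the client tries the first address and falls back to the second, which is exactly where the TCP timeout problem described next comes into play.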

Now we come to the entire reason for Virtual IP addresses to be used. What if the end user is trying to access the first node (192.168.1.1) but it is unavailable? The node is down for some reason. The end user could easily connect to the 192.168.1.2 node. However, due to the way TCP/IP networks work, it can take up to ten minutes for the network connection to 192.168.1.1 to timeout before 192.168.1.2 will be accessed. The lengthy TCP/IP timeout wait is the sole reason for Oracle RAC to leverage VIPs. We simply want to reduce the time to access another node in the cluster should our requested node be unavailable.

A traditional IP is usually bound to the network card on the server. The IT admin will configure the server to always use that specific IP address and no other devices on the network will use the same IP.  Note: I’m trying to make this simple here and avoiding DHCP and lease registration for those that are familiar with the topics.

A virtual IP address is not bound to the network card. It isn't even defined in the OS. The VIP is not a real IP address, similar to the way a Virtual Machine is not a real system. It just appears to be real to those using it. So let's look at our two-node cluster, but this time with VIPs defined for them.

vips_3

Our servers still have their regular IP addresses, 192.168.1.1 and 192.168.1.2 for NODE1 and NODE2 respectively. These two nodes also have VIPs associated with them. NODE1-VIP and NODE2-VIP are denoted as IP addresses 192.168.1.11 and 192.168.1.12 respectively. Each node in the RAC cluster has its regular IP address and a VIP. It may also be beneficial to know that the host name and the VIP names are often defined in DNS so that we do not have to remember the IP addresses specifically.

Notice that the end user is now requesting to access one of the VIPs. The only people who should be using these traditional IP addresses are IT administrators who need to perform work on the server. End users and any and all applications should connect with the VIP.

Remember that I said earlier that the VIP isn’t even defined in the OS? Well if that’s the case, then how does everything know that the VIP is assigned to that node? This is all handled by Grid Infrastructure (GI). When GI is installed, one of the Oracle Universal Installer (OUI) screens will ask for the names of the nodes in the cluster (the host names) along with the virtual hostname. The screen shot below shows how the 11g GI installation looked when asking for that information (screen shot from Oracle documentation).

The public hostname is configured in the OS by the administrator. The Virtual IP is not configured in the OS but Grid Infrastructure knows about it. To understand how this works, we need to digress a bit and understand Address Resolution Protocol (ARP).

When a server is started up and its networking components are initiated, Address Resolution Protocol is the mechanism that tells the switch in front of the server to route all traffic for its IP address to the MAC address of its network card. The OS, through ARP, tells the switch to go to NODE1 for 192.168.1.1 requests.

When Grid Infrastructure starts, one of its startup tasks is to do something similar. GI, through ARP, tells the switch to go to NODE1 for all NODE1-VIP (192.168.1.11) requests. Until GI starts the VIP, that VIP address is unroutable.
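You can see that GI, rather than the OS, owns the VIP by asking Grid Infrastructure about it; a minimal sketch using 12c srvctl syntax (the node name is illustrative):

srvctl config vip -node node1
srvctl status vip -node node1

The config command shows the VIP name and address that GI will ARP for that node, and the status command shows where the VIP is currently running.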

Now here’s the magic part…when NODE1 goes down, GI on another node in the cluster will detect the outage. GI will then perform a new ARP operation that informs the switch to route the VIP to another node in the cluster. Because the VIP is virtual, it can be re-routed on the fly. In the diagram below, NODE1 has failed. Its traditional IP is no longer available as well. GI has re-ARPed the VIP to the remaining node in the cluster.

vips_4

The re-ARPing of the VIP can be accomplished in seconds. The end user may experience a brief pause in their network communication between the application and the database instance, but this is much, much less than if we waited for TCP/IP timeouts.

Oracle 11gR2 introduced the SCAN Listeners. An Oracle RAC cluster can have at most three SCAN Listeners. The SCAN name is defined in DNS, and DNS will round-robin the SCAN name resolution across up to three different IP addresses.

In the diagram below, our two-node cluster now has two SCAN listeners. The end user makes a connection request to my-scan.acme.com and DNS resolves the name to either 192.168.1.21 or 192.168.1.22.

vips_5

As is shown above, those two SCAN VIPs are assigned to different nodes in the cluster. If NODE1 goes down, Grid Infrastructure will relocate both NODE1-VIP and MY-SCAN (192.168.1.21) to a surviving node in the cluster, through the same re-ARP operation we talked about earlier. The newer SCAN listeners and their VIPs are handled the same way as the old-style VIPs.
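If you want to see this layout in your own cluster, srvctl will show the SCAN name, its VIPs, and where they are currently running (12c syntax; the DNS lookup just demonstrates the round-robin resolution):

srvctl config scan
srvctl status scan
srvctl status scan_listener
nslookup my-scan.acme.com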

To recap, Virtual IP addresses are used to provide quicker failover of network communications between the application and the nodes in the cluster. The OS uses Address Resolution Protocol to let the network switch know to route connections to its host. Grid Infrastructure uses the same ARP operations to let the network switch know where to route traffic for the VIP and the SCAN VIP. Should a node go down, GI will re-ARP the VIP and SCAN VIP to another node in the cluster.

EM SQL Monitor Impact

In case anyone needs a reminder, it's always a good idea to determine the impact of your monitoring tools on the very database you are monitoring. Some monitoring tools are lightweight and others are more obtrusive. I am using Enterprise Manager 13c to monitor a specific SQL statement while it's running. I noticed in another monitoring tool (Lighty by Orachrome) that the following SQL statement was consuming a good amount of resources:

WITH MONITOR_DATA AS (
SELECT
INST_ID
,KEY
,NVL2 (
PX_QCSID
,NULL
,STATUS
) STATUS
,FIRST_REFRESH_TIME
,LAST_REFRESH_TIME
,REFRESH_COUNT
,PROCESS_NAME
,SID
,SQL_ID
,SQL_EXEC_START

I cut off the rest of the text. This SQL statement is literally a few thousand lines long. Yikes! But that isn't the issue. In Lighty, I noticed the activity in this screen shot.

sql_monitor1

The top SQL statement is my CPU pig. I blacked out the SQL text to protect potentially proprietary information. Notice that last SQL statement. It is consuming a fair amount of resources for just monitoring the system.
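To put a number on the monitoring query's cost, you can look it up in gv$sql; a rough sketch (the LIKE pattern simply matches the start of the statement shown above):

select inst_id, sql_id, executions,
       round(cpu_time/1e6, 1)     as cpu_seconds,
       round(elapsed_time/1e6, 1) as elapsed_seconds
  from gv$sql
 where sql_text like 'WITH MONITOR_DATA AS%'
 order by cpu_time desc;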

Here is a screenshot of the EM13c window.

sql_monitor2

When I turned off the Auto Refresh (it defaults to 15 seconds), the activity ceased on the system. I then manually press the refresh button when I need an update.
There are certainly times to use the auto refresh, even every 15 seconds. Just be mindful of the potential negative impact on the database.

Recreate Bad RAC Node

Just recently, I was trying to apply the latest and greatest Patch Set Update (PSU) to a 2-node Oracle RAC system. Everything went smoothly on the first node. I did have problems when trying to apply the PSU to the second node. The problem wasn’t with OPatch or the PSU, but rather, I could not even bring down Grid Infrastructure (GI) successfully. And to make matters worse, it would not come up either.

I tracked my issue down to the Grid Inter Process Communication Daemon (gipcd). When issuing 'crsctl stop crs', I received a message stating that gipcd could not be successfully terminated. When starting GI, the startup got as far as trying to start gipcd and then it quit. I found many helpful articles on My Oracle Support (MOS) and with Google searches. Many of those documents seemed to be right on track with my issue, but I could not successfully get GI back up and running. Rebooting the node did not help either. The remainder of this article can help even if your issue is not with gipcd; it was just the sticking point for me.

So at this juncture, I had a decision to make. I could file a Service Request (SR) on MOS, or I could "rebuild" that node in the cluster. I knew that if I filed an SR, I'd be lucky to have the node operational any time in the next week. I did not want to wait that long, and if this were a production system, I could not have waited that long. So I decided to rebuild the node. This blog post will detail the steps I took. At a high level, this is what is involved:

  1. Remove the node from the cluster.
  2. Cleanup any GI and RDBMS remnants on that node.
  3. Add the node back to the cluster.
  4. Add the instance and service for the new node.
  5. Start up the instance.

In case it matters, this system is Oracle 12.1.0.2 (both GI and RDBMS) running on Oracle Linux 7.  In my example, host01 is the “good” node and host02 is the “bad” node. The database name is “orcl”.  Where possible, my command will have the prompt indicating the node I am running that command from.
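Before touching anything, it helps to confirm how the cluster currently sees its members; a quick sketch from the good node:

[root@host01]# olsnodes -n -s -t
[root@host01]# crsctl check cluster -all

olsnodes should list host02 as Inactive (and show whether it is pinned), and crsctl check cluster -all should show the clusterware stack unhealthy on the bad node.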

First, I’ll remove the bad node from the cluster.

I start by removing the RDBMS software from the good node’s inventory.

[oracle@host01]$ ./runInstaller -updateNodeList ORACLE_HOME=$RDBMS_HOME "CLUSTER_NODES={host01}" 
LOCAL_NODE=host01

Then I remove the GI software from the inventory.

[oracle@host01]$ ./runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES={host01}" 
CRS=TRUE -silent

Now I’ll remove that node from the cluster registry.

[root@host01]# crsctl delete node -n host02
CRS-4661: Node host02 successfully deleted.

Remove the VIP.

[root@host01]# srvctl config vip -node host02
VIP exists: network number 1, hosting node host02
VIP Name: host02-vip
VIP IPv4 Address: 192.168.1.101
VIP IPv6 Address: 
VIP is enabled.
VIP is individually enabled on nodes: 
VIP is individually disabled on nodes: 
[root@host01]# srvctl stop vip -vip host02-vip -force
[root@host01]# srvctl remove vip -vip host02-vip
Please confirm that you intend to remove the VIPs host02-vip (y/[n]) y

Then remove the instance.

[root@host01]# srvctl remove instance -db orcl -instance orcl2
Remove instance from the database orcl? (y/[n]) y

At this point, the bad node is no longer part of the cluster, from the good node’s perspective.

Next, I’ll move to the bad node and remove the software and clean up some config files.

[oracle@host02]$ rm -rf /u01/app/oracle/product/12.1.0.2/
[root@host02 ~]# rm -rf /u01/grid/crs12.1.0.2/*
[root@host02 ~]# rm /var/tmp/.oracle/*
[oracle@host02 /tmp]$ rm -rf /tmp/*
[root@host02]# rm /etc/oracle/ocr*
[root@host02]# rm /etc/oracle/olr*
[root@host02]# rm -rf /pkg/oracle/app/oraInventory
[root@host02]# rm -rf /etc/oracle/scls_scr

I took the easy way out and just used 'rm' to remove the RDBMS and Grid home software. Things are all cleaned up now. The good node thinks it's part of a single-node cluster and the bad node doesn't even know about the cluster. Next, I'll add that node back to the cluster. I'll use the addnode utility on host01.

[oracle@host01]$ cd $GRID_HOME/addnode
[oracle@host01]$ ./addnode.sh -ignoreSysPrereqs -ignorePrereq -silent "CLUSTER_NEW_NODES={host02}" 
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={host02-vip}"

This will clone the GI home from host01 to host02. At the end, I am prompted to run root.sh on host02. Running this script will connect GI to the OCR and Voting disks and bring up the clusterware stack. However, I do need to run one more cleanup routine as root on host02 before I can proceed.

[root@host02]# cd $GRID_HOME/crs/install
[root@host02]# ./rootcrs.sh -verbose -deconfig -force

It is possible that I could have run the above earlier when cleaning up the node, but this is where I executed it this time. Now I run the root.sh script as requested.

[root@host02]# cd $GRID_HOME
[root@host02]# ./root.sh

At this point, host02 is now part of the cluster and GI is up and running. I verify with “crs_stat -t” and “olsnodes -n”. I also check the VIP.

[root@host02]# srvctl status vip -vip host02-vip
VIP host02-vip is enabled
VIP host02-vip is running on node: host02

Now back on host01, it's time to clone the RDBMS software.

[oracle@host01]$ cd $RDBMS_HOME/addnode
[oracle@host01]$ ./addnode.sh "CLUSTER_NEW_NODES={host02}"

This will start the OUI. Walk through the wizard to complete the clone process.

Now I’ll add the instance back on that node.

[oracle@host01]$ srvctl add instance -db orcl -instance orcl2 -node host02

If everything has gone well, the instance will start right up.

[oracle@host01]$ srvctl start instance -db orcl -instance orcl2
[oracle@host01]$ srvctl status database -d orcl
Instance orcl1 is running on node host01
Instance orcl2 is running on node host02
SQL> select inst_id,status from gv$instance;
   INST_ID STATUS
---------- ------------
         1 OPEN
         2 OPEN

Awesome! All that remains is to reconfigure and start any necessary services. I have one.

srvctl modify service -db orcl -service hr_svc -modifyconfig -preferred "orcl1,orcl2"
srvctl start service -db orcl -service hr_svc -node host02
srvctl status service -db orcl

That’s it. I now have everything operational.

Hopefully this blog post has shown how easy it is to take a “bad” node out of the cluster and add it back in. This entire process took me about 2 hours to complete. Much faster than any resolution I’ve ever obtained from MOS.

I never did get to the root cause of my original issue. Taking the node out of the cluster and adding it back in got me back up and running. This process will not work if the root cause of my problem was hardware or OS-related.

And the best part for me in all of this? Because host01 already had the PSU applied to both GI and RDBMS homes, cloning those homes to host02 means I did not have to run OPatch on host02. That host received the PSU patch as part of the clone. All I needed to do to complete the patching was run datapatch against the database.
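For completeness, that last step is just datapatch from the OPatch directory run against the running database; a minimal sketch:

[oracle@host01]$ cd $ORACLE_HOME/OPatch
[oracle@host01]$ ./datapatch -verbose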