Feb 23

Oracle RAC on Third-Party Clouds

Yesterday, I came across this white paper from Oracle Corp about Oracle RAC Support on Third-Party Clouds. The paper is definitely a must-read for those who want to run Oracle RAC on AWS, Google or Azure cloud providers. The first paragraph was promising where it stated:

Oracle also maintains an open policy for Third-Party Cloud vendors to support the Oracle Database in their infrastructure.

This sounds great, but the quote is talking about the Oracle Database and not about the RAC option. I found it interesting that the paper spends a great deal of time talking about AWS and Azure but never mentions Google’s cloud.

Here is the bottom line on Oracle’s position for supporting RAC on other cloud offerings:

Oracle RAC is supported on all cloud environments supported by the Oracle Database, as long as the environment is able to provide the hardware, storage, and networking requirements as specified in the Oracle RAC and Grid Infrastructure documentation. With the exception of the Oracle Cloud, Oracle has not tested nor certified Oracle RAC in these environments.

The paper goes to great length to illustrate how AWS muddies the waters with respect to the storage and networking requirements needed to run RAC in AWS.

It should be obvious that Oracle is trying to steer its customers away from running RAC in AWS by reading this paper. Amazon has put out information letting us know how easy it is to run RAC in AWS. But after reading this Oracle white paper, I would have some concern on how well the AWS environment would work for mission-critical RAC deployments.

AWS provides shared storage for RAC by using iSCSI targets on VMs, which I would never use for anything other than a testbed. One issue is redundancy at the storage level. In order to have redundancy at the storage level:

 

Amazon’s proposed solution is “deploying two or three of these NAS instances and mirroring the storage between them using ASM Normal or High redundancy.” However, should both storage instances fail or be taken down (e.g. for maintenance) at the same time, there will be a complete data loss.

 

For the private network, AWS does not support multi-casting, a requirement for the Grid Infrastructure Cluster Interconnect. AWS gets around this by using a point-to-point VPN network using n2n ntop. But from ntop’s own website, there is no further development on this in the last two years.

Don’t get me wrong. I think that AWS is a great cloud solution for lots of different things. Admittedly, I have not run RAC on AWS, at least not yet. But if I were looking to move my company’s RAC database infrastructure to the cloud, I would seriously investigate the claims in this Oracle white paper before committing to the AWS solution. That last sentence is the entire point of this blog post.

 

Jan 27

Free ES for 11.2.0.4 and 12.2 Release Date

The free Extended Support period for Oracle 11.2.04 has been extended for the second time. Oracle will now offer Extended Support for free to its customers with a current maintenance agreement up until Dec 31, 2018. This is almost a full two years away. Those running on Oracle 11.2.0.4 can breath a sigh of relieft that they have more time to get upgraded to 12c.

I will admit I am a bit surprised that Oracle is extending free ES for 11.2.0.4 another time. But maybe it shouldn’t surprise me because now it makes my statements in this Search Oracle article inaccurate. http://searchoracle.techtarget.com/tip/Ten-reasons-to-do-an-Oracle-database-upgrade

Today we also learned that Oracle 12.2.0.1 now has a release date for on-premise deployments. Oracle 12cR2 was released in 2016, but as a cloud-first release. This was an incentive from Oracle to drive you to their cloud infrastructure. Those not on the cloud yet have been eagerly awaiting the ability to download the newest version. You will be able to do so on March 15, 2017 for Intel and Solaris platforms. Other platforms will follow later.

For more details, refer to MOS Note 742060.1 and this blog article: https://blogs.oracle.com/UPGRADE/entry/release_dates_oracle_database_12

Jan 03

Database Upgrades – Flu Shots or Flu?

The new year is upon us and people are starting to ramp up their IT projects for 2017. Is a database upgrade on  your list of projects? Unless you’re running Oracle 12.1.0.2 across the board in  your data center, it should be!

I was answering some Oracle upgrade questions today when it occurred to me that database upgrades are either a Flu Shot or a Flu decision. I see people all the time that refuse to upgrade their Oracle databases on a regular basis. I still field questions from people running Oracle 10.2, or (shudder) 9i, or (VERY BIG SHUDDER) 8i! Things are working great, database upgrades are a big pain to perform, so they leave it alone. Then the next thing they know, they are lagging very far behind. So what does this have to do with the Flu?

Every year, you can get a flu shot. Sure, you get poked with a needle and experience some pain. You may even have a few minor flu-like symptoms causing you some discomfort. But people put up with the flu shot and all its complications because getting the flu itself is so much worse.

If you upgrade regularly, its like getting a flu shot. At my company, they were running Oracle 11.1.0.7 when I started working here. I upgraded everything to 11.2.0.2 (skipped 11.2.0.1) and then to 11.2.0.3 and then to 11.2.0.4 and then to 12.1.0.2 (skipped 12.1.0.1). Yes, each upgrade had its share of pain points, but they were minor compared to upgrading from 9.2 to 12.1.0.2. Such a big upgrade is like getting the flu. Keeping regular with your database versions is like getting the flu shot.

You should either be running Oracle 12.1.0.2 in  your data center across the board or making plans to do so. Notice I said “in your data center” and that’s because 12.2 is only available in the cloud as I write this. Oracle currently supports 11.2.0.4 and 12.1.0.2 for on-premise deployments. But 11.2.0.4 is currently on free Extended Support and that free period expires in less than 5 months. It can take that long to roll the upgrade through development, into test, and into production, and sort out any issues you have.

So begin upgrade plans today. Work towards getting all Oracle databases running the 12.1.0.2 version in your data center. If you’re upgrading to 12.1.0.2, then you can only perform a direct upgrade from 10.2.0.5, 11.1.0.7, 11.2.0.2 or later, or 12.1.0.1 otherwise direct upgrade is not supported.

Once you get to 12.1.0.2, look at how your organization can benefit from regular flu shots. If 12.1.0.3 is released,  you can be sure that it will be on my radar to move towards.

Nov 30

verify_queryable_inventory returned ORA-20008: Timed out

I have one system that always receives the same error when I apply the PSU. When the PSU is being applied, the execution of datapatch fails. I get the following on my screen:

 

Queryable inventory could not determine the current opatch status.

Execute 'select dbms_sqlpatch.verify_queryable_inventory from dual'

and/or check the invocation log

/u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_25889_2016_11_09_22_26_12/
    sqlpatch_invocation.log

for the complete error.

 

In the sqlpatch_invocation log file, the following is shown:

 

verify_queryable_inventory returned ORA-20008: Timed out, Job 
Load_opatch_inventory_1execution time is
more than 120Secs


Queryable inventory could not determine the current opatch status.

 

Datapatch is trying to ensure that DBMS_QOPATCH is installed correctly. But the check times out. There are a number of reasons for this. It takes some investigation to check system resource utilization to see if there is a bottleneck. One might also need to check database performance while this is being run.

 

In my case, I know I have a very slow VM. I’m working with our server team to ensure that VM is configured correctly. In the meantime, I can get past the timeout by doing the following:

 

alter system set events '18219841 trace name context forever';

$ORACLE_HOME/OPatch/datapatch -verbose

alter system set events '18219841 trace name context off';

 

Setting the event lets datapatch complete successfully.

Nov 28

Oracle Ace Changes

The Oracle Ace Program is changing the way members have to report their activities. Oracle Aces (which I’m using to generically refer to Aces Associates, Aces, and Ace Directors) have long known that they need to keep contributing to the Oracle community at large. Aces regularly have to let the program know of their outreach activities. How we notify the program is changing and with change comes the inevitable complaints that always proceeds any change. I waited until I reported my activities and read a few comments from other Aces before writing this blog post. These opinions are my own and not all Aces may agree with me.

When I first started hearing some of the grumblings, I was a bit concerned. There are some well known Aces that have decided they have had enough with the program. Its never a positive to see good people leave. They don’t like the new method of reporting activities, or they don’t like that they have to produce more content than they are currently able to do in order to generate enough points (more on that in a bit). For those who do not report their activities or do not contribute enough, they will lose the current Ace standing. Some of these Aces say they will still contribute to the community just like they always have, but they do not see a future in the Ace program for themselves. To me, it is a bit bothersome when an individual you know of and personifies “Oracle Ace” no longer wants to be part of the program simply because the reporting requirements have become too much work, and for no other reason.

That being said, the new reporting mechanism has made things much more transparent. In the past, there were loose guidelines as to what was needed to become and stay and Ace. Now, our contributions are given a point value and you need to have a certain number of points to maintain your standing. Write a blog post, and its X points. Give a presentation at a conference, and its Y points. Add them all up and make sure your over the threshold for your Ace level. Transparency is normally a good thing and the new changes have provided just that. My feeling is that this level of transparency is a good thing and will only help the program going forward.

The program has maintained since they announced these changes that the first go-round will be used to potentially adjust the points for certain activities. It looks to me like points do need to be adjusted in a number of categories. For example, to maintain Ace status (Ace and Ace Director), one would need a minimum of 10 presentations at conferences. Getting an abstract accepted is challenging with the level of competition for the few speaking slots available. Even if you could present two papers at a conference,  you’d need to speak at five different conferences over the course of a year. That’s a lot of travelling and a lot of work to present 10 different papers. I’d like to see the points in this category increased. Another one that could use a bump is book authorship. If you wrote one book, you would not maintain your Ace status. Writing a book requires a VERY LARGE commitment! It can take up to a year writing that book and you may not be able to do much else.

One does not have to produce content on one media type. You can do a combination like write blog posts, author a book or chapters in a book, be a technical editor of a book, create a podcast, write articles, present at a conference, and participate in the Oracle Communities. And there are non-content activities as well such as volunteering for a user group, organizing a meetup event, tweeting or being a cloud trial participant.

I think that overall, these changes are good and I do welcome them. But as with any change, there is going to be bumps in the road.

Oct 19

Slash or No Slash?

That is the question.

 

A recent post on the OTN forums asked about using semicolons and slashes as statement terminators. I dusted off an article I wrote for our development team over 4 years ago on this topic. This article received good reviews and is available on the OTN forums if desired. I thought I would also post it on my blog. Here is the article:

 

Slash or No Slash

by Brian Peasland

At our company, SQL scripts that are deployed are executed in Oracle’s SQL*Plus command-line utility, whereas many developers use a GUI tool like PL/SQL Developer or SQL Developer. The forward slash means something to SQL*Plus that is not needed in PL/SQL Developer or SQL Developer. As such, it can be confusing to know if you need to include a forward slash in your SQL scripts or not. Hopefully, this section will shed some light on what the forward slash does, when to use it, and when to not use it. Semi-colon Terminator For most SQL statements, the semi-colon is the statement terminator. For example, consider this simple SQL statement run in SQL*Plus:

SQL> select sysdate from dual;

SYSDATE

———

18-JUN-12

When SQL*Plus sees the semi-colon, it knows the end of the SQL statement has been reached and it can now execute the command.

SQL*Plus Buffer

You may not be aware that SQL*Plus has a buffer for its commands. If I press the ‘l’ key for ‘list’, then I can see the command currently in my session’s buffer.

SQL> l

1* select sysdate from dual

Not surprisingly, there is the command I just executed. I then executed another SQL statement and here is how my buffer now looks:

SQL> l

1 select sysdate,user

2* from dual

As you can see, I now have two lines in my session’s SQL*Plus buffer.

Slash = Execute Buffer

The first rule to understand about the forward slash is that to SQL*Plus, the forward slash means to execute the contents of the buffer. To illustrate this concept, I will execute a SQL statement, wait a few seconds, and then just execute that same SQL statement again but just executing the buffer.

SQL> select to_char(sysdate,’MM/DD/YYYY HH24:MI:SS’) from dual;

TO_CHAR(SYSDATE,’MM

——————-

06/18/2012 15:20:40

SQL> /
TO_CHAR(SYSDATE,’MM

——————-

06/18/2012 15:21:17

SQL> /

TO_CHAR(SYSDATE,’MM

——————-

06/18/2012 15:21:50

You can see that all I did the second and third time was to just type ‘/’ and hit enter and SQL*Plus executed the contents of its command buffer each time.

PL/SQL Blocks

The semi-colon statement terminator worked just fine by itself until Oracle introduced PL/SQL in Oracle version 7. The problem is that PL/SQL blocks can have multiple semi-colons to terminate the individual statements that make up that block. Consider this very simple PL/SQL block that does nothing:

SQL> begin

2 null;

3 null;

4 end;

5

Lines 2 and 3 contain perfectly valid statements that are each terminated with the semi-colon. And in line 4, we have the END keyword signifying the end of the PL/SQL block. If we were not allowed nested BEGIN/END pairs, then every time SQL*Plus sees “END;” it would know the end of the PL/SQL block has been reached, but we are allowed nested BEGIN/END pairs so the following is perfectly legal and valid:

SQL> begin

2 begin

3 null;

4 end;

5 null;

6 end;

7

You can tell from the above that just looking for “END;” is not enough because SQL*Plus would have tried to run the block after line 4. So how did Oracle decide to signify the that the PL/SQL block was ready to execute? The answer is by using the forward slash as you may already be aware. The second rule to understand is that all the forward slash is doing when you use it to end a PL/SQL block is to tell SQL*Plus to run what is in the buffer! This has not changed since before PL/SQL was created for Oracle 7. Consider the following sample:

SQL> begin

2 null;

3 end;

4 /

PL/SQL procedure successfully completed.

SQL> l

1 begin
2 null;

3* end;

On line 4, I typed the forward slash to execute the PL/SQL block. You can see that my block successfully completed. If we go back and look at the contents of my command buffer, you can see it contains everything but the forward slash. The forward slash is not part of the command buffer. So now, I’ll run a different PL/SQL block:

SQL> begin

2 dbms_output.put_line(‘Today is ‘||to_char(sysdate,’MM/DD/YYYY HH24:MI:SS’));

3 end;

4 /

Today is 06/18/2012 15:39:32

PL/SQL procedure successfully completed.

The forward slash said to SQL*Plus to run what is in its buffer, and the results are displayed. Now let’s type just the slash again and we should see our PL/SQL block get executed again.

SQL> /

Today is 06/18/2012 15:40:42

PL/SQL procedure successfully completed.

I did not have to type my PL/SQL block anew as it is currently in the command buffer.

PL/SQL and SQL Developer and PL/SQL Blocks

The biggest problem for most developers is that PL/SQL Developer and SQL Developer do not require you to use the forward slash. Why? Because you can hit Execute (F8) or Run Script (F5) to run your PL/SQL block. PL/SQL Developer knows that the moment you hit F8, you are intending to submit the PL/SQL block to be executed. In this case, F8 in PL/SQL Developer is doing the same job as the forward slash in SQL*Plus. Similarly, for F5 in SQL Developer.

The problem at my company is our team deploying code to production does not deploy code with PL/SQL Developer or SQL Developer. They use SQL*Plus because scripting multiple executions is easier with a command line tool. Many developers make the mistake of not including the forward slash for PL/SQL blocks in scripts because they do not need it, but if you want to deploy that code section in a SQL script, the forward slash is required at the end of each PL/SQL block.

When Not To Use Slash

So we’ve seen when and why we use the forward slash, but when is it bad to use it? The third rule to know is that it is bad to use the forward slash following a single SQL statement (not in a PL/SQL block), especially when that slash immediately follows a DML statement (INSERT, UPDATE, or DELETE). If my script contains the following:

select sysdate from dual;

/

Then I will get “double output” which is not what I normally intend to do in a script. I really only want one line returned, not two as the above script would do:

SQL> select sysdate from dual;

SYSDATE

———

18-JUN-12
SQL> /

SYSDATE

———

18-JUN-12

It is even worse when I use the forward slash following a DML statement because that statement will get executed twice. Consider the following script:

insert into test_tab values (10);

/

We now know that when I execute the two lines above in a script, SQL*Plus will execute it once due to the semi-colon statement terminator and then execute a second time because the forward slash tells SQL*Plus to run what is in the command buffer. When I execute the two-line script above, I get the following output:

SQL> insert into test_tab values (10);

1 row created.

SQL>

/

insert into test_tab values (10) *

ERROR at line 1: ORA-00001: unique constraint (PEASLAND.SYS_C00767176) violated

Oops! The first insert worked (1 row created.) but when the forward slash was entered, SQL*Plus tried to insert the same data and I got caught on a unique constraint violation.

Conclusion

Hopefully, this page shows why the forward slash is needed, what is does, and when not to use it. To recap:

  • Include the forward slash at the end of each PL/SQL block
  • Do not include the forward slash after any SQL statements not in a PL/SQL block.
  • The forward slash after a single SQL statement will cause that SQL command to execute twice.

Sep 29

SQL Server Clustering from an Oracle RAC Perspective

Its no secret that I know Oracle’s database clustering solution pretty well. Recently, I completed a SQL Server clustering, high availability solution that took two years from initial design to final implementation. That process involved documenting requirements, determining the options, mapping requirements to implementation details, budgeting, procurement, installation, configuration, and testing.

Now that my project is complete, I thought I would give a few items about SQL Server’s clustering from the perspective of an Oracle RAC guy. We all know that SQL Server and Oracle are both RDBMS engines and they may have some things in common. But they are also completely different creatures too. So if you’re comfortable with Oracle’s Grid Infrastructure and RAC and Data Guard, and are looking at implementing a SQL Server HA solution, maybe this will provide some good information for you.

Our current production system is a 4-node Oracle RAC primary database. This provides high availability (and high performance) within our primary data center. We use Data Guard to transport redo to a 3-node RAC physical standby database. Even though SQL Server <> Oracle, I wanted to keep our configuration as similar as possible to ease administration. So we deployed a 2-node SQL Server Failover Cluster at our primary site and a 1-node “standby” database at our DR site.

Now on to my observations, in no particular order.

  • SQL Server’s HA clustering solution is Active/Passive. Oracle’s is Active/Active which to me is “better”, and yes…that’s a subjective term. For our Active/Passive implementation, I did not like the idea of two physical servers sitting there with one essentially idle all the time. So we have one physical server which is the ‘preferred’ node and one virtual server. If the physical server fails, clustering will automatically failover the SQL Server instance to the virtual server and we’re operational again. This Active/Passive cluster does nothign to address scalability like Oracle RAC does, but it does give me higher availability in our primary environment.
  • Implementing the clustering is super easy. Turn on clustering at the OS level. Because this is an entirely Microsoft stack, they built clustering into the OS. Its already there for you. You just need to turn it on. Then fire up Administrative Tools –> Failover Cluster Manager and wizards walk you through the setup. Its much easier than installing Grid Infrastructure. But Oracle does have to contend with different OS platforms which makes it harder there. It will be interesting to see how SQL Server 2016 on Linux handles Failover Clustering.
  • Oracle uses a Shared Disk model whereas SQL Server is Shared Nothing. But you do need to use “shared disk” in a way because the disk needs to be available on both nodes. However, MS Failover Clustering (MSFC) mounts the clustered disk on the active node. When SQL Server is moved to the other node, either automatically or manually, MSFC will unmount the disk on one node then mount it on the other. Its kinda strange to have a Windows Explorer window open and see the disk either appear or disappear during this transition.
  • Grid Infrastructure uses the Voting Disk for quorum operations. In MSFC, you can have a Quorum disk, use a file share, or configure with no quorum. If you go with the latter, you hamper your automatic failover capability.
  • I’m used to my primary having its own cluster and the standby its own cluster. With SQL Server, the primary nodes and the standby nodes need to be part of the same cluster. Thankfully, the cluster can cross subnets  which is different than Oracle GI. Adding the standby node was super easy, we just removed its voting rights and we did not configure the quorum disk for the standby node. This was fine with us as we want failover to the standby to be a manual operation.
  • For a standby database, you can use Database Mirroring, Log Shipping or AlwaysOn Availability Groups (AGs). The first two are on their way out so I went with the AGs. AGs require the standby node to be part of the same cluster as the primary. There’s a wizard to walk you through setting up the databases to participate in the AG. This is much easier than setting up an Oracle physical standby.
  • For those of you who hate the Oracle documentation, its time to be thankful. Many times during this process I found the MS documentation to be missing very big pieces. For example, I never did find out how to configure my standby node to have no voting rights. Luckily we were able to click our way through it.

 

When it was all said and done, getting the SQL Server solution implemented was not that tough. Sometimes I had to rely on my knowledge of clustering. Other times, Microsoft’s terminology got in the way. For example, the rest of the world calls it “split brain” but MS calls it “split cluster”. Sometimes getting over the lexicon differences was the biggest hurdle.

Sep 27

Agent Decomission in EM13c

I recently lost a clustered Oracle RAC system and had to blow away GRID_HOME and RDMBS_HOME. This meant my cluster-aware install of the Enterprise Manager 13c agents was lost as well. So I simply removed AGENT_HOME. I then removed the OraInventory and proceeded to start from scratch. Once I had Grid Infrastructure up and running and my Oracle RAC databases operational, I had to back to Enterprise Manager and get everything right again. EM13c was reporting the agents were unreachable, which I knew. I now need to get this production system back into EM13c. The first part is to remove the agent and all of its targets. That is pretty simple in EM13c. I just went to Targets–>All Targets and selected one of the agents. In the Agent’s home screen, I clicked on Agent –> Target Setup –> Agent Decomission.

agent_decomission

It then prompted me to ensure the agent was down, which of course it is. I pressed Continue. The next screen shows me the targets monitored by that agents. I pressed Continue to confirm. Another confirmation asking me if I really want to do this and I pressed OK. After a few minutes, the agent and its targets are removed. Doing it this way is easier than removing the monitored targets first, then removing the agent.

Once all of the agents were removed from EM13c, I went to Targets –> All Targets to confirm I had no other targets missed by this process. I was then able to re-add the agents the usual way.

Sep 09

Archiver Hung due to COMPATIBLE ORA-16484

This morning I woke up to a few alerts from EM about my archiver being hung, similar to the following:

Target type=Database Instance 
Target name=orcl4 
Categories=Fault 
Message=The archiver hung at time/line number: Fri Sep 09 06:07:22 2016/376. 
Severity=Critical

I used the DG Broker to stop and then restart log transport.

edit database orcl set state=transport-off;
edit database orcl set state=transport-on;

But the archiver would still get hung. So its off to the alert log to get more clues. I found this in the primary’s alert log:

TT00: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16484)
TT00: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Fri Sep 09 08:07:40 2016
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl4/trace/orcl4_tt00_16068.trc:
ORA-16484: compatibility setting is too low

The error message seems self explanatory. I have COMPATIBLE set too low. At this point, I remembered that I changed COMPATIBLE in the primary a month ago. I must have forgotten to also change this in the standby. A quick verfication proved my hypothesis. COMPATIBLE is set to 12.1.0.2 in the primary but 11.2.0 in the standby. So there’s my problem. I changed COMPATIBLE in the standby, bounced it and then resumed log transport. Life was fine and everything was fixed.

If you remember right, I said that I changed COMPATIBLE in the primary a month ago. Why was this a problem today and not back then? To know that, you’d have to know the change history for this database. Last night, we released new code to production. Part of the code release was to include a new table that used Oracle 12c’s new IDENTITY column feature. This was the first 12c-only feature we deployed in our code base. The standby was attempting to create the table with the new feature but that operation could not complete because of the improper parameter setting. I’m still a little confused how this affected log transport. I would have expected only log apply to be broken, but this is how it manifested itself.

 

Sep 01

N+1 Redundancy and Server Consolidation

In a previous blog post, I talked about designing your RAC implementations with N+1 redundancy. This ensures the loss of a node does not overwhelm the remaining nodes should one node fail. Today, I’m going to discuss the impact of server consolidation, specifically CPU cores, and its potential N+1 impacts.

One of the main systems I work on today is a four node Oracle RAC cluster. Each nodes has 4 cores. I am licensed for all 16 cores (4 cores/node * 4 nodes = 16 total cores) in my environment. I designed this system with N+1 redundancy. If I lose a node, and lose 4 cores, I’m still good because my workload demands only require 12 cores to maintain normal, acceptable levels of performance.

When this system was initially designed, 4-core servers were commonplace. Today’s environment is different and it is getting harder to find 4-core servers. They still exist, but hardware vendors are pushing systems with greater core counts.

In a conversation with my SysAdmin recently, he wanted to order 6-core systems to replace our 3-year old servers. Well we can’t simply do that. My Oracle license is for 16 cores. I could deploy three 6-core systems, but then I’d have a total of 18 cores in the cluster and I’d need to be 2 more cores worth of Oracle licenses. If I deployed two 6-core systems, I’d have a total of 12 cores and have 4 cores worth of licenses sitting unused.

I also informed the SysAdmin about our N+1 design. The impact of moving to 6-core systems can have major impacts to the N+1 design. Remember that I said earlier that our workload demands require 12 cores to maintain normal operational levels. If we deploy 6-core machines, then two of them meet our requirements and another node, the “+1”, would be needed to ensure we can lose a node without major impacts to performance. We’d need to deploy three 6-core machines to keep our N+1 design goal intact. But this means we need to increase our license counts as I stated earlier.

At this point, my SysAdmin thought he had a good idea…we could buy two 8-core servers. That’s still 16 total cores and exactly what we are licensed for today. No increase in license fees. But if we lose a node, we go down to 8 total cores in operation which is less than I need. This brings up a very good point…

…Increasing the core count on your servers can have detrimental impacts on your N+1 redundancy.

Right now, there is no easy answer. We can still buy 4-core servers so that’s what we’ll replace the current ones with next year. But there will come a day when 4-core servers are impossible to find. We’ll need to have a plan in place by that time, keeping in mind our N+1 design goals.

If only I could hard-partition our Linux servers and let cores sit idle and still be in compliance with our Oracle license agreements.

Older posts «