Mar 26

Grid Infrastructure Upgrade Assistants

Yesterday I was working on testing an upgrade of Grid Infrastructure 11.2.0.3 to 11.2.0.4 in my testbed. This is my first step in my larger plan to upgrade our production systems to 11.2.0.4 this calendar year.

For the X Windows environment I need in order to run the Oracle Universal Installer (OUI), I run Ubuntu Linux in VirtualBox on my desktop. Yesterday I had the cluster installation completed and the OUI prompted me to run the rootupgrade.sh scripts on my cluster nodes. I needed to go home for the day, so I figured I would come back to this later on. That evening, I ran the rootupgrade.sh scripts from home. My plan was to press the OK button in the OUI when I returned to the office this morning.

I arrived at the office and settled down with my morning caffeine habit. I was just getting ready to finish the upgrade when my workstation enjoyed a nice Blue Screen of Death. It has been doing that a lot lately, so I think I’m due for a new workstation, but I digress. Obviously, I lost my VirtualBox guest OS, which means I lost my OUI progress as well.

After my workstation returned, I started weighing my options. The rootupgrade.sh script had been run on each node in the cluster. All that remained to complete the upgrade was to run the configuration assistants. But I could not start the OUI anew and have it pick up where it left off. One option was to blow away my entire cluster and rebuild from scratch. It’s a testbed, after all, so I could do that. But I’ve done that a number of times in the past; it is time consuming and I wouldn’t learn anything new from the exercise. So I set out to determine if there was a better way.

I discovered Note 1360798.1 “How to Complete 11gR2 GI Config Assistant if OUI Is Not Available” on Metalink, which was exactly what I needed. The only thing in that note that seemed odd is step 1: the instructions appear to be giving a command to run directly, but what the note is actually asking you to do is create a new configToolAllCommands file containing that command. In other words, the command shown in step 1 should be the contents of that script file. I looked back in my old $GRID_HOME/cfgtoollogs to verify this.

My cluster upgrade is now complete. To look up this Note, fix my issue, and write this blog post took me about 30 minutes total. Rebuilding the cluster would have taken hours. And I learned something new in the process.

Feb 07

Obtain Database Patchsets

I see this question asked often on the My Oracle Support Communities, so it must not be as obvious as it seems. So I figured I would post this on my blog. The question is “how do I obtain the XX patchset for my Oracle database?” Sometimes people post the link to the exact patchset. But here is how you can find any patchset for any Oracle database version.

Sign on to Metalink (http://support.oracle.com) and click on the Patches and Updates tab.

On the next window, click on the Latest Patchset link. This will pop up another window, so make sure your browser allows pop-ups, at least for this site.

Next, mouse over (do not click) Oracle Database and you will see a menu of platforms. Mouse over your platform of choice and you will be given a submenu showing all of the patchsets for that platform. Click on the link for the patchset you are seeking.

The next screen will contain the download link and the readme file for this patchset. That’s all there is to it!

Feb 07

Oracle Interface Changes

I regularly participate in the My Oracle Support Communities and answer lots of questions there. Late last week, Oracle changed the interface and, like many people, I haven’t necessarily found it to be better. But I am getting used to the new UI, and as I use it more and get past the learning curve, it isn’t nearly as bad as on my first day with the interface.

Then today I was answering a question about an Oracle product certified for a specific platform. As with any certification question, the first place to go is My Oracle Support; click on the Certifications tab. I was surprised that the interface here underwent a subtle change as well. There used to be pulldown menus where I could select the Product, Release and Platform. Now I have to click in the field and a popup menu appears. I can search for a specific product more easily than before, and it remembers my most recent searches. There is no new functionality here, just a minor change that I was unprepared to see today.

Feb 05

TEMPFILE Offline Physical Standby

I received an alert from Enterprise Manager Cloud Control that a tempfile was offline in my physical standby database.

Message=The temporary file /u01/app/oracle/oradata/ncpp/temp/temp01.dbf is OFFLINE.

We recently did a restore of the standby, i.e. recreated it from a backup. Surprisingly, only 7 of the 8 tempfiles were available; one tempfile was missing. I found this odd, but regardless, I wanted to get the tempfile back.

I initially tried this, but it failed with an error:

SQL> alter system set standby_file_management=manual scope=both;
System altered.
SQL> alter tablespace temp add tempfile
2>  '/u01/app/oracle/oradata/ncpp/temp/temp01.dbf' size 20g;
alter tablespace temp add tempfile
*
ERROR at line 1:
ORA-01109: database not open

The solution is to put the physical standby in READ ONLY mode to add the tempfile.

SQL> alter database recover managed standby database cancel;
Database altered.
SQL> alter database open read only;
Database altered.
SQL> alter tablespace temp add tempfile
2  '/u01/app/oracle/oradata/ncpp/temp/temp01.dbf' size 20g;
Tablespace altered.
SQL> alter database recover managed standby database disconnect;
Database altered.

Dec 31

ORA-38868

When working on a standby database recently, I went to DG Broker to check the status and received this:

DGMGRL> show configuration

Configuration - resp_ress_config

  Protection Mode: MaxPerformance
  Databases:
    resp - Primary database
    ress - Physical standby database
      Error: ORA-16766: Redo Apply is stopped

Hmm…my standby is not applying redo. When I attempted to start managed recovery, I got the following in the standby alert log:

Tue Dec 31 09:52:10 2013
Managed Standby Recovery starting Real Time Apply
Tue Dec 31 09:52:10 2013
MRP0: Background Media Recovery terminated with error 38868
Errors in file /u01/app/oracle/diag/rdbms/ress/ress2/trace/ress2_pr00_13905.trc:
ORA-38868: warning: the control file may have incorrect data file structure
Managed Standby Recovery not using Real Time Apply
Slave exiting with ORA-38868 exception
Errors in file /u01/app/oracle/diag/rdbms/ress/ress2/trace/ress2_pr00_13905.trc:
ORA-38868: warning: the control file may have incorrect data file structure
Recovery Slave PR00 previously exited with exception 38868
MRP0: Background Media Recovery process shutdown (ress2)

So the ORA-38868 is telling me that the control file may have an incorrect data file structure. My first thought was that this was related to the work I blogged about yesterday. But that work was on the primary side. I looked back at the standby alert log and found the first occurrence of this error about 2.5 months ago. If this were a production system, I could be in big trouble letting this issue go unnoticed for that long. But I have measures in place to alert me if my production standby lags behind the primary for an unacceptable period of time. This is just a test system, which I can blow away and start from scratch if I need to. But what fun would that be? Let’s see if we can fix the problem.

My first stop was Metalink, but I got zero hits for the ORA-38868 error. A web search returned one relevant hit, which offered a solution of simply restarting the instance and restarting redo apply. I was skeptical, but attempted the easy fix. It should be no surprise that a simple restart of the instance did not fix the problem. The error is telling me that my control file has corruption in it; restarting the instance is not going to fix control file corruption. Since Metalink and the Internet were of no use, I guess it’s up to me to fix this. If all else fails, I will simply drop the standby and recreate it.

My initial thought was to go back to the primary, create a standby control file, and then start up the standby with that new control file. I have confidence that a new control file from the primary would resolve the issue. However, I would then need to apply 2.5 months of redo, which I no longer have available.

I had been meaning to investigate using RMAN to roll forward a standby via an incremental backup, but it kept falling low on my priority list of things to do. I have an upcoming project, now less than one month away, where I will need to know how to do this. So this seemed like a perfect time to practice the technique for that project and fix my current issue at the same time. The steps to do this are in:

Metalink Note 836986.1 Steps to Perform for Rolling Forward a Standby Database Using RMAN Incremental Backups

The steps in this document not only rolled forward my standby, but also recreated the control files, thus fixing my problem.

Dec 30

ORA-01618

This is my 100th post to this blog!!!

I am trying to duplicate a problem where my primary has threads 4, 5, and 6 and my standby has threads 1 and 2. When I did a switchover, everything was fine, but I could not switch back because the old primary, now the standby, was expecting redo from thread 4, which doesn’t exist in my new primary, the old standby. For my testing, I have a 2-node RAC primary and a 2-node RAC standby. Predictably, these both have threads 1 and 2. So I attempted to switch the primary to threads 3 and 4 by simply changing the THREAD parameter in the SPFILE. But on startup, I received:

PRCR-1079 : Failed to start resource ora.resp.db
CRS-5017: The resource action "ora.resp.db start" encountered the following error:
ORA-01618: redo thread 3 is not enabled - cannot mount
. For details refer to "(:CLSN00107:)" in "/u01/app/crs11.2.0.3/log/myhost01/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.resp.db' on 'myhost01' failed
CRS-2632: There are no more servers to try to place resource 'ora.resp.db' on that would satisfy its placement policy
CRS-5017: The resource action "ora.resp.db start" encountered the following error:
ORA-01618: redo thread 4 is not enabled - cannot mount
. For details refer to "(:CLSN00107:)" in "/u01/app/crs11.2.0.3/log/myhost02/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.resp.db' on 'myhost02' failed

This isn’t a simple parameter change. To get around this, we first have to create online redo logs to support the new threads:

SQL> alter database add logfile thread 3 group 30
  2  '/oracle_data/oradata/resp/redo30.log' size 52429312;
Database altered.
SQL> alter database add logfile thread 3 group 31
  2  '/oracle_data/oradata/resp/redo31.log' size 52429312;
Database altered.
SQL> alter database add logfile thread 4 group 40
  2  '/oracle_data/oradata/resp/redo40.log' size 52429312;
Database altered.
SQL> alter database add logfile thread 4 group 41
  2  '/oracle_data/oradata/resp/redo41.log' size 52429312;
Database altered.

Next, we enable the threads.

SQL> alter database enable public thread 3;
Database altered.
SQL> alter database enable public thread 4;
Database altered.

Lastly, we change the initialization parameters:

SQL> alter system set thread=3 sid='resp1' scope=spfile;
System altered.
SQL> alter system set thread=4 sid='resp2' scope=spfile;
System altered.

Then bounce the instances with srvctl. Once everything was back up, I disabled threads 1 and 2.

SQL> alter database disable thread 1;
Database altered.
SQL> alter database disable thread 2;
Database altered.

Dec 19

Standby Network Tuning

I’ve been working on tuning our network configuration between our primary and standby databases. This entry shows the steps I took.

The first thing to check is our bandwidth. I talked with our Network Admin and we have a DS-3 link between our primary data center at our corporate headquarters and our DR site 30 miles away. That normally means a bandwidth of 45 Mbps. However, we also employ a WAN Accelerator to compress and optimize the network traffic, and it shows we are optimizing 70% of the traffic. This gives us an effective bandwidth of about 65 Mbps (since 65 Mbps × 70% ≈ 45.5 Mbps, roughly the link’s raw capacity).
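For what it’s worth, the effective-bandwidth arithmetic is just a back-of-the-envelope calculation; here is a quick sketch using the figures above (the 45 Mbps link speed and 70% optimization ratio are the numbers from our Network Admin, nothing Oracle-specific):

```python
# Back-of-the-envelope effective bandwidth.
# If the WAN Accelerator optimizes 70% of the traffic, then traffic arriving
# at an effective rate E fits on the 45 Mbps link when E * 0.70 <= 45.
link_mbps = 45.0          # raw DS-3 capacity
optimized_ratio = 0.70    # fraction reported by the WAN Accelerator

effective_mbps = link_mbps / optimized_ratio
print(round(effective_mbps, 1))  # 64.3, i.e. "about 65 Mbps"
```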

The question now becomes, is this enough? To answer that question, I used the following query to determine the redo log generation each hour.

select inst1.hour,inst1.num_1,to_char(inst1.bytes_1,'999,999,999,999') as bytes_1,
inst2.num_2,to_char(inst2.bytes_2,'999,999,999,999') as bytes_2,
inst3.num_3,to_char(inst3.bytes_3,'999,999,999,999') as bytes_3
from
(select to_char(first_time,'YYYY-MM-DD HH24') as hour,count(*) as num_1, sum(blocks*block_size) as bytes_1
from gv$archived_log where inst_id=1 and thread#=1 and dest_id=1
group by to_char(first_time,'YYYY-MM-DD HH24')) inst1,
(select to_char(first_time,'YYYY-MM-DD HH24') as hour,count(*) as num_2, sum(blocks*block_size) as bytes_2
from gv$archived_log where inst_id=2 and thread#=2 and dest_id=1
group by to_char(first_time,'YYYY-MM-DD HH24')) inst2,
(select to_char(first_time,'YYYY-MM-DD HH24') as hour,count(*) as num_3, sum(blocks*block_size) as bytes_3
from gv$archived_log where inst_id=3 and thread#=3 and dest_id=1
group by to_char(first_time,'YYYY-MM-DD HH24')) inst3
where inst1.hour=inst2.hour and inst2.hour=inst3.hour
order by 1;

We have a 3-node RAC primary database. The query above gets the number of logs and the redo bytes generated each hour from GV$ARCHIVED_LOG. I’m sure there is a more elegant way of writing this query, but this is how I did it. I exported the results to an Excel spreadsheet, which let me convert bytes per hour into megabits per second. I then found the single hour with the highest rate: our peak hour generated redo at 57.08 Mbps. Since that is below our effective bandwidth, our bandwidth is sufficient.
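The spreadsheet conversion from bytes of redo per hour to megabits per second is simple enough to sketch in a few lines; note that the byte count in the example below is made up for illustration, not our actual peak hour:

```python
# Convert an hour's worth of redo (in bytes) to an average rate in Mbps.
def bytes_per_hour_to_mbps(redo_bytes):
    bits = redo_bytes * 8              # bytes -> bits
    return bits / 3600 / 1_000_000     # per second, then to megabits

# Example: roughly 25.7 GB of redo in one hour works out to about 57 Mbps.
print(round(bytes_per_hour_to_mbps(25_700_000_000), 2))  # 57.11
```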

The next thing to do was to compute our Bandwidth Delay Product (BDP). The BDP is a function of our bandwidth and our latency. The latency is about 10ms between the sites, but I will use 15ms to account for times of higher network congestion. As previously stated, the DS-3 link is 45Mbps.

BDP = 45Mbps * 15ms

= 45,000,000 bits/sec * 0.015 sec = 675,000 bits

= 675,000 bits / 8 bits per byte = 84,375 bytes

The calculations I have seen specify that we need a buffer size 3 times larger than the BDP. Which gives us the following calculation:

Buffer = 84,375 * 3 = 253,125

I rounded this up to 262,144 (which is 1024 * 256 = 2^10 * 2^8 = 2^18; I like things to be nice powers of 2). This means I need to set the Oracle Net RECV_BUF_SIZE and SEND_BUF_SIZE to 262144. I also need to set the Session Data Unit (SDU) to 32767.
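The BDP and buffer arithmetic above can be spelled out in a few lines (again, nothing Oracle-specific here, just the calculation itself):

```python
# Bandwidth-Delay Product and buffer sizing, per the calculation above.
bandwidth_bps = 45_000_000   # DS-3: 45 Mbps in bits per second
latency_sec = 0.015          # padded to 15 ms from the measured ~10 ms

bdp_bytes = bandwidth_bps * latency_sec / 8   # bits -> bytes
buffer_bytes = 3 * bdp_bytes                  # the 3x-BDP rule of thumb

print(int(bdp_bytes), int(buffer_bytes))      # 84375 253125
print(2 ** 18)                                # 262144, the rounded-up value
```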

Before we can begin, we need to look at the Linux kernel to verify the read and write max TCP buffer sizes:

[oracle@myhost ~]$ /sbin/sysctl -a | grep net.core | grep mem_max
net.core.wmem_max = 1048576
net.core.rmem_max = 4194304

The write max buffer size is 1MB and the read max buffer size is 4MB. I previously calculated the buffer size to be 262,144 bytes, which is less than both maximums. If the max values were too low, I would need to adjust the Linux kernel settings.

Now we need to check the kernel socket size for reads and writes.

[oracle@myhost ~]$ /sbin/sysctl -a | grep ipv4.tcp | grep mem
net.ipv4.tcp_wmem = 4096        16384   2067936
net.ipv4.tcp_rmem = 4096        87380   2067936

The three values are, in order, the minimum, default, and maximum. We only need to verify that the maximums are larger than our computed value of 262,144 bytes, which they are.

Now I am on to the network configuration changes. As stated earlier, I need to set the SDU to 32767 and to handle the new TCP buffer sizes. This is done in my tnsnames.ora config file. First, I need to verify the service name I am using.

SQL> show parameter log_archive_dest_2
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      service="ress", LGWR ASYNC NOA
FFIRM delay=0 optional compres
sion=disable max_failure=0 max
_connections=1 reopen=300 db_u
nique_name="ress" net_timeout=
30, valid_for=(all_logfiles,pr
imary_role)

From this, we can clearly see the service name being used.  Now we need to modify the alias for this service name in our tnsnames.ora config file. It should read as follows:

RESS =
  (DESCRIPTION =
    (SDU = 32767)
    (SEND_BUF_SIZE = 262144)
    (RECV_BUF_SIZE = 262144)
    (ADDRESS = (PROTOCOL = TCP)(HOST = dr-rac01-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = dr-rac02-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ress)
    )
  )

Note that the primary is a three-node RAC system, so I need to make the same change on the other nodes as well. Also note that I need to make a corresponding change on the standby side in case I ever need to perform a switchover operation.

Now I need to update the listener.ora on each node of the primary and each node of the standby.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU = 32767)
      (SEND_BUF_SIZE = 262144)
      (RECV_BUF_SIZE = 262144)
      (GLOBAL_DBNAME = RESP_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0.3)
      (SID_NAME = resp1))
    (SID_DESC =
      (SID_NAME = plsextproc)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0.3)
      (PROGRAM = extproc)))

Don’t forget to reload the listeners. Since this is RAC on both primary and standby, I just used srvctl to start/stop the listeners.

That’s all there is to it.

Dec 13

SQL Developer 4.0 Released

The latest/greatest version of Oracle’s free SQL Developer has now been released. Previously, version 4.0 was in Early Adopter release, which I think is just a fancy way of saying “beta”.

If you want to see new features or tips and tricks, then go to That Jeff Smith’s blog. He is the product manager for this product and writes about lots of new things in the product. I visit his blog regularly.

One new feature I had not previously seen, but am going to love, is the ability to have new worksheets use a new connection. In earlier versions, if you opened a second worksheet on the same database connection, that second worksheet shared the connection of the first. This meant that if you ran a long-running query in the first worksheet, the second worksheet was essentially unable to issue a new query until the first was done. The workaround was to create a new connection with the same userid/password but a different connection name. Now in SQL Developer 4.0, you can go to Tools–>Preferences, then under Database–>Worksheet, check “New Worksheet to use unshared connection” and click OK. The downside is that if you open multiple worksheets, you will have multiple sessions in the database.

I have been using this product since 2005, when it was named Project Raptor. In the beginning, it was slow and not very good. I think the product is outstanding in its current form, and version 4.0 is a significant upgrade from 3.x. I highly recommend that you download it today. Best of all…it’s free!

Dec 10

Flushing Single Cursor

Earlier today, I was answering a question where someone proposed flushing the Shared Pool as a possible solution to a problem with one SQL statement. I find this to be bad advice. As I stated in my reply, why kill every end user’s performance to solve one person’s problem? My answer was that if we needed to remove that SQL statement from the Shared Pool, we should flush just that cursor. This ability has been around since Oracle 10g, and an Oracle employee blogged the details here:

https://blogs.oracle.com/mandalika/entry/oracle_rdbms_flushing_a_single

The capability exists in Oracle 11g, and I assume it is still available in 12c as well, though I haven’t checked. But I wonder if it is even needed in Oracle 12c with the new adaptive optimizer features.

Nov 25

EM12c Database Time Spent Waiting alerts

I have an Oracle EM12c environment (version 12.1.0.3) set up to monitor my production databases. On one database, I occasionally get an alert for the ‘Database Time Spent Waiting (%)’ category. Yet I have disabled this metric in EM.

I found out that this metric is a server-side alert. Disabling in the OMS does not stop the metric from being collected by the agent on the server side. The part that I don’t understand is that if I have disabled the metric in the OMS, even if the agent informs the OMS of the problem, shouldn’t the OMS ignore it? Why do I still get the alert?

To adjust this, I had to sign on to the database in question and issue this:

begin
dbms_server_alert.set_threshold(dbms_server_alert.db_time_waiting,
dbms_server_alert.OPERATOR_GE,99,
dbms_server_alert.OPERATOR_GE,100,
60,6,
'db_name',dbms_server_alert.object_type_event_class,'Network');
end;
/

I set the alerts to have a warning threshold of 99 and a critical threshold of 100. Note that the last parameter in the call above specifies the Network category. The possible categories are:

  • Administrative
  • Application
  • Cluster
  • Commit
  • Concurrency
  • Configuration
  • Network

I issued similar commands for the other categories. Now I no longer receive these alerts.

It is still frustrating that I have to take this action. I understand the concept that this is a server-side alert. What I fail to see is why I still get an email. The OMS can surely be smarter than this; it should be able to recognize that it received the alert but the metric has been disabled for this target, and not bother anyone.
