May 21

ORA-16205 Upgrading to 11.2.0.3

I am trying to upgrade all of our databases from 11.2.0.2 to 11.2.0.3 this year. My first upgrades were on a 2-node RAC primary with 2-node RAC standby database in a testbed. There isn’t much to this database as it is just a starter database. But this let me test out the upgrade on RAC databases with both a primary and standby. I documented the process along the way.

After that, I was ready to upgrade our development and test databases. I used the DBUA to perform the upgrade and it went off without a hitch. Our dev and test databases are clones of production and we were able to test our custom application with the new version.

Finally, I was ready to upgrade production. Again, I used the DBUA to perform the upgrade of the primary database. But this time, I hit an error:

ORA-16205 log_archive_dest2 contains repeated or conflicting values

Uh oh. This was unexpected as I had not seen this error in any of my previous upgrades. The dev and test databases do not have a standby, so LOG_ARCHIVE_DEST2 is not set. My testbed is set up slightly differently, so I did not catch the issue there. Since this was an unforeseen event, I cancelled my upgrade that evening and decided to get to the bottom of the issue and reschedule the upgrade for a later date.

I discovered that Bug 13387526 (fixed in 11.2.0.4) can cause this issue for this parameter when you do STARTUP UPGRADE on the database. In my testbed, I created a RAC 11.2.0.2 database with a RAC standby. In the primary, I set LOG_ARCHIVE_DEST_2 to something very similar to what was in production. I attempted the upgrade on this testbed and ran into the same bug. To get around the problem, I set LOG_ARCHIVE_DEST_2 to ‘service=my_standby db_unique_name=my_standby’. With this minimal parameter setting, the upgrade then proceeded correctly. After the upgrade was done, I set this parameter back to its original setting.

Apr 17

Auto Generating Response File

I am working on upgrading multiple databases to the Oracle 11.2.0.3 version. Before the upgrade, I need to install this version on multiple machines. We’ve known for a long time that you can generate a response file and use this for a silent install, speeding up the installation. The way I’ve always created my response file is to open the .rsp sample in a text editor and then in another window, fire up the OUI. I walk through the OUI and then figure out where that screen’s items are in the response file and update the response file accordingly. I recently discovered (although it is completely obvious) that the 11gR2 OUI automatically records your responses as you work through the OUI. When you get to the Summary screen, there is an option to save the responses to a .rsp file.

After pressing the button and saving the response file, you can either Cancel the OUI or proceed with the Install.

Now to proceed with a silent install, do the following:

cd {Oracle media directory}

./runInstaller -silent -responseFile /dir/response_file_created_above.rsp

Mar 21

.patch_storage

I got an alert from Enterprise Manager that my C: drive on one of my database servers was running short on space. The disk device is 20GB is size and had less than 1GB free. I removed old log files, but this still did not free up enough space. In looking around, I saw that my 11gR2 ORACLE_HOME directory was over 7GB in size. More investigation revealed that the %ORACLE_HOME%\.patch_storage directory was consuming over half of that! You can’t just go and delete this folder as it will stop your ability to apply future patches.

I discovered Metalink Note 550522.1 which gave me a nice feature of OPatch to clean up some of this space.  You can see this feature below:

C:\>%ORACLE_HOME%\OPatch\opatch util cleanup

Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1

Copyright (c) 2009, Oracle Corporation.  All rights reserved.

UTIL session

Oracle Home       : c:\oracle\product\11.2.0.2

Central Inventory : C:\Program Files\Oracle\Inventory

from           : n/a

OPatch version    : 11.2.0.1.1

OUI version       : 11.2.0.2.0

OUI location      : c:\oracle\product\11.2.0.2\oui

Log file location : c:\oracle\product\11.2.0.2\cfgtoollogs\opatch\opatch2013-03-

21_13-02-46PM.log

Patch history file: c:\oracle\product\11.2.0.2\cfgtoollogs\opatch\opatch_history

.txt

Invoking utility “cleanup”

OPatch will clean up ‘restore.sh,make.txt’ files and ‘rac,scratch,backup’ direct

ories.

You will be still able to rollback patches after this cleanup.

Do you want to proceed? [y|n]

y

User Responded with: Y

Size of directory “c:\oracle\product\11.2.0.2\.patch_storage” before cleanup is

3921310486 bytes.

Size of directory “c:\oracle\product\11.2.0.2\.patch_storage” after cleanup is 2

233118071 bytes.

UtilSession: Backup area for restore has been cleaned up. For a complete list of

files/directories

deleted, Please refer log file.

OPatch succeeded.


Running this cleaned up about 1.4GB of space.

Feb 12

Testbeds

I have often mentioned in my posts that testing/testing/testing is required for specific tasks. Testing lets you accomplish two tasks: 1) make sure your change works as intended and 2) make sure your change does not cause unforseen problems.

Today, when reading some of the many blogs I visit, I came across this article from Pythian:

http://feedproxy.google.com/~r/PythianGroupBlog/~3/k4tn0CoKPJ8/

I just wanted add that setting up a learning lab, or test bed as I call it, is not just for the home. If you do not have something like this at work, then I strongly recommend talking with your supervisor. With today’s VM technology, there is no reason you cannot setup testbeds to facilitate your work. My current employer fully embraces such testbeds. We re-purposed old VM hardware for our testbed, but as the article above indicates, you can do this on a much smaller scale.

At my current job, I have a two-node RAC primary cluster and a two-node RAC standby cluster in my testbed environment, all running on VM’s. Our production environment is RAC for both a primary and a standby. This RAC testbed setup was invaluable to me when I was planning and testing our upgrade from Oracle 11.1 to 11.2 as quite a bit changed in Grid Infrastructure for the 11.2 version.  With this testbed, I was able test out a number of issues before in advance. When I finally upgraded our production environment, the upgrade went without a hitch thanks to being able to test things out before hand.

On the flip side of the coin, I was recently upgrading a four-node RAC system from Oracle 10.2.0.4 to 11.2.0.3 for another company. This company did not have any such testbed so our upgrade went straight to production without any testing at all. As you can probably imagine, things did not go well. We had planned on upgrading the Clusterware software first and then upgrading the database at a later date. The actual clusterware upgrade went smoothly, but the 10.2 instance would not start on the 11.2 Grid Infrastructure. I later found out that 10.2 pins the instances to specific nodes and you have to perform an additional step before the 10.2 instances will start on the 11.2 Grid Infrastructure.  If we had a testbed available, we would have seen this in advance. This company is now starting to utilize Virtual Machines more fully and I will be talking to them about setting up a testbed for their Oracle RAC databases.

Just recently, I have been asked my by supervisor to start training a co-worker on Oracle database administration. The first thing I did was to get a VM setup as his personal playground. We are installing Oracle 11.2.0.3 there and he is creating his first database this week. This nice thing about using VM’s is that he can do whatever damage he wants and he will not be affecting our dev/test/production databases. Once he has learned some basic skills, we can transition him to our corporate database.  But until then, if he breaks his database, he won’t be affecting anyone else. Using VM’s in this manner is an excellent idea.

With his testbed, we have the capability of creating a snapshot of the VM, but I’m not sure how much we’ll use that. Part of learning is to break things and fix them. If he gets used to just reverting back to a previous snapshot, he won’t be fixing things that break.

I’ve also used VM’s for short periods to test out new versions. When SQL Server 2012 was released, I had my server admin create a VM for me for about 3 months. I was able to install SQL Server 2012 in the VM without affecting anything. I could then play around with that new database version. Once I had seen many of the new features, I was done with that VM and I asked my server admin to destroy the VM. I’m sure that when Oracle 12c is released, I will do something similar. In the old days, I would install the new version on my workstation but using VM’s is cleaner in my opinion. And others on my team can play around in that VM as well.

If you do not already have Virtual Machines leveraged for your testbeds, I highly recommend you work towards getting them set up. VM’s are a cost-effective way to test out many of your database administration changes before implementing in other environments.

Jan 31

EM 12c Adjusting Threshold Values

On a test system, I’ve been getting frequent alerts from Enteprise Manager 12c Cloud Control that my host is running low on swap space. Since this is a test system, and since I know my swap space is just fine, I decided to increase the Warning threshold from 80 to 87%. In doing so, I found a nice little new feature of EM 12c. When I got to the screen to adjust the threshold settings, I noticed a button titled “Test Thresholds”.

So I modified the Warning value from 80 to 87 and then pressed the button to test out my new setting. The graph below the button changes. With this new feature, you can see if the new value for the threshold works for the actual values graphed over time. In my case, I was able to easily see that a value of 87 was above all the values in the graph. So hopefully this metric will stop alerting me to a problem that does not exist.

Jan 24

Tasklist

I’m sure that someone reading this will ask, “how did you not know that?”. But sometimes the simplest things come to us late in the game while others knew it early on.

I’ve been patching Oracle on Windows since the Windows NT days.  Everyone who patches Oracle on Windows knows that OPatch will often balk at the fact that some DLL files are locked by other processes. The easiest way for me to get around this is to just rename the DLL files that will be replaced by the patch. The OS won’t let you delete the DLL files, but you can rename them and then OPatch can proceed. In the early days of OPatch, I would run the apply, it would spit out the DLL that was locked. I would rename the DLL file and then run OPatch again to get the next DLL file that was locked. This process would repeat about 20 times until OPatch found no locked DLL files. At least now when you run OPatch, it gives you a full list of DLL files that are locked.

Sometimes I want to know process that is locking the DLL file. In the past, I would download one of those freely available utilities to show me the process(es) locking the file in question and terminate the process. But I don’t like freeware bloating up my servers so I often remove the utility when I am done. Until the next time I need that utility…

Someone recently pointed me to Metalink Note 294350.1 which contained one little nugget I was not privy to prior to reading this document. Apparently, Windows includes a command-line utility to show which processes are locking DLL files. I can now do:

tasklist /m

The output can be long, so it is beneficial to store the output in a file and then open up the file in a text editor:

tasklist /m > c:\oracle\task_list.txt
notepad c:\oracle\task_list.txt

I can then use the text editor’s search function to locate the DLL in question.

Now how is it that this little utility has escaped my attention all these years?

Jan 22

Trailing Zero

I had a developer ask me an interesting question recently. He was working on a problem where numeric values were stored in a table but when he queried that table in PL/SQL Developer, it would show trailing zeroes after the last digit. He wondered if this was contributing to the problem he was trying to debug. The developer needed to know if Oracle was storing those trailing zeroes.

My response was that Oracle does not store trailing zeroes. Oracle only stores the exponent and mantissa of the number.  Oracle does not right-pad the numeric value with zeroes. The developer now knew that his problem was not with the data in the database but rather with something his development platform was doing.

But was my statement true? Many times I’ve made a statement about how Oracle works internally but then had to go back and validate my statement or prove that it was false which inevitably leads to the correct statement.

To test out my statement, I created a simple table and inserted data into it.

SQL> create table test_tab (val number(38,5));
Table created.
SQL> insert into test_tab values (25);
1 row created.
SQL> insert into test_tab values (25.0);
1 row created.
SQL> insert into test_tab values (25.2);
1 row created.
SQL> insert into test_tab values (25.20);
1 row created.
SQL> commit;
Commit complete.
SQL> select * from test_tab;
VAL
----------
        25
        25
      25.2
      25.2

In SQL*Plus, we do not see any trailing zeroes even though I explicitly added them in. The values 25 and 25.0 as well as 25.2 and 25.20 all look the same. But maybe this is just how SQL*Plus is displaying the values. So let’s dump the data block to see how exactly Oracle is storing these values.

SQL> select file_id,block_id,blocks
2  from dba_extents where segment_name='TEST_TAB';
FILE_ID   BLOCK_ID     BLOCKS
---------- ---------- ----------
6        128          8
SQL> alter system dump datafile 6 block min 128 block max 135;
System altered.

I had to determine the file and block number for my segment I created. I then issued the command to dump the contents of the data blocks to a trace file. When you look in the trace file, search for the keyword “block_row_dump” and you can see the contents of these rows in dump below:

block_row_dump:
tab 0, row 0, @0x1f92
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 1a
tab 0, row 1, @0x1f8c
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 1a
tab 0, row 2, @0x1f85
tl: 7 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 3]  c1 1a 15
tab 0, row 3, @0x1f7e
tl: 7 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 3]  c1 1a 15
end_of_block_dump

We can see from the block dump the first value is 2 bytes long and consists of the hex characters “C1 1A”. The second row has the same exact values! This is important because it verifies my initial claim that Oracle is not storing any extra zeroes for the second row in the table. If there were an extra zero, the length would not be 2 bytes. For the third and fourth row, we can see the hex values are identical, “C1 1A 15″.

But let’s be sure that these hex values correspond to our data. To do that, we’ll use the DBMS_STATS.CONVERT_RAW_VALUE procedure.

SQL> set serveroutput on

SQL> declare

  2  n number;

  3  begin

  4  dbms_stats.convert_raw_value('C11A',n);

  5  dbms_output.put_line(n);

  6  end;

  7  /

25

 

PL/SQL procedure successfully completed.

 

SQL> declare

  2  n number;

  3  begin

  4  dbms_stats.convert_raw_value('C11A15',n);

  5  dbms_output.put_line(n);

  6  end;

  7  /

25.2

 

PL/SQL procedure successfully completed.

So the hex values “C1 1A” are the internal (raw) representation of ’25′ and “C1 1A 15″ is 25.2 as we would have expected.

The moral of this story is that sometimes when you think you know how Oracle is working internally, you may still have to devise a test case to validate your statements.

Jan 10

Changing SYS password in RAC

I recently came across this blog post about changing the SYS password in RAC databases:

http://marioalcaide.wordpress.com/2011/05/24/changing-sys-password-in-rac-databases/

The author of this blog post makes the claim that the “SYS password is instance specific in RAC databases”. This claim is incorrect. I posted a comment to prove this falseness of this statement and offered the real culprit in the author’s problem which was a non-shared password file. After two weeks, my comment is still awaiting moderation so people will not see it. I decided to post my comment to this post here as a correction. Here is my comment to this blog post:

Sorry, but this is, on its face, incorrect. The SYS password is stored in the Data Dictionary. The encrypted hash combination of the username/password is stored in SYS.USER$. As such, when you change the SYS password in one instance, it updates SYS.USER$ and the change is available in all other RAC instance. So on its face, your post is incorrect. To prove my point, lets look at this example:

SQL> select instance_name from v$instance;

INSTANCE_NAME
—————-
resp1

SQL> alter user sys identified by MynewPass;

User altered.

On my instance RESP1, I changed my SYS password. Now let’s use that password to sign on to another instance in my RAC database:

SQL> connect sys/MynewPass as sysdba;
Connected.
SQL> select instance_name from v$instance;

INSTANCE_NAME
—————-
resp2

As you can see, I signed on to the RESP2 isntance with the new password yet I never changed the password for SYS in that instance.

Your problem is that when you connect as SYS, you must also connect as SYSDBA. When you connect as SYSDBA, no matter which SYSDBA user you connect with, Oracle needs to verify the password in the password file. I’ll bet that each instance has its own password file, which is why you had to change the SYS password in each instance. You noticed this behavior and had to update each password file. It is incorrect to say “SYS password is instance specific in RAC databases”. What is more correct would be that for your configuration, your password files are instance specific. As such, any SYSDBA user would appear that their password is instance specific.

To get around this, I move the password file to my shared storage. So on one instance, do something like this on node 1:

cp $ORACLE_HOME/dbs/orapworcl1 /u01/app/oracle/oradata/orcl/orapworcl

Now that the password file is in one shared location, let’s get all instances to use the same password file. On each node, do:

cd $ORACLE_HOME/dbs
rm orapworclX
ln -s /u01/app/oracle/oradata/orcl/orapworcl orapworclX

In the rm and ls commands, change X to your instance number on that node. Each instance will now have a softlink pointing to the same password file. Once this is complete, the SYS password will no longer be instance specific (it never was) and your password file will no longer be instance specific. When you change the password for any SYSDBA user, you will only have to change it on one instance.

Reminder: If this database is a primary for a standby database, don’t forget to copy the password file to the standby database.

Dec 19

Extend EM Grid Control to New Nodes

If you’ve followed some of my recent posts, you know that I have been working on extending my existing RAC database cluster to new hardware. I now have my instances running on the new nodes. My applications are now connecting to the new instances. It’s now time to work on some of the behind-the-scenes items.

I now need to extend my Enterprise Manager Grid Control 11g to cover the new instances. Initially, I went to Deployments–>Install Agent–>Fresh Install and followed the wizard to push the agent to the new nodes. But when I looked at my Databases tab, I saw two entries for my databases. I had the original entry like orcl.acme.com and a new entry orcl.acme.com_myclu for each database. The new entries were appended with the cluster name and only included the new instances, not the old instances. The old entries only had the old instances, not the new instances. What I need is one entry in the Databases tab that has all instances for the clustered database.

I found Metalink Note 800953.1 How To Extend Agent After Adding The New Node To Existing Cluster Environment. I followed the instructions, running that agentca utility on each node as directed. But this was of no help. I still had the old cluster database entry in the Databases tab. And for each new node, I had one entry like orcl.acme.com_orcl4 where the last part was the instance name. I don’t want Grid Control to manage the instances individually.

In all of the above cases, I removed the new entries from my Databases tab, leaving my original 3-instance clustered databases.

To fix there, here are the steps that I took:

  1. Click on the Databases tab.
  2. Select the radio button next to the clustered database and press the Configure button.
  3. Scroll down the next page and you will see the old instances. Press the Add button.
  4. On the Add Instance page:
    1. make sure the instance name is correct. For me, it is of the form orcl.acme.com_orcl4
    2. Choose the new host.
    3. The Listener Machine Name should use the correct vip (hostname-vip) for this host.
    4. The Database SID should make the instance name (orcl4)
  5. Press OK.
  6. You should get a Success message.
  7. Repeat for each new instance.
  8. Press the Next button.
  9. Press Submit.
  10. Click on the Databases tab.
  11. Click on the database name
  12. Scroll down and you should see your new instances.

I couldn’t find this documented anywhere so I hope this helps someone else in the future.

Dec 18

Manually Add New RAC Instance

To start this post, I will say that if at all possible, you should use the DBCA to add a new instance to your RAC database. The DBCA will take care of all of the details for you. It’s a simple 6-page wizard and your new instance is up and running on your new node.

Recently, I was charged with adding a new instance for our production databases running on our 3-node cluster. I was able to create the fourth instance on the fourth node for 2 of the 3 databases running on this cluster. When I attempted to extend the third database to the fourth node, I got an error informing me that the $ORACLE_HOME directory was not shared on all nodes. Well for starters, the $ORACLE_HOME directory is not required to be shared on all nodes. And this worked for the other two databases in the same home directory. I have yet to figure out why there is a problem with this third database. After two weeks, my Service Request to resolve the issue had virtually zero progress. So I decided to manually add the instances since Oracle Support is not providing the help I require. In my mind, I knew that manually adding the instances was not that difficult, but I could not find much information on how to actually do it. This post shows the steps to manually add a new instance to your RAC database.

1. Set instance-specific parameters to your SPFILE. I had to add the following parameters for my new instance:

alter system set instance_number=4 scope=spfile sid=’orcl4′;

alter system set local_listener=’LISTENER_ORCL4′  scope=spfile sid=’orcl4′;

alter system set log_archive_format=’%t_%s_%r.dbf’ scope=spfile sid=’orcl4′;

alter system set log_archive_trace=0 scope=spfile sid=’orcl4′;

alter system set thread=4 scope=spfile sid=’orcl4′;

alter system set undo_tablespace=’UNDOTBS4′ scope=spfile sid=’orcl4′;

2. One of the parameters requires the LISTENERS_ORCL4 entry in the TNSNAMES.ORA file. Make this entry in the file and any other entries you need for this new instance.

3. I put my spfile and password file on shared storage. On the new node, I need to softlink to those.

cd $ORACLE_HOME/dbs

ln -s /u01/app/oracle/oradata/orcl/data01/spfileorcl.ora spfileorcl4.ora

ln -s /u01/app/oracle/oradata/orcl/data01/orapworcl orapworcl4

4. Add online redo log groups.

alter database add logfile thread 4 group 40
( ‘/u01/app/oracle/oradata/smp/redo01/redo40a.log’,
‘/u01/app/oracle/oradata/smp/redo01/redo40b.log’) size 52429312;

5. Add standby redo log files.

alter database add standby logfile thread 4
‘/u01/app/oracle/oradata/smp/redo01/smp_4srl400.f’ size 52429312;

6. Since I have a standby database, I need to add online redo logs and standby redo logs in the standby database as well. See my previous post.

7. Enable redo thread.

alter database enable thread 4;

8. Create the new instance in Grid Infrastructure

srvctl add instance -d orcl -i orcl4 -n myhost04

9. Start the new instance

srvctl start instance -d orcl -i orcl4

10. On the new node, verify SMON is running.

ps -ef|grep smon

11. Verify the instance is open.

select instance_number,instance_name,thread#,status from gv$instance;

12. Modify /etc/oratab on new node to contain instance entry

orcl4:/u01/app/oracle/product/11.2.0.2:N

13. On the new node, sign on to the instance with SQL*Plus to ensure you can access the instance.

14. Verify instance with GI.

srvctl status database -d orcl

15. The instance is now up and running on the new node. All that remains is to make any database services available or preferred on that new node and you connections can start using the new instance.

srvctl modify service -d orcl -s orcl_svc -n -i orcl1,orcl2,orcl3,orcl4

srvctl start service -d orcl-s orcl_svc -i orcl4

srvctl status service -d orcl

That’s all there is to it!

Older posts «