Aug 19

Sticky Upgrade Problem

When performing database upgrades, adequate testing is important to understand the impact, both positive and negative, that the upgrade has on the application. I have been preparing to upgrade databases from 11.2.0.3 to 11.2.0.4. One weekend, another DBA and I spent some time upgrading about half of our development databases to the new target version. First thing Monday morning I got a call from a developer whose query was now running slowly. Why did we upgrade just half of the dev databases? For this specific reason. I immediately suspected the query performance was version related. I put together a reproducible test case and ran it against all of the dev databases. The 11.2.0.3 databases executed the query in 30 seconds, consistently across the board. The 11.2.0.4 databases ran the same query in 3.5 minutes, again consistently. Because we had only upgraded half of the databases, I was able to verify whether the issue was version related…and it was…at least on the surface.

After a database upgrade that leaves SQL statements performing worse, a common “fix” is to update the table and index statistics so that the new optimizer version has good information to work with. Updating stats did not fix the problem. I could see that in the 11.2.0.3 database the CBO was choosing to use an index, and since everything it needed was in the index, it did not access the table. Furthermore, the join was performed with a Nested Loops algorithm. In the 11.2.0.4 database, the same index was used but the table was also accessed, and a Hash Join algorithm was used. Why was the CBO making two different decisions?
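
For reference, the stats refresh was nothing exotic. A minimal sketch, with hypothetical schema and table names, looks like this:

BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'APP_OWNER',       -- hypothetical schema
      tabname          => 'SOME_TABLE',      -- hypothetical table from the query
      cascade          => TRUE,              -- gather index stats as well
      method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
      estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/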

Any time we need to peek into the CBO’s decision-making process, it means we need to use the 10053 trace. I captured the trace files from each version. The first part of the trace file shows the optimizer-related initialization parameters. All of the parameters were the same except for OPTIMIZER_FEATURES_ENABLE and DB_FILE_MULTIBLOCK_READ_COUNT. Neither of these is explicitly set, so both are at their default values. Obviously, O_F_E has a different default value for each database version. I was surprised that DB_F_M_R_C changed its default value from 11.2.0.3 to 11.2.0.4. I tried explicitly setting the parameter values in the 11.2.0.4 database to match the 11.2.0.3 database, but it did not improve the runtime. These parameters, while different, had no bearing on the query performance.
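
In case it helps anyone, capturing the 10053 trace is just a session-level event. Roughly this (the TRACEFILE_IDENTIFIER setting is optional and only makes the trace file easier to find):

ALTER SESSION SET tracefile_identifier = 'cbo_10053';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- hard parse the problem query here (change a comment or literal to force a new parse)
ALTER SESSION SET EVENTS '10053 trace name context off';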

The next part of the 10053 trace shows statistics on the tables involved in the query. These were identical in both versions, so stats weren’t the issue.

The next part of the 10053 trace shows the table access paths and which one is deemed to have the lowest cost. Here is where the mystery got interesting. In 11.2.0.3, the CBO determined that the cost of accessing the table via the index was 1258 and the cost of using just the index alone was 351. In 11.2.0.4, the CBO determined the cost of accessing the table via the index was 127 and the cost of using just the index alone was 351. In fact, every other access path the CBO examined had an identical cost in both versions. It was that very first cost calculation where 11.2.0.4 arrived at a low cost and 11.2.0.3 at a higher one, leading each version down a different access path. In the part of the 10053 trace where the CBO considers which join method to use, the answers differed because the chosen access paths differed.

I still have no answer as to why 11.2.0.4 made that one calculation differently than 11.2.0.3 did, especially when all of the other access path calculations were identical in both versions. That one puzzles me, and I might need the help of higher powers to get to the answer.

That being said, I was able to determine the root cause of the problem, and it wasn’t really version related after all. The problem was that our WHERE clause contained the following:

WHERE column = :b1

It seems innocent enough. The problem is that the column is defined as VARCHAR2(4) while the bind variable is declared as a NUMBER, so Oracle performs an implicit datatype conversion. Because the CBO doesn’t have an accurate picture of the bind variable’s contents, it arrives at a suboptimal execution plan. Changing the datatype of the bind variable fixed the issue. The query now ran in 10 seconds! Wait…it went from 3.5 minutes down to 10 seconds, which is great, but in 11.2.0.3 it was running in 30 seconds. Why? Because the bind variable had the wrong datatype there as well. With the proper datatype, the query in 11.2.0.3 ran in…you guessed it…10 seconds. This is why I say the problem turned out not to be a version-related issue. We had the same problem in 11.2.0.3: a query that could be improved with the proper datatypes. The new version just magnified an existing problem we didn’t know we had.
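
To illustrate with hypothetical names (the real table and column were different), the mismatch and the fix look roughly like this in SQL*Plus. With a NUMBER bind, Oracle has to implicitly convert the VARCHAR2 column; a bind declared to match the column needs no conversion:

-- hypothetical table ORDERS with a code column defined as VARCHAR2(4)
variable b1 number
exec :b1 := 1001;

-- implicit conversion: the VARCHAR2 column is effectively converted to a NUMBER
select count(*) from orders where region_code = :b1;

-- declaring the bind to match the column datatype avoids the conversion
variable b2 varchar2(4)
exec :b2 := '1001';

select count(*) from orders where region_code = :b2;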

All of this highlights the importance of proper testing even for simple patchset upgrades.

Aug 14

Result Cache

I was playing around with the Result Cache the other day…I know…this isn’t a new feature and has been available for a while. Unfortunately, it can take a while to get to things, I guess.

In my simple test, I had a query that exhibited this behaviour:

select
   max(det.invoice_date)
from
   invoices i
join
   invoice_detail det
on i.dept_id=det.dept_id

call    count       cpu   elapsed       disk      query   current       rows
------- ------  -------  -------- ---------- ---------- ---------  ---------
Parse        1     0.00      0.00          0          0          0         0
Execute      1     0.00      0.00          0          0          0         0
Fetch        2     2.77      6.66      75521      75583          0         1
------- ------  -------  -------- ---------- ---------- ---------- ---------
total        4     2.77      6.67      75521      75583          0         1

75,000 disk reads to return 1 row. Ouch! Now run this through the Result Cache and get some really nice numbers. :)

select
   /*+ result_cache */
   max(det.invoice_date)
from
   invoices i
join
   invoice_detail det
   on i.dept_id=det.dept_id

call     count     cpu   elapsed       disk      query    current       rows
------- ------  ------ --------- ---------- ---------- ----------  ---------
Parse        1    0.00      0.00          0          0          0          0
Execute      1    0.00      0.00          0          0          0          0
Fetch        2    0.00      0.00          0          0          0          1
------- ------  ------ --------- ---------- ---------- ----------  ---------
total        4    0.00      0.00          0          0          0          1

Still 1 row returned but zero disk reads, zero current blocks, and basically zero elapsed time. Nice!

The Result Cache works best when returning a small number of rows from tables that do not change often. DML operations on the underlying tables invalidate the Result Cache entry, and the work will need to be performed anew before the result is stored in the Result Cache again.
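
If you want to see that invalidation in action, a quick check (a sketch, assuming you have access to the V$ views and DBMS_RESULT_CACHE) is to look at the cached result’s status before and after DML on one of the underlying tables:

-- a 'Published' result flips to 'Invalid' after DML on a dependent table
select id, type, status, name
from   v$result_cache_objects
where  type = 'Result';

-- overall usage/health of the result cache
set serveroutput on
exec dbms_result_cache.memory_report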

Sometime soon, when I get a chance, I’m going to figure out the impact of bind variables on queries that use the Result Cache.

Aug 13

Big Data, Cloud Computing, and In Memory databases

Found this on Twitter today.

http://diginomica.com/2014/08/12/gartnerhypecycle/#.U-t6w2N98mR

Big Data has now entered the Trough of Disillusionment. Just ahead of it are In Memory databases. Cloud Computing is starting to come out of the trough on its way to the Slope of Enlightenment.

Aug 08

GI 12.1.0.2 Upgrade

The 12.1.0.2 patchset has been out for a bit now. I am just now finding time to be able to take my first look at it. I’m interested, like many others, in looking at the In Memory database option. But I need to upgrade my Grid Infrastructure before I can upgrade my database.

The upgrade went smoothly. The only thing I thought was odd was that a prerequisite check failed on the panic_on_oops kernel parameter. Since I was upgrading from 12.1.0.1 to 12.1.0.2, this is a brand new check. The OUI provided a fixup script, which I ran, and the upgrade then proceeded without any other issues.

Aug 06

GI 12.1.0.2

Got a pleasant surprise when trying to download Grid Infrastructure 12.1.0.2 the other day. GI is now two download files, 3 of 8 and 4 of 8, instead of just one as in previous versions. Then I went back and looked, and GI 12.1.0.1 was also two files. Not sure why that didn’t capture my attention at the time.

Aug 04

DB Popularity

I ran across this web page that ranked databases by popularity:

http://db-engines.com/en/ranking

This is completely unscientific and doesn’t take market share into account, but I thought the results were interesting nonetheless.

Aug 01

Oracle In-Memory Cost

Just read an interesting post by James Morle on the “true cost” of Oracle’s new In-Memory database option. It’s a good read.

Jul 31

Now on Twitter

I now have a professional Twitter account. You can follow me at @BPeaslandDBA.

Jul 24

Oracle 12.1.0.2 Released

Oracle just released the 12.1.0.2 patchset approximately thirteen months after 12.1.0.1 was made available. There are a few new features with this release that can be seen here.

Jul 22

New Parameters in 11.2.0.4

I am working towards upgrading databases from 11.2.0.3 to 11.2.0.4. This should go smoothly, right? Well, after I upgraded a development database, I got a call about a slow-performing query. In all of the 11.2.0.3 databases, the query runs in about 50 seconds. In the 11.2.0.4 databases, the query runs in 4 minutes. The execution plans are different even though the table/index stats are identical, so this looks like a version-related problem. One of the things I did was to compare the parameters between the two versions (one way to do that comparison is sketched below). I was surprised to find that a simple patchset upgrade introduces five new parameters.
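
A quick way to make that comparison, assuming a database link (here called old_db, a hypothetical name) pointing at one of the 11.2.0.3 databases, is to diff V$PARAMETER directly:

-- parameter names present in this 11.2.0.4 database but not in the 11.2.0.3 one
select name from v$parameter
minus
select name from v$parameter@old_db
order by name;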

From the 11.2.0.3 database:

SQL*Plus: Release 11.2.0.3.0 Production on Tue Jul 22 14:23:39 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
SQL> connect / as sysdba
Connected.
SQL> select count(*) from v$parameter
 2 where name in ('cell_offloadgroup_name','dnfs_batch_size',
 3 'enable_goldengate_replication','instant_restore','java_restrict');
COUNT(*)
----------
 0
SQL> select count(*) from v$parameter;
COUNT(*)
----------
 347

From the 11.2.0.4 database:

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jul 22 14:23:26 2014
Copyright (c) 1982, 2013, Oracle. All rights reserved.
SQL> connect / as sysdba
Connected.
SQL> select count(*) from v$parameter
 2 where name in ('cell_offloadgroup_name','dnfs_batch_size',
 3 'enable_goldengate_replication','instant_restore','java_restrict');
COUNT(*)
----------
 5
SQL> select count(*) from v$parameter;
COUNT(*)
----------
 352

Yep, five more parameters in 11.2.0.4. Those parameters are:

cell_offloadgroup_name
dnfs_batch_size
enable_goldengate_replication
instant_restore
java_restrict

Two of these five parameters are in the Oracle 11.2 documentation. The other three are not, nor is there any mention of them in Metalink.

I’ll have to see if I can get to the bottom of these and whether they could be contributing to my performance problem.
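
One place to start digging, at least for descriptions, is X$KSPPI (a sketch; this requires connecting as SYS):

-- as SYS: internal name and description of the three undocumented parameters
select ksppinm  as parameter_name,
       ksppdesc as description
from   x$ksppi
where  ksppinm in ('cell_offloadgroup_name', 'instant_restore', 'java_restrict');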
