Nov 14

High Space Usage From crfclust.bdb

I have a 2-node testbed running Oracle RAC 11.2.0.4 on OL6. Pretty much everything is on the system disk; this is just a testbed, after all. The root partition has been filling up. I got an alert from EM about the disk space issue, went in, and cleaned up some log files. As I was cleaning them up, my gut told me that log file growth was not out of control and that there had to be another underlying issue. Sure enough, three days later I got the alert that the disk was filling up again. I needed to dig further: some other file or two had to be hogging the space. After some digging, I came to this directory in my Grid Infrastructure installation:


[oracle@host01 host01]$ pwd
/u01/app/crs11.2.0.4/crf/db/host01
[oracle@host01 host01]$ ls -l
total 10945448
-rw-r--r-- 1 root root 1773999 Jul 2 13:54 02-JUL-2014-13:54:50.txt
-rw-r--r-- 1 root root 1120665 Jul 2 14:00 02-JUL-2014-14:00:06.txt
-rw-r--r-- 1 root root 16953 Mar 25 2014 25-MAR-2014-19:51:58.txt
-rw-r----- 1 root root 280764416 Nov 13 16:15 crfalert.bdb
-rw-r----- 1 root root 9850126336 Nov 13 16:14 crfclust.bdb
-rw-r----- 1 root root 8192 Jul 2 13:59 crfconn.bdb
-rw-r----- 1 root root 352174080 Nov 13 16:15 crfcpu.bdb
-rw-r----- 1 root root 249356288 Nov 13 16:15 crfhosts.bdb
-rw-r----- 1 root root 265261056 Nov 13 16:14 crfloclts.bdb
-rw-r----- 1 root root 172232704 Nov 13 16:14 crfts.bdb
-rw-r----- 1 root root 24576 Jul 2 13:54 __db.001
-rw-r----- 1 root root 401408 Nov 13 16:15 __db.002
-rw-r----- 1 root root 2629632 Nov 13 16:15 __db.003
-rw-r----- 1 root root 2162688 Nov 13 16:15 __db.004
-rw-r----- 1 root root 1187840 Nov 13 16:15 __db.005
-rw-r----- 1 root root 57344 Nov 13 16:15 __db.006
-rw-r----- 1 root root 16777216 Nov 13 16:06 log.0000008765
-rw-r----- 1 root root 16777216 Nov 13 16:15 log.0000008766
-rw-r--r-- 1 root root 120000000 Jul 2 13:55 host01.ldb
-rw-r----- 1 root root 8192 Jul 2 13:54 repdhosts.bdb
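
For anyone repeating the "digging" step, here is a minimal sketch of one way to surface the biggest files under a directory tree. It is not necessarily how I found this directory. The function name largest_files is made up for illustration, and the sketch assumes GNU find (for -printf), which is what Oracle Linux ships.

```shell
#!/usr/bin/env bash
# List the N largest regular files under a directory, biggest first.
# Assumption: GNU find is available (needed for -printf).
largest_files() {
    local dir="$1" count="${2:-10}"
    # -xdev keeps the search on a single filesystem; %s is the size in bytes.
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null \
        | sort -k1,1nr \
        | head -n "$count"
}

# Example (run as root so unreadable directories are not skipped):
# largest_files /u01/app 5
```

Each output line is the size in bytes, a tab, and the path, so a runaway file like crfclust.bdb shows up at the top.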


The crfclust.bdb file is about 9.8GB. My system disk is only 30GB, so this one file is taking up about 33% of the entire disk, and it keeps growing. To fix the problem, I performed these steps:


[oracle@host01 host01]$ /u01/app/crs11.2.0.4/bin/crsctl stop resource ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'host01'
CRS-2677: Stop of 'ora.crf' on 'host01' succeeded
[oracle@host01 host01]$ su
Password:
[root@host01 host01]# rm -rf *
[oracle@host01 host01]$ /u01/app/crs11.2.0.4/bin/crsctl start resource ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'host01'
CRS-2676: Start of 'ora.crf' on 'host01' succeeded
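
A bare rm -rf * as root is only safe if you are certain which directory you are in. Below is a slightly more defensive sketch of the delete step. The function name purge_chm_repo is hypothetical, and the sanity check (refuse unless crfclust.bdb is present) is an assumption chosen for this example. Unlike rm -rf *, it removes only regular files directly inside the directory. It still needs to run as root, and ora.crf still has to be stopped first and started afterwards as shown above.

```shell
#!/usr/bin/env bash
# Remove the CHM repository files from one directory, but only if the
# directory looks like a CHM repository. The check used here (a
# crfclust.bdb file exists) is an assumption made for this sketch.
purge_chm_repo() {
    local dir="$1"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "purge_chm_repo: not a directory: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/crfclust.bdb" ]; then
        echo "purge_chm_repo: no crfclust.bdb in $dir, refusing" >&2
        return 1
    fi
    # Delete regular files directly inside the directory only; the
    # directory itself and anything outside it are left alone.
    find "$dir" -mindepth 1 -maxdepth 1 -type f -delete
}

# Example, using the path from this post:
# purge_chm_repo /u01/app/crs11.2.0.4/crf/db/host01
```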


Why did this work? These files make up the Berkeley DB repository used by the Cluster Health Monitor (CHM). One of the files is only supposed to be about 1GB in size, with older data purged regularly, but the purge step is not working. By manually removing the files, I lose historical performance data, which is acceptable to me at this point. On startup, CHM creates the files anew if they are missing.
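
Since the purge cannot be trusted, it may be worth watching the file so the next runaway is caught before the disk fills. The following is a minimal sketch of such a check, not something from the Oracle note. The function name check_file_size and the 2GB limit in the example are my own assumptions, and it relies on GNU stat.

```shell
#!/usr/bin/env bash
# Warn when a file has grown past a size limit given in bytes.
# Returns 0 if within the limit, 1 if over it, 2 if the file is missing.
check_file_size() {
    local file="$1" limit_bytes="$2" size
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    size=$(stat -c %s "$file")   # GNU stat: file size in bytes
    if [ "$size" -gt "$limit_bytes" ]; then
        echo "WARNING: $file is $size bytes (limit $limit_bytes)"
        return 1
    fi
    echo "OK: $file is $size bytes"
}

# Example, using the path from this post and a hypothetical 2GB limit:
# check_file_size /u01/app/crs11.2.0.4/crf/db/host01/crfclust.bdb $((2 * 1024 * 1024 * 1024))
```

Run from cron as a user that can read the directory (the files are owned by root), the non-zero exit status can be used to trigger an email or an EM alert.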


After fixing the issue, I did find Metalink Note 1343105.1, which describes the problem. I haven’t yet been able to find a specific bug number, but it is clear that a bug exists.