TNS-12519 w/o Max Processes Reached

I got a call from someone that they were getting TNS-12519 errors in their application. Sure enough, those messages were in the listener.log file as well.

TNS-12519: TNS:no appropriate service handler found

For those that aren’t familiar with this error, it typically means one of two things. Either the service name is not registered with the listener or that the database has reached its maximum number of processes. For the latter, what happens is that the Listener knows that the database can’t accept any new processes so it takes the service name out-of-service so to speak. A quick “lsnrctl status” shows me that the service name is registered correctly. So it must be the latter. I then issue the following query and confirm my suspicions.

 

SQL> select resource_name,current_utilization,max_utilization
 2 from v$resource_limit where resource_name='processes';
RESOURCE_NAME   CURRENT_UTILIZATION MAX_UTILIZATION
--------------- ------------------- ---------------
processes                       299             300

Sure enough. I’ve reached max processes, which is defined in my SPFILE to be 300. I increased the parameter to 600 and bounced the instance.  We hit the error again with double the number of processes. I obviously need to dig into this further.

For some background, and for something that will be important later on, it is important to know that this database supports our automated testing efforts. We have a test harness that exercises our primary production application. This test suite launches the application, connects to the database, pushes a few buttons and selects a few menu items and then disconnects.

I examined the listener.log file to see where the connection requests were coming from. These requests were coming from a rogue application server, not our test suite. I knew something was amiss because we had not yet started the test suite and we were getting the errors. We fixed the application server and I didn’t see the errors return. We then fired up the test suite and some time later, the TNS-12519 errors returned. Hmmm…I thought I found the culprit. But let’s check our process utilization.

SQL> select resource_name,current_utilization,max_utilization
 2 from v$resource_limit where resource_name='processes';
RESOURCE_NAME   CURRENT_UTILIZATION MAX_UTILIZATION
--------------- ------------------- ---------------
processes                       89             157

 

So I’m currently seeing 89 processes with a maximum utilization of 157. I’m nowhere near my new limit of 600. So what gives? It took me awhile to figure out what the issue was. The service name is correctly registered and I’m nowhere near my limit. MOS NOTE 552765.1 talks about how the Listener arrives at the TNS-12519 error. Previously, I was seeing the most common cause. Max PROCESSES had been reached. But not this time So what gives?

After investigation, I found my answer in the listener’s log. But it wasn’t obvious like some big error message. The first occurrence of the TNS-12519 error was at 9:38 am.  I grep’ed for “service_update” in the listener log and saw these entries.

25-JUN-2015 09:17:08 * service_update * orcl * 0
25-JUN-2015 09:17:26 * service_update * orcl * 0
25-JUN-2015 09:17:29 * service_update * orcl * 0
25-JUN-2015 09:17:44 * service_update * orcl * 0
25-JUN-2015 09:17:50 * service_update * orcl * 0
25-JUN-2015 09:17:53 * service_update * orcl * 0
25-JUN-2015 09:18:56 * service_update * orcl * 0
It is always better to start with the lowest cost with the other company. sildenafil generic uk For firm and hard erections it is essential to consult a doctor, before taking any sort  sildenafil from india of contraceptive pill. So, buy some ginseng tea tablet viagra http://www.devensec.com/meetings/FY2017_Audit.pdf and health supplements. But Obama is not alone in his adherence to the Cheap Energy Contract means that adding an energy levitra prescription http://www.devensec.com/sustain/DEEC_Overview%20_2011.pdf tax of almost any kind is considered to be political suicide as well as any actions that could be construed as raising energy prices viewed from the perspective of a future political campaign. 25-JUN-2015 09:18:59 * service_update * orcl * 0
25-JUN-2015 09:19:50 * service_update * orcl * 0
25-JUN-2015 09:20:11 * service_update * orcl * 0
25-JUN-2015 09:21:27 * service_update * orcl * 0
25-JUN-2015 09:22:09 * service_update * orcl * 0
25-JUN-2015 09:24:05 * service_update * orcl * 0
25-JUN-2015 09:27:53 * service_update * orcl * 0
25-JUN-2015 09:29:32 * service_update * orcl * 0
25-JUN-2015 09:34:07 * service_update * orcl * 0
25-JUN-2015 09:41:45 * service_update * orcl * 0

Notice that this service update occurs regularly at 9:17 and 9:18, but the time between the service updates takes longer and longer. Notice that there was 8 minutes 38 seconds between service updates at the end (9:34 to 9:41). Why is this important?

This is an Oracle 11.2.0.4 database. For 11.2 and earlier, PMON is responsible for cleaning up after processes and for passing information to the Listener. On database startup, PMON tries to register the services with the Listener. One other thing PMON does is to tell the Listener how many max processes can be serviced. In this case, PMON tells the listener that it can have up to 600 processes. PMON does more, but for purposes of this discussion, that’s enough.

One important piece to know is that the Listener never knows how many processes are currently connected. It only knows how many connection requests it has helped broker. The Listener never knows if processes disconnect from the database. The service_update above is where PMON is telling the Listener how many processes are actually being used. So at 9:34:07, the PMON service update tells the Listener that there are 145 processes in use. The Listener is now up-to-date. When a new connection request comes in, the Listener increments this to 146 processes. Between the service updates, the Listener is totally unaware that 1 or more processes may have been terminated, normally or abnormally. It keeps incrementing its count of processes until the next service update when the Listener gets an accurate account of how many processes are spawned.

So we have that 8.5 minute gap between service updates. Why did it take PMON so long to get back to the Listener? Well the clue for that is in the listener.log as well. I stripped out everything from the log prior to the 9:34 service_update and after the 9:41 service update. From there, it was easy to grep for “(CONNECT_DATA=” in what remained and pipe to “wc -l” to get a count of lines.

During this 8.5 minute interval, I had well over 450 new connection requests! Yet most of those new connections terminated as evidenced by V$RESOURCE_LIMIT showing me that I had a max of 150.  PMON was so busy cleaning up for the application exiting its database connections that it had a big lag before it updated the Listener. As far as the Listener was concerned, the 150 current connections plus the 450 new connections meant that it reached its limit of 600.

PMON can take up to 10 minutes to get back to the Listener with its next service update. Cleaning up after sessions exit the instance has a higher priority than service updates to the Listener. At the 10 minute mark, PMON makes the service update the top priority if the service update had not been previously done in that time interval.

Remember that this is a database to support automated testing. We have to live with this many connect/disconnect operations because we have an automated robot testing out our application in rapid-fire fashion. We don’t want to change how the application works because it works very well when run by a single user. Our automated test suite is executing the application differently than what it was designed to do. But we want to exercise the application as its written to potentially expose bugs before the code changes hit production.

For now, I’ve increased the PROCESSES parameter to a value that we’ll never reach. All this so that the Listener can not hit the limit in its internal counter. The more PROCESSES, the more memory needed in the SGA to support that higher number. But this database can handle it.

If anyone knows of a way to get the service update to occur in a 5 minute window, please let me know. There aren’t any documented parameters to control this behavior and I’ve been unable to find an undocumented one as well.

Lastly, this issue is in one of my 11.2.0.4 database. Oracle 12c changes the architecture a bit. The new Listener Registration (LREG) background process handles the work that PMON used to do. I don’t have a system quite yet to test, but I bet that LREG won’t have the same issue in 12c that PMON is exhibiting here in 11g as LREG won’t have to handle cleanup duties that PMON does. MOS Note 1592184.1 shows that LREG will do the service updates.

 

Update: Since I wrote this article I did have a chance to upgrade the database to 12c with its new LREG process. With LREG handling the Listener’s service updates, we saw the issue disappear. Even during times of heavy session activity, specifically connecting and disconnecting, LREG made regular service updates that PMON would not have been able to perform as often.