Oracle DBA – A lifelong learning experience

11g – library cache mutex X – known bug

Posted by John Hallas on March 31, 2008

On a high-volume performance rig (11.1.0.6, 2-node RAC, OEL5 Linux) we are seeing “library cache: mutex X” as the main wait event. This event is new to 11g, and there is not much documentation about it.

An AWR report covering a 4-minute snapshot interval shows the scale of the problem:

Top 5 Timed Foreground Events

Event                          Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
library cache: mutex X         19,674,620   19,177   1              70.15      Concurrency
log file sync                  564,966      5,334    9              19.51      Commit
DB CPU                                      1,722                   6.30
latch: ges resource hash list  78,225       552      7              2.02       Other
latch: row cache objects       44,421       213      5              0.78       Concurrency
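As a cross-check on the table, the Avg wait and % DB time columns follow directly from Waits and Time(s). The minimal Python sketch below recomputes them. Total DB time is not shown in this excerpt, so it is backed out from the mutex row (19,177 s at 70.15%), and the snapshot interval is assumed to be exactly 240 seconds. Dividing DB time by elapsed time gives the average number of active sessions, a common way of reading AWR figures.

```python
# Rows from the "Top 5 Timed Foreground Events" table above:
# (event, waits, time in seconds, reported % DB time). DB CPU has no wait count.
ROWS = [
    ("library cache: mutex X", 19_674_620, 19_177, 70.15),
    ("log file sync", 564_966, 5_334, 19.51),
    ("DB CPU", None, 1_722, 6.30),
    ("latch: ges resource hash list", 78_225, 552, 2.02),
    ("latch: row cache objects", 44_421, 213, 0.78),
]

# Assumption: the AWR snapshot interval is exactly 4 minutes.
INTERVAL_S = 4 * 60


def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average time per wait in milliseconds: Time(s) / Waits."""
    return time_s * 1000 / waits


def estimate_db_time_s(time_s: float, pct_db_time: float) -> float:
    """Back out total DB time from one row's time and its % DB time."""
    return time_s / (pct_db_time / 100)


# DB time is not in the excerpt, so derive it from the mutex row.
db_time_s = estimate_db_time_s(19_177, 70.15)

for event, waits, time_s, pct in ROWS:
    avg = "-" if waits is None else f"{avg_wait_ms(waits, time_s):.2f} ms"
    share = 100 * time_s / db_time_s
    print(f"{event:<30} avg {avg:>9}  {share:6.2f}% of DB time (reported {pct})")

# Average active sessions = DB time / elapsed wall-clock time.
print(f"DB time ~ {db_time_s:,.0f} s; "
      f"average active sessions ~ {db_time_s / INTERVAL_S:.0f}")
```

Under these assumptions the estimate comes out at roughly 27,300 s of DB time, or about 114 sessions active at once on an 8-CPU host. Treat it as approximate, since the reported percentages are rounded.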

There is one hit on Metalink, which leads to bug 5928271; it has been reported by one other customer who also runs a high-throughput database with very high CPU usage. For reference, our CPU and memory usage is shown below. (PS: we do not need 128 GB of memory, but that is what the evaluation servers came with.)

Host CPU (CPUs: 8 Cores: 8 Sockets: 4)

Load Average Begin  Load Average End  %User  %System  %WIO  %Idle
54.30               104.45            90.9   7.2      0.0   0.3

Instance CPU

%Total CPU  %Busy CPU  %DB time waiting for CPU (Resource Manager)
98.5        98.8       0.0

Memory Statistics

                             Begin      End
Host Mem (MB):               128,987.8  128,987.8
SGA use (MB):                2,400.0    2,400.0
PGA use (MB):                1,174.2    1,231.1
% Host Mem used for SGA+PGA: 2.77       2.77
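The Host CPU figures can be read in a similar way. The rough Python sketch below divides the load average by the CPU count. It relies on a common rule of thumb (an assumption, not an Oracle metric): more than one runnable process per CPU means work is queuing for a CPU. The Linux load average also counts processes in uninterruptible I/O wait, but %WIO is 0.0 here, so most of the load is likely CPU demand.

```python
# Figures copied from the Host CPU table above (CPUs: 8, Cores: 8, Sockets: 4).
CPUS = 8
LOAD_BEGIN, LOAD_END = 54.30, 104.45
PCT_USER, PCT_SYSTEM, PCT_IDLE = 90.9, 7.2, 0.3


def load_per_cpu(load_average: float, cpus: int) -> float:
    """Linux load average divided by the number of CPUs."""
    return load_average / cpus


def is_cpu_queuing(load_average: float, cpus: int) -> bool:
    """Rule of thumb (assumption): more than one runnable process
    per CPU means processes are waiting for a CPU."""
    return load_per_cpu(load_average, cpus) > 1.0


for label, load in (("begin", LOAD_BEGIN), ("end", LOAD_END)):
    print(f"load average at {label}: {load:.2f} "
          f"= {load_per_cpu(load, CPUS):.1f} per CPU, "
          f"queuing: {is_cpu_queuing(load, CPUS)}")

print(f"host busy (user + system): {PCT_USER + PCT_SYSTEM:.1f}%, idle: {PCT_IDLE}%")
```

At the end of the interval that works out to about 13 processes per CPU with the host roughly 98% busy, which fits the CPU-starvation picture.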

Two changes were recommended: increasing the shared pool size (which was definitely not our problem) and setting the undocumented parameter _session_cached_instantiations to 100. This made no difference on our system, and we are now waiting for Oracle to fix the bug (or at least provide more information on what the wait actually means and how we can address it).

 

I will update this blog when I know more.

16 Responses to “11g – library cache mutex X – known bug”

  1. yuntaa said

    I’m getting the same issue in my 11.1.0.6 database! It’s a high-throughput system loading and unloading SecureFile BLOBs. Through monitoring, I see this statement all the time, usually as a top consumer:

    BEGIN MGMT_PAF_AQ.DEQUEUE_REQUEST(p_node_id => :1 , p_wait => :2 , x_xml_data => :3, x_request_id => :4, x_timestamp => :5, x_return_status => :6 ) ; END;

    Google doesn’t even have the word “MGMT_PAF_AQ” … and neither does Metalink. So I’m not sure whether this package is being run as the result of something else or is the cause of the latch problem.

    Looking forward to your update.

    • Paul Simmons said

      BEGIN MGMT_PAF_AQ.DEQUEUE_REQUEST(p_node_id => :1 , p_wait => :2 , x_xml_data => :3, x_request_id => :4, x_timestamp => :5, x_return_status => :6 ) ; END;

      I am getting this in a 10.2.0.4 database; the code is executed by an OMS process for Oracle Grid Control. The OMS process is causing 60% CPU load, so I guess it is a similar problem.

  2. Umar Syyid said

    Did you find a resolution to this issue? I see the same thing occurring on my machine.

  3. John Hallas said

    Sorry for the delay in replying. I have now left the site where we had the problems; however, the last I heard, Oracle had identified a fix, which was to be released in 11.1.0.7. I am not sure whether it was an actual code change or simply a reduction in the reported mutex X waits.

    We had also doubled the number of CPUs in the server and the wait event was hardly noticeable after that, even with additional throughput.

    Overall, I have a feeling that the mutex X event is likely to be caused by a shortage of CPU rather than the wait event causing a shortage of CPU.

  4. veekay said

    We had a similar issue related to waits on library cache: mutex. Sessions used to run for hours consuming 2-3% CPU. The problem was with “set serverout on”; it seems to be a bug in 11g.

    We had to replace “set serverout on” with dbms_output.enable and dbms_output.get_lines. “set serverout on” may not be needed at all if the script doesn’t contain dbms_output.put_line. Hope this helps.

    • John Hallas said

      Interesting comment. I am not at the same site now and cannot test that as easily as I could then; however, it sounds like it might be another 11g buggette that could be fixed in 11.1.0.7.

  5. Joshua said

    Any new information on this bug? We have an identical environment and are experiencing the same library cache: mutex X wait. It happens sporadically, but it has a big impact on our systems because everything pauses.

    • John Hallas said

      Joshua and others,

      I am sorry, but I am now working at a different site.
      However, I do know that we were hitting maximum CPU and that this seemed to cause the issue; when we changed our system to reduce CPU usage, we hardly saw the wait at all.
      I was always unclear as to how much it was a documentation issue as opposed to a wait event with real impact. By that I mean: were we just hitting an event that was being reported but had no detrimental effect on performance, or was it a definite performance issue that could be fixed?

      I have seen all the Oracle information available on this bug, and I am still not clear on how much of an impact there was or whether the reported fix only removed the wait event from the reports.

      John

      • Venkat said

        Hi John,

        Did you ever find out the root cause of this issue? We are also seeing very high CPU and a huge number of sessions when this occurs. Our production instance did not show this error, as it is twice the size of our UAT instance. Do you know if that matters?

        Regards,
        Venkat.

      • John Hallas said

        Hi Venkat, I never really got much further with that problem, although we still see the same wait event when there is CPU starvation.

        John

  6. RSN said

    We encountered this issue in 11.1.0.7. Please refer to Bug 7307972 and Note ID 7155860.8 in Metalink.
    There is currently no known fix for this that we can see.
    thanks
    RSN

  7. Lynne said

    A patch was released for bug 7307972 on 11/06/09. It’s available for Solaris, Linux, HP-UX and AIX.

  8. Jeff Graham said

    I am seeing this on our new Exadata half rack running 11.2.0.1 when running a heavy SOE load with 300 users in Swingbench. Bug 7307972 states that it’s fixed in 11.2, but I’m not so sure.

  9. Jos Gordon said

    Actually, that wait event is not unique to 11g; it is also evident in highly concurrent 10g systems. I’ve resolved this one before, but it was some time ago. Will post here again when I get it fixed.

  10. Mahesh said

    There are in fact many bugs related to “library cache: mutex X”, and some of them appear to be fixed in the 11.2.0.2 patchset, which was released on 11/17 (a list is available in the Mutex Contention section of Metalink article 1179583.1). You might want to search for patch number 10098816 in Metalink. I’m going to apply this patch on 11.1.0.7 on Tux64 and am hoping that it will solve my probable bug 10145558 (Selects on library cache V$/X$ views cause “library cache: mutex X” waits)!

  11. Andrew said

    This is a late update with our experience. We had just upgraded from 10.2.0.4 to 11.1.0.7, and the utlu111i.sql pre-upgrade script flagged instance_groups as a deprecated parameter. After the upgrade, I removed the “instance_groups” parameter from our initialization file. After a while, we started seeing a large number of “library cache: mutex X” waits, a huge number of E-Business Suite sessions in the database and a huge number of processes on the application server. Eventually these sessions consumed the instance associated with E-Business Suite (we are running RAC).

    It turns out that while “instance_groups” may be deprecated in 11g, nobody told the Oracle E-Business Suite folks. The problem was caused by not having “instance_groups” set while “parallel_instance_group” was set (which did not show up as deprecated). Oracle Support stated that if EBS needed that parameter set, then it should be set. After re-setting it, the mutex X waits went away.

    Anyway, just wanted to share my thrills with the mutex X waits.
