Crash caused by AMM resize operations and high ‘direct file read’
Posted by John Hallas on November 16, 2011
Much of the following diagnosis was performed by a colleague, Scott Chisholm. The outcome is a lot of new information (to me at least) about the parameter _small_table_threshold, but I have also developed thoughts about which checks we should put in place after restarts of major systems.
We start off with the executive summary for those who like to cut to the chase.
Instance 1 of a 2 node RAC cluster (11gR1 on HP-UX) crashed, and the evidence indicates this was related to excessive AMM resize operations. The resize operations left the instance with an undersized buffer cache at instance start-up, and the cache has steadily grown since. However, the _small_table_threshold value appears to have been set too low and does not appear to have been dynamically resized. As a result the choice between buffer cache reads and direct file reads changed, and this was the cause of the change in disk read activity on instance 1. It would be exacerbated by concurrent high disk IO operations, such as an RMAN backup running.
End of summary. However, I will point out that this is not the first time we have been bitten by automatic memory management killing an instance because it decides to consume excessive CPU by constantly resizing the SGA. In fact it is my view that AMM is an unnecessary overhead on any system which has been in production for a while and for which realistic SGA values are known. AMM can be very flaky/buggy and can sporadically lead to a high number of resize operations (if Oracle can’t get it right). In this case the surge actually occurred in the final 3 minutes of the last snap period before the crash: 1324 resize operations in a 3 minute window.
So we had an instance crash and restart on a Sunday. The next day, as main usage was ramping up, very poor performance of sql_id ‘2cahqnpt4cq23’ was noticed on instance 1. The SQL was generating intensive ‘direct file read’ waits; later it switched to ‘ksfd: async disk IO’. However, on instance 2 the same query continued to use buffer gets rather than direct disk reads and was proving far more efficient. It was noted that from about 08:10 (after the RMAN backup was complete) the response time for the problem SQL was much improved, and observations were that it had reverted to buffer gets rather than ‘direct file read’.
Scott performed research to understand why Oracle would choose a direct file read rather than reading blocks into or from the buffer cache. This led to the parameter _small_table_threshold, which is used to determine whether a buffer cache read or a direct disk read will be more efficient (MOS Note 787373.1 gives a full description and introduced me to the term ‘smallness logic’). This value was 418 on instance 1 and 57802 on instance 2. We noted that this value (if not set explicitly) is derived from the size of the buffer cache at instance start-up, and that when using AMM the buffer cache at instance start-up is initiated from its size at instance shutdown. This was the key to why we were seeing high disk reads on instance 1 and not on instance 2. The last entries we can see in the DBA_HIST_MEMORY_RESIZE_OPS view indicate that db_cache_size had gone from 8.5Gb to 180M after the database restart.
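For reference, both values can be checked with queries along these lines (a sketch only, run as SYS; the x$ksppi/x$ksppcv fixed tables hold hidden parameter names and current values, and the component name for the buffer cache in the resize history is ‘DEFAULT buffer cache’):

```sql
-- Current value of the hidden parameter on this instance
select i.ksppinm  name,
       v.ksppstvl value,
       v.ksppstdf isdefault
from   x$ksppi  i,
       x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm = '_small_table_threshold';

-- Recent buffer cache resize history from AWR
select start_time, oper_type, status,
       initial_size/1024/1024 initial_mb,
       final_size/1024/1024   final_mb
from   dba_hist_memory_resize_ops
where  component = 'DEFAULT buffer cache'
order  by start_time;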
One side aspect was that db_file_multiblock_read_count was noticed (after the event) as having changed from 128 to 10 after the instance crash, and was reported as changing from 10 back to 128 when the database was restarted with new AMM settings. This parameter is left to be determined automatically by Oracle in our database, so I’m guessing the small value was chosen because of the undersized buffer cache and will have contributed significantly to the worsened I/O experienced on that instance.
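The history of that parameter across the crash can be seen in AWR, since DBA_HIST_PARAMETER keeps a per-snapshot copy of v$parameter. A sketch:

```sql
select snap_id, instance_number, value, ismodified
from   dba_hist_parameter
where  parameter_name = 'db_file_multiblock_read_count'
order  by snap_id, instance_number;
```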
So we now have an explanation as to what caused the difference in performance between the nodes.
My focus is really on what processes we can put in place to stop a recurrence. My thoughts are:
- Why use AMM when we know we have had problems with it before? What value does it really provide on a stable system?
- If using AMM, why not fix key values such as the buffer cache so they have no opportunity to fluctuate? One aspect I see regularly with AMM resize operations is that it makes minute changes, such as shrinking a 15Gb buffer cache to 14.9Gb, and then reverts that change a few minutes (or even seconds) later. I appreciate that the shared pool is the likely beneficiary of the change, but when that is already 4Gb, for example, an extra 0.1Gb is not going to make a significant difference anyway.
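With AMM in place, explicitly setting the pool parameters turns them into minimum sizes rather than fixed ones, which gives AMM a floor it cannot shrink below. A sketch, with illustrative values rather than recommendations:

```sql
-- Under memory_target/sga_target these become lower bounds,
-- not static allocations; sizes here are examples only
alter system set db_cache_size    = 8g scope=both sid='*';
alter system set shared_pool_size = 4g scope=both sid='*';
```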
- We already have a daily ‘morning check’ that reports differences in spfile parameters, and it did pick up the db_file_multiblock_read_count change but missed the _small_table_threshold change, probably because it is an underscore parameter. The script is shown below.
- Implement processes on major systems that report differences between parameters across a startup. That should be relatively easy to do: we just need to create a pfile from the spfile before a shutdown, repeat after startup, and run a diff on the two files. If we have a crash, the pfile from the previous startup should still be in place, so we always have a comparison available. The more I think about it, the more I believe we should have done this a long time ago; indeed we have since implemented the reporting capability, but not yet the discipline of checking after every restart.
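That check could be as simple as the following sketch. In real use the two files would be produced inside sqlplus with CREATE PFILE='...' FROM SPFILE; run before shutdown and again after startup — sample files simulate that here so the diff step is visible, and the values shown are the ones from this incident:

```shell
#!/bin/sh
# Simulated "before shutdown" snapshot (normally: CREATE PFILE FROM SPFILE)
cat > /tmp/params_pre.ora <<'EOF'
*.db_cache_size=8589934592
*.db_file_multiblock_read_count=128
*._small_table_threshold=57802
EOF

# Simulated "after startup" snapshot
cat > /tmp/params_post.ora <<'EOF'
*.db_cache_size=188743680
*.db_file_multiblock_read_count=10
*._small_table_threshold=418
EOF

# diff exits non-zero when files differ, so capture rather than abort
if ! diff /tmp/params_pre.ora /tmp/params_post.ora > /tmp/params.diff
then
    echo "Parameter changes detected across restart:"
    cat /tmp/params.diff
fi
```

Because this works on the raw pfile text rather than a filtered parameter report, underscore parameters such as _small_table_threshold show up in the diff like any other.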
So I have described a problem we had, shown how the cause was identified, no doubt generated some thoughts on researching _small_table_threshold, and consequently developed some ideas, if only for myself, on ways we can improve the management of critical systems.
PS ‘ksfd: async disk IO’ deserves a blog entry of its own – currently hoping PSU 7 will fix it, but that is not tested as yet.
select deltatime "Date", target_name "Database", hostname "Host",
       operation "Operation", key1 "Parameter", attribute "Attribute",
       oldvalue "Old Value", newvalue "New Value"
from   MGMT$ECM_CONFIG_HISTORY_KEY1
where  target_name like '%PRD%'
and    deltatime >= sysdate - 1
and    collectiontype = 'MGMT_DB_INIT_PARAMS_ECM'
and    key1 != 'resource_manager_plan';