We have been seeing an increasing number of alerts stating that OEM cannot ping an agent. These in turn generate incidents and potentially callouts. We had initially put it down to a busy network and the fact that we have a lot of distributed agents, but the situation kept getting worse, so we started investigating.
The error message is: Message=Agent is unable to communicate with the OMS. (REASON = Agent is Unreachable (REASON : Agent to OMS Communication is broken ). Severity=Unreachable Start
We are on GC 10.2.0.5. We came across Note 9276193.8, which describes bug 9276193 - gc sends false alerts with “agent to oms communication broken” message.
There are two workarounds suggested :-
Turn off alerts notification – which is a bit of a joke really
Increase max_inactive_time in emd_ping table to a large value – note that the table is actually called mgmt_emd_ping, not emd_ping.
The default value is currently 120 seconds. We raised it to 240 seconds and that resolved our problems.
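The second workaround boils down to a single UPDATE on max_inactive_time. The following is only a rough sketch of that change, rehearsed in Python against an in-memory SQLite stand-in rather than the real repository. Everything beyond the table name, the column name and the 120/240 values is an assumption: the stand-in table has just two columns, the target GUIDs are made up, the helper raise_max_inactive_time is hypothetical, and the schema owner and any other columns of the real mgmt_emd_ping table are left out.

```python
import sqlite3

# Stand-in for the repository table. Only the two columns mentioned in the
# post are modelled; the real mgmt_emd_ping table has more (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mgmt_emd_ping ("
    "target_guid TEXT PRIMARY KEY, max_inactive_time INTEGER)"
)

# Hypothetical agent GUIDs, all at the 120 second default.
conn.executemany(
    "INSERT INTO mgmt_emd_ping VALUES (?, ?)",
    [("AAAA0001", 120), ("AAAA0002", 120), ("AAAA0003", 120)],
)


def raise_max_inactive_time(conn, new_value, old_value=120):
    """Set max_inactive_time to new_value for rows still at old_value.

    Returns the number of rows changed.
    """
    cur = conn.execute(
        "UPDATE mgmt_emd_ping SET max_inactive_time = ? "
        "WHERE max_inactive_time = ?",
        (new_value, old_value),
    )
    conn.commit()
    return cur.rowcount


changed = raise_max_inactive_time(conn, 240)
rows = conn.execute(
    "SELECT target_guid, max_inactive_time FROM mgmt_emd_ping "
    "ORDER BY target_guid"
).fetchall()
print(changed, rows)
```

Restricting the UPDATE to rows still at the old value means any agent that has already been given a different setting is left alone.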
Below is a test case showing a selection of agents and their target GUIDs and how we proved the fix.
