Cloud agent set to DEBUG causing out of memory errors
Posted by John Hallas on June 23, 2015
The following technical detail was put together by a colleague, John Evans. I have taken it, with his permission, and wrapped some more detail around it, as it seemed to be of real value to anybody who might have upgraded an agent to 184.108.40.206.
Following an upgrade of the EM agent from 220.127.116.11 (or 18.104.22.168) to 22.214.171.124, and after about 90 days of usage, we saw a number of agents failing with out of memory errors.
We traced this down to a line in the properties file where the parameter Logger.sdklog.level was set to DEBUG rather than INFO.
Our view is that the property was already set to DEBUG in 126.96.36.199. However, the upgrade to 188.8.131.52 brought in more features and thus more checks on what the agent was doing, so far more was being logged at DEBUG level. The out of memory errors only manifested themselves about 90 days after we had done the upgrade. A clean 184.108.40.206 or 220.127.116.11 install has INFO as the default; it is only an issue when upgrading from 18.104.22.168 to 22.214.171.124.
#### Tracing related properties
# emagent perl tracing levels
# supported levels: DEBUG, INFO, WARN, ERROR
# default level is WARN
We changed this to INFO, which is how it was before we had done the agent upgrade, and we were comfortable that this resolved our issues, as the two graphs below demonstrate.
Note the slow increase of dispatched actions after the parameter is set and the agent is reloaded compared to before when it climbs fairly rapidly.
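To see which level a given agent is actually running with, the properties file can be checked directly. The following is a minimal sketch rather than an Oracle-supplied tool: the function name `sdklog_level` is made up for illustration, and it assumes the file is a plain key=value properties file in which commented lines start with `#`. If the property appears more than once, the sketch simply reports the last uncommented assignment.

```shell
#!/bin/sh
# Print the value of Logger.sdklog.level from an agent properties file.
# Assumption: plain key=value lines; lines starting with '#' are comments.
# sdklog_level is a hypothetical helper name, not part of emctl or emcli.
sdklog_level() {
    props_file="$1"
    # Match only uncommented assignments, capture the value without
    # surrounding whitespace, and keep the last one found.
    sed -n 's/^[[:space:]]*Logger\.sdklog\.level[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$props_file" | tail -n 1
}
```

It would be called with the path to emd.properties, for example `sdklog_level "$AGENT_INST/sysman/config/emd.properties"`, where `$AGENT_INST` is an assumed variable pointing at the agent instance directory on your site.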
However, we had 1100 agents to change and we wanted to do it centrally rather than agent by agent. We raised this as an SR with Oracle. After a bit of to-ing and fro-ing, in which they suggested changing a number of kernel parameters (on 1100 servers!!!) and we point-blank insisted they answer the question we were asking, they came back with an internal bug:
Bug 16492328 : PSR:NGAGENT:CHANGE LOGGER.SDKLOG.LEVEL FROM DEBUG TO INFO BY DEFAULT
The SR response was as follows – which would work but was a lot of effort
In 126.96.36.199, if this property contains DEBUG, then it will carry forwarded to 188.8.131.52.
If you have fresh 184.108.40.206 agent and if you upgrade this agent to 220.127.116.11, then property value should be INFO
From the uploaded emd.properties file, only following parameter shows debug enabled:
To turnoff the debug on agent, following command can be used:
/bin>./emctl setproperty agent -name "Logger.sdklog.level" -value "INFO"
However to turnoff the debug for all the agents at one time, you need to run the above commands as part of an OS command job from console.
Should you ever want or need to change an agent property en masse and don’t want to sign in to every box, then you can either use the agent parameters page OR use emcli:
Using Cloud Control – Setup / Manage Cloud Control / Agents
Highlight the agents you are interested in and select Properties. From that screen, select the Parameters tab, which takes you to the screen below, where you enter the parameter and its value.
To run from emcli take the following steps:-
From the OMS repository select all the agent installs
select 'emcli set_agent_property -agent_name="' ||target_name ||'" -name="Logger.sdklog.level" -value=INFO' from SYSMAN.mgmt_targets where target_type = 'oracle_emd' order by 1;
Which produces 1100 lines similar to the following (note the space before -name, without which the shell would treat the agent name and property name as a single argument)
emcli set_agent_property -agent_name="server.domain:1830" -name="Logger.sdklog.level" -value=INFO
If you want to add a new property then just append -new to the generated command.
Spool the output to a file and log in to emcli, which is normally on the OMS repository server. Then chmod +x the file and run it with ./file
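If you would rather spool only the agent names (one target_name per line) and build the commands outside the database, the same command file can be produced in the shell. This is a sketch under that assumption: `gen_set_property_cmds` is a hypothetical helper name, and it only prints `emcli set_agent_property` lines in the form shown above; it does not run emcli itself.

```shell
#!/bin/sh
# Read agent names (one per line) on stdin and print one
# "emcli set_agent_property" command per agent.
# gen_set_property_cmds is a hypothetical helper, not an emcli verb.
gen_set_property_cmds() {
    prop_name="$1"
    prop_value="$2"
    while IFS= read -r agent; do
        # Skip blank lines left over from spooling.
        [ -n "$agent" ] || continue
        printf 'emcli set_agent_property -agent_name="%s" -name="%s" -value="%s"\n' \
            "$agent" "$prop_name" "$prop_value"
    done
}
```

For example, `gen_set_property_cmds Logger.sdklog.level INFO < agents.txt > set_sdklog.sh` would write the command file, where `agents.txt` and `set_sdklog.sh` are placeholder file names. Reviewing the generated file before running it is a cheap safety check when 1100 agents are involved.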
The main problem with either approach is that the agent needs to be reloaded for the new parameters to come into effect, but Cloud Control requires credentials to be enabled for that, which is a nightmare for my current site at least. I do think there should be a tick box in Cloud Control to allow a reload if you want to select that option.
The alternative is to wait until the agent is restarted or use some form of batch routine to connect to all the servers and reload the agent using the emcli command line.
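As a rough illustration of such a batch routine, the sketch below loops over a list of host names and runs `emctl reload agent` on each one over ssh. Note that it uses emctl on each server, not emcli. Everything site-specific here is an assumption: key-based ssh access to every host, the emctl path passed as an argument, and the helper name `reload_agents`. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Reload the agent on every host named on stdin (one host per line).
# Assumptions: passwordless ssh works to each host and emctl lives at
# the same path everywhere. reload_agents is a hypothetical helper.
reload_agents() {
    emctl_path="$1"
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): show the command, do not connect.
            printf 'ssh %s %s reload agent\n' "$host" "$emctl_path"
        else
            # -n stops ssh from swallowing the rest of the host list on stdin.
            ssh -n -o BatchMode=yes "$host" "$emctl_path reload agent" ||
                printf 'reload failed on %s\n' "$host" >&2
        fi
    done
}
```

Setting `DRY_RUN=0` in the environment before calling the function makes it connect for real; failures are reported on stderr and the loop carries on with the next host.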