Archive for the ‘Grid control and agents’ Category
December 16, 2009
I am making my first presentation to a SIG meeting in January, when I will talk about how my company has moved from a site where Oracle was almost non-existent less than two years ago to one that is now delivering what seems to be every product that Oracle has invented (or purchased). It won’t be a ‘how great we are’ approach but rather how making some simple but fundamental decisions has made it much easier to build and support a large and growing infrastructure.
I will be talking about having a set of standards that are very simple but are a fundamental building block in our aim of having the same look and feel across the estate, and how a small number of setup scripts can make life so much easier.
I will probably diverge from the straight and narrow and discuss our experiences of running a centralised OEM solution, and given that we currently have 7 open SRs referencing Grid problems I have an inkling of which way it may be slanted. However, I hope that by the time of the talk I can be much more positive because there is a lot to be said for OEM.
The date is 21st January 2010 in London and an abstract can be seen in the Unix SIG agenda.
I hope to see some of you there.
Posted in Grid control and agents, Oracle | Tagged Grid, SIG, Unix SIG | 3 Comments »
April 8, 2009
I blogged about how to remove targets from OEM in removing a Grid target (http://jhdba.wordpress.com/2009/01/07/removing-a-grid-target-from-the-oms/), and yesterday I used my own blog entry to try to force the removal of several entries. These appeared to work, but when I tried adding new targets (we were migrating databases from one server to another) I got the error message
java.sql.SQLException: ORA-20600: The specified target is in the process of being deleted.(target name = SID)(target type = oracle_database)(target guid = 21D8EFD67CCF409D7CDB41DCFD1F9D94)
ORA-06512: at "SYSMAN.TARGETS_INSERT_TRIGGER", line 36
ORA-04088: error during execution of trigger 'SYSMAN.TARGETS_INSERT_TRIGGER'
ORA-06512: at "SYSMAN.EM_TARGET", line 1918
ORA-06512: at "SYSMAN.MGMT_TARGET", line 2705
ORA-06512: at line 1
A colleague, Allan Ho, looked at the problem and resolved it by looking in sysman.mgmt_targets_delete
select * from mgmt_targets_delete;
To delete the entry use:
begin
  mgmt_admin.delete_target('SID','oracle_database');
end;
/
You can also force deletion by using:
begin
  mgmt_admin.delete_target_internal('SID','oracle_database');
end;
/
The dynamic SQL to generate one call for each entry still pending deletion would be
select 'execute sysman.mgmt_admin.delete_target_internal ('''||target_name||''','''||target_type||'''); ' from sysman.mgmt_targets_delete;
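If the pending entries have been exported to a flat file rather than queried directly, the same statements can be generated outside the database. The following is only a sketch: the function name gen_delete_calls and the "target_name|target_type" input format are assumptions made for illustration, and it assumes target names never contain a pipe character. Unlike the query above, it doubles any single quotes so that the generated calls remain valid.

```shell
# Read "target_name|target_type" pairs on stdin (hypothetical export format)
# and print one delete_target_internal call per pair.
gen_delete_calls() {
    while IFS='|' read -r name type; do
        # Skip blank lines
        [ -n "$name" ] || continue
        # Double any single quotes so the value is a valid SQL string literal
        name=$(printf '%s' "$name" | sed "s/'/''/g")
        type=$(printf '%s' "$type" | sed "s/'/''/g")
        printf "execute sysman.mgmt_admin.delete_target_internal('%s','%s');\n" "$name" "$type"
    done
}
```

For example, printf 'SID|oracle_database\n' | gen_delete_calls prints a single execute line that can be reviewed before being run in SQL*Plus.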
Posted in Grid control and agents, Oracle | Tagged mgmt_admin.delete_target, mgmt_admin.delete_target_internal, mgmt_targets_delete, ORA-20600, SYSMAN.TARGETS_INSERT_TRIGGER, The specified target is in the process of being deleted
February 18, 2009
We have an Enterprise Manager Grid Control morning check report that highlights issues across all our environments.
The checks include:
Usable flash recovery area less than 20%
Filesystems over 90% used
Databases not backed up within 1 day and not blacked out and not a physical standby
Dataguard status (targets not blacked out)
Alert log errors
Various specific job checks
The alert log query we use is a view based on this query
select distinct t.target_name,
m.column_label,
smh.key_value,
smh.string_value
from mgmt_targets t,
mgmt_metrics m,
mgmt_string_metric_history smh
where t.target_guid = smh.target_guid
and m.metric_guid = smh.metric_guid
and m.statefull = 0
and m.metric_name like 'alertLog%'
However, that still requires us to log on to each database and click on the alert_log link under diagnostic options on the home page. We then need to purge the alerts.
Looking for a better way to do this I traced an EM session on the OMS repository database whilst purging these alerts. This provided me with the following statement which can be used to delete all outstanding alerts for every system.
delete from mgmt_string_metric_history where metric_guid in (
select metric_guid from mgmt_metrics where statefull=0 and metric_name like 'alertLog%');
This is set up as a script on the OMS database server. There is no commit within the script. This is intentional, so that the count of deleted records can be checked against the number of alerts outstanding on the morning check report before committing. It is also intentional that we have not created this as an EM job, because a job would auto-commit and leave no chance to roll back if the record count differs from the number of alerts.
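The commit-or-rollback decision can be sketched in shell. This is an illustration only: the function names is_count and decide_commit are hypothetical, and it assumes the DBA supplies the "rows deleted" figure reported by SQL*Plus together with the alert count from the morning check report.

```shell
# Succeeds only if the argument is a non-empty string of digits.
is_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# decide_commit DELETED EXPECTED
# Prints the statement to issue next: commit only when the number of rows
# deleted equals the number of alerts on the report, otherwise rollback.
decide_commit() {
    if is_count "$1" && is_count "$2" && [ "$1" -eq "$2" ]; then
        echo "commit;"
    else
        echo "rollback;"
    fi
}
```

Anything that is not a clean numeric match, including a missing or mistyped count, falls through to rollback, which is the safer default.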
I hope the idea of a morning check report proves useful and the sql statements listed can be utilised.
Posted in Grid control and agents, Oracle
February 10, 2009
This blog shows how to run DB console in conjunction with Enterprise Manager Grid Control.
Why would I want to run both? Purely because DB console is at 11.1.0.7 whereas the latest version of the EM Grid agent is 10.2.0.4. The new DB console has additional functionality to manage RAC, RMAN and Data Guard which is not available from EM.
The initial error I received when trying to create a DB console repository was that the SYSMAN user already existed. This user is created within the database as part of the DBCA build. Once that user was dropped I received one error after another until I finally got the order of commands correct. Run the following from the source database (not the EM OMS database):
drop user sysman cascade;
drop role mgmt_user;
drop user mgmt_view cascade;
drop public synonym MGMT_TARGET_BLACKOUTS;
drop PUBLIC SYNONYM MGMT_AVAILABILITY;
Now we drop the repository (if it exists) and any existing public synonyms or views that may cause the repository create to fail. Bug 6111734 details some issues in this area and Metalink note 470246.1 gives some information, but not much.
$ORACLE_HOME/sysman/admin/emdrep/bin/RepManager host 1522 SID -action drop
It requires some input and then lists all the actions that are taking place. Note the use of quiesce database.
Enter SYS user’s password :
Enter repository user name : SYSMAN
Getting temporary tablespace from database…
Found temporary tablespace: TEMP
Checking SYS Credentials … rem error switch
OK.
rem error switch
Dropping the repository..
Quiescing DB … Done.
Checking for Repos User … Exists.
Repos User exists..
Clearing EM Contexts … OK.
Dropping EM users …
Done.
Dropping Repos User … Done.
Dropping Roles/Synonymns/Tablespaces … Done.
Unquiescing DB … Done.
Dropped Repository Successfully.
The repository can be created in the normal manner using either
emca -repos create
emca -config dbcontrol db (to setup, this command can be repeated)
or
emca -config dbcontrol db -repos create
Posted in Grid control and agents, Oracle | Tagged database quiesce, drop user sysman, emca –repos create, mgmt_user, mgmt_view, repmanager | 2 Comments »
January 7, 2009
Within Enterprise Manager a number of HP servers were showing Collection Failure errors on the Agent Process Metric. These were all Itanium but I understand that the issue also applies to PA-RISC chips.
sh: ps: not found. sh: tail: not found. sh: ps: not found. Could not execute ps -ef -o 'ppid pid vsz sz sz args': at /app/oracle/product/gc10.2/agent10g/sysman/admin/scripts/emdprocesschars.pl line 210.
I was unable to find anything on Google or Metalink to help until I sent a message to my fellow DBAs and got two responses back. Both fixes achieve effectively the same thing, but the view was that it was better to change the emdprocesschars.pl file as that was guaranteed to fix the issue, whereas a profile setting might not be picked up.
Stop the agent before applying the fix and restart it afterwards.
FIX 1
Amend the .profile to add
export UNIX95=""
FIX 2
You need to edit /app/oracle/product/gc10.2/agent10g/sysman/admin/scripts/emdprocesschars.pl by changing: -
# aaitghez 05/29/02 - Creation
#
$stat_offset=2;
sub countFileHandlesOfType {
To
# aaitghez 05/29/02 - Creation
#
$stat_offset=2;
$ENV{UNIX95} = "XPG4";
sub countFileHandlesOfType {
Posted in Grid control and agents, Oracle | Tagged Agent Process metric errors, emdprocessorschars.pl, HP, ps -ef -o 'ppid pid vsz sz sz args'
January 7, 2009
The problem
We have a data (normal usage) and a management connection to each server.
In one case the /etc/hosts file had been set up incorrectly so that the entry looked like
10.1.2.3 server.unix.companyname.net server-data server
When the agent was started it registered the targets on that server as
listener_server.server_data.unix.companyname.net
whereas we wanted it to match all the other servers and use the full name and not the alias
listener_server.server.unix.companyname.net
I deleted all the targets I could from Grid Control but could not remove the host and the agent.
I tried every permutation of emctl remove target, still with no success, so I decided to do it from the management server side.
SQL> select distinct target_name,target_type from SYSMAN.MGMT$TARGET where target_name like '%server%';
TARGET_NAME TARGET_TYPE
--------------------------------------------------------------------------------
server-data.unix.companyname.net:3872 oracle_emd
server-data.unix.companyname.net host
SQL> exec sysman.mgmt_admin.cleanup_agent('server-data.unix.companyname.net:3872');
SQL> exec sysman.mgmt_admin.cleanup_agent('server-data.unix.companyname.net');
PL/SQL procedure successfully completed.
SQL> select distinct target_name,target_type from SYSMAN.MGMT$TARGET where target_name like '%server%';
no rows selected
I was then in a position to do a discover and re-register the server with the OMS.
Posted in Grid control and agents, Oracle | Tagged admin.cleanup_agent, emctl, remove grid agent, remove OMS entry
December 10, 2008
This is the standard set of actions that I go through when I have problems with an EM agent that a stop/start/upload does not resolve.
The two types of errors I generally see are:
EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet..
Starting agent …………………………… started but not ready.
Follow the steps below, which can be cut and pasted
export AGENT_HOME=/app/oracle/product/gc10.2/agent10g/
Stop the agent
$AGENT_HOME/bin/emctl stop agent
Remove the old log files from $AGENT_HOME/sysman/log
Delete any pending upload files from the agent home
rm -r $AGENT_HOME/sysman/emd/state/*
rm -r $AGENT_HOME/sysman/emd/upload/*
rm $AGENT_HOME/sysman/emd/lastupld.xml
rm $AGENT_HOME/sysman/emd/agntstmp.txt
rm $AGENT_HOME/sysman/emd/protocol.ini
Start the agent
$AGENT_HOME/bin/emctl start agent
Issue an agent clearstate from the agent home
$AGENT_HOME/bin/emctl clearstate agent
Force an upload to the OMS
$AGENT_HOME/bin/emctl upload agent
Finally I sometimes need to re-secure the agent
$AGENT_HOME/bin/emctl secure agent
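The core of the steps above can be gathered into one helper, sketched here under some assumptions. The function name agent_reset_commands is hypothetical, and the helper only prints the commands rather than running them, so the list can be reviewed first. It leaves out the log clean-up (no exact command is given for it) and the optional re-secure step, and it assumes the agent home path contains no spaces or quotes.

```shell
# agent_reset_commands AGENT_HOME
# Print the reset steps in order for the given agent home.
# Nothing is executed here; the output is meant to be reviewed first.
agent_reset_commands() {
    home=${1%/}     # drop one trailing slash so paths do not contain //
    if [ -z "$home" ]; then
        # Refuse an empty home (or "/"), which would turn the rm lines into /sysman/...
        echo "usage: agent_reset_commands AGENT_HOME" >&2
        return 1
    fi
    printf "export AGENT_HOME='%s'\n" "$home"
    # Quoted heredoc: $AGENT_HOME is printed literally, not expanded here
    cat <<'EOF'
$AGENT_HOME/bin/emctl stop agent
rm -r $AGENT_HOME/sysman/emd/state/*
rm -r $AGENT_HOME/sysman/emd/upload/*
rm $AGENT_HOME/sysman/emd/lastupld.xml
rm $AGENT_HOME/sysman/emd/agntstmp.txt
rm $AGENT_HOME/sysman/emd/protocol.ini
$AGENT_HOME/bin/emctl start agent
$AGENT_HOME/bin/emctl clearstate agent
$AGENT_HOME/bin/emctl upload agent
EOF
}
```

Running agent_reset_commands /app/oracle/product/gc10.2/agent10g/ prints the ten lines to the terminal; once they look right they can be pasted into a shell or piped to sh.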
Posted in Grid control and agents, Oracle | Tagged clearstate, emctl, enterprise manager, grid agent, upload
March 20, 2008
This blog is about a problem I had with an Enterprise Manager agent and how I resolved it.
The symptom is that an agent is reported as not started and yet I can stop and start it successfully. emctl status agent shows lots of files waiting to upload, and the trace file in $AGENT_HOME/sysman/RACnode/log (or $AGENT_HOME/sysman/log if the node is not part of a RAC cluster) contains the error messages shown after the status output below.
Agent Version : 10.2.0.2.0
OMS Version : 10.2.0.3.0
Protocol Version : 10.2.0.2.0
Agent Home : /u00/app/oracle/OracleHomes/agent10g/mat051.myco.co.uk
Agent binaries : /u00/app/oracle/OracleHomes/agent10g
Agent Process ID : 20241
Parent Process ID : 20220
Agent URL : https://mat051.myco.co.uk:3872/emd/main
Repository URL : https://mat019.myco.co.uk:1159/em/upload
Started at : 2008-03-19 14:19:16
Started by user : oracle
Last Reload : 2008-03-19 14:19:16
Last successful upload : 2008-03-19 14:19:29
Total Megabytes of XML files uploaded so far : 0.07
Number of XML files pending upload : 625
Size of XML files pending upload(MB) : 35.23
Available disk space on upload filesystem : 35.86%
Last successful heartbeat to OMS : 2008-03-20 09:37:26
---------------------------------------------------------------
Agent is Running and Ready
2008-03-20 09:42:39 Thread-4097833872 ERROR upload: Failed to upload file B0000002.xml, ret = -2
2008-03-20 09:42:39 Thread-4097833872 WARN upload: FxferSend: received http error in header from repository: https://URL/em/upload ERROR-400|ORA-01461: can bind a LONG value only for insert into a LONG column
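A backlog like the 625 pending files shown in the status output can be spotted without reading the whole report. The following is a minimal sketch: the function names pending_uploads and upload_backlog_ok are hypothetical, and it assumes the status text keeps the "Number of XML files pending upload : N" layout shown above, which may differ in other agent versions.

```shell
# Read "emctl status agent" output on stdin and print the number of
# XML files pending upload (prints nothing if the line is absent).
pending_uploads() {
    awk -F: '/Number of XML files pending upload/ {
        gsub(/[^0-9]/, "", $2)   # keep only the digits after the colon
        print $2
        exit
    }'
}

# upload_backlog_ok MAX
# Read status output on stdin; succeed only if a pending count was
# found and it does not exceed MAX.
upload_backlog_ok() {
    count=$(pending_uploads)
    [ -n "$count" ] && [ "$count" -le "$1" ]
}
```

Typical use would be $AGENT_HOME/bin/emctl status agent | pending_uploads, or piping the same output to upload_backlog_ok 100 from a monitoring script and alerting when it fails.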
All sorts of emctl actions failed to help including
emctl clearstate agent
emctl secure agent
I then discovered that several agents all had the same issue, which pointed to the management server.
I restarted that with
opmnctl stopall
opmnctl startall
with no success.
I finally shut down the EM database and rebooted the server and everything sprang back into life. I suspect it was the restart of the database that fixed the problem but I don’t know for certain.
Update 27 March, 2008
Oracle have confirmed that it is an RDBMS issue (although my database is 10.1.0.4) and that a restart should fix the problem. Full details below.
We have Note 469077.1 (still under edit, therefore unpublished) which states:
“New and intermittent errors have started occurring after going to 10.2.0.3 from an older patch set level or Oracle RDBMS version.
…
These symptoms can eventually be tracked to issues with shared cursors improperly evaluated lengths, which break Oracle in various ways. The resultant different errors seen are dependent on the datatype or connectivity used. Some of the errors encountered include:
ORA-932 – inconsistent datatypes expected %s got %s
ORA-1008 – not all variables bound
ORA-1460 – unimplemented or unreasonable conversion requested
ORA-1461 – can bind a LONG value only for insert into a LONG column
ORA-1483 – ORA 1483 invalid length for DATE or NUMBER bind variable
….
While there is more than one potential source for all of the above errors in 10.2.0.3, most will be associated with a Shared Cursor problem. This note addresses some of the most common sources found when using 10.2.0.3. If you are not using 10.2.0.3, there are other known bugs which have similar shared cursor issues which will be covered in other notes to be provided in the future.”
Posted in Grid control and agents, Oracle | Tagged ERROR-400 ORA-01461 emctl | 4 Comments »