Archive for the ‘Grid control and agents’ Category


Forced remove of targets from OEM repository

April 8, 2009

I previously blogged about how to remove targets from OEM in <a href="http://jhdba.wordpress.com/2009/01/07/removing-a-grid-target-from-the-oms/">removing a Grid target</a>, and yesterday I used my own blog entry to try to force the removal of several entries. These appeared to work, but when I tried adding new targets (we were migrating databases from one server to another) I got the error message:

 java.sql.SQLException: ORA-20600: The specified target is in the process of being deleted.(target name = SID)(target type = oracle_database)(target guid = 21D8EFD67CCF409D7CDB41DCFD1F9D94)
ORA-06512: at "SYSMAN.TARGETS_INSERT_TRIGGER", line 36
ORA-04088: error during execution of trigger 'SYSMAN.TARGETS_INSERT_TRIGGER'
ORA-06512: at "SYSMAN.EM_TARGET", line 1918
ORA-06512: at "SYSMAN.MGMT_TARGET", line 2705
ORA-06512: at line 1

A colleague, Allan Ho, looked at the problem and resolved it by looking in sysman.mgmt_targets_delete:

 select * from sysman.mgmt_targets_delete;

To delete the entry use:

 begin
    sysman.mgmt_admin.delete_target('SID','oracle_database');
 end;
 /

You can also force deletion by using:

 begin
    sysman.mgmt_admin.delete_target_internal('SID','oracle_database');
 end;
 /

To generate the forced-delete call for every entry still in that table, the dynamic SQL would be:

 select 'execute sysman.mgmt_admin.delete_target_internal ('''||target_name||''','''||target_type||'''); ' from sysman.mgmt_targets_delete;

Removing alerts from Enterprise manager Grid Control

February 18, 2009

We have an Enterprise Manager Grid Control morning check report that indicates issues across all our environments.

The list of checks includes:

Usable flash recovery area less than 20%

Filesystems over 90% used

Databases not backed up within 1 day and not blacked out and not a physical standby

Dataguard status (targets not blacked out)

Alert log errors

Various specific job checks

The alert log query we use is a view based on this query:

select distinct t.target_name,
       m.column_label,
       smh.key_value,
       smh.string_value
from   mgmt_targets t,
       mgmt_metrics m,
       mgmt_string_metric_history smh
where  t.target_guid = smh.target_guid
and    m.metric_guid = smh.metric_guid
and    m.statefull = 0
and    m.metric_name like 'alertLog%'

However, that still requires us to log on to each database and click on the alert log link under diagnostic options on the home page. We then need to purge the alerts.

Looking for a better way to do this I traced an EM session on the OMS repository database whilst purging these alerts. This provided me with the following statement which can be used to delete all outstanding alerts for every system.

delete from mgmt_string_metric_history where metric_guid in (
  select metric_guid from mgmt_metrics where statefull=0 and metric_name like 'alertLog%');

This is set up as a script on the OMS database server. There is intentionally no commit within the script, so that the count of deleted records can be compared with the number of alerts outstanding on the morning check report before committing. It is also intentional that we have not created this as an EM job: a job would auto-commit, which leaves no chance to roll back if the record count differs from the number of alerts.

I hope the idea of a morning check report proves useful and the sql statements listed can be utilised.


11G DB Console with EM Grid Agent

February 10, 2009

 

This blog shows how to run DB Console in conjunction with Enterprise Manager Grid.

Why would I want to run both? Purely because DB Console is at 11.1.0.7 whereas the latest version of the EM Grid agent is 10.2.0.4. The new DB Console has additional functionality to manage RAC, RMAN and Data Guard which is not available from EM.

 

The initial error I received when trying to create a DB Console repository was that the SYSMAN user already existed. This user is created within the database as part of the DBCA build. Once that user was dropped I received one error after another until I finally got the order of commands correct. Run the following from the source database (not the EM OMS database):

 

drop user sysman cascade;

drop role mgmt_user;

drop user mgmt_view cascade;

drop public synonym MGMT_TARGET_BLACKOUTS;

drop PUBLIC SYNONYM MGMT_AVAILABILITY;

Now we drop the repository (if it exists) and any existing public synonyms or views that may cause the repository create to fail. Bug 6111734 details some issues in this area and Metalink note 470246.1 gives some information, but not much.

 

$ORACLE_HOME/sysman/admin/emdrep/bin/RepManager host 1522 SID -action drop

It requires some input and then lists all the actions that are taking place. Note the use of quiesce database.

 

Enter SYS user’s password :

Enter repository user name : SYSMAN

Getting temporary tablespace from database…

Found temporary tablespace: TEMP

Checking SYS Credentials … rem error switch

OK.

rem error switch

Dropping the repository..

Quiescing DB … Done.

Checking for Repos User … Exists.

Repos User exists..

Clearing EM Contexts … OK.

Dropping EM users …

Done.

Dropping Repos User … Done.

Dropping Roles/Synonymns/Tablespaces … Done.

Unquiescing DB … Done.

Dropped Repository Successfully.

The repository can be created in the normal manner using either

 

 

emca -repos create

emca -config dbcontrol db (to set up; this command can be repeated)

or

 

emca -config dbcontrol db -repos create

 

 


Agent Process metric errors on HP unix servers

January 7, 2009

Within Enterprise Manager a number of HP servers were showing Collection Failure errors on the Agent Process Metric. These were all Itanium, but I understand that the issue also applies to PA-RISC chips.

 

sh: ps: not found. sh: tail: not found. sh: ps: not found. Could not execute ps -ef -o 'ppid pid vsz sz sz args': at /app/oracle/product/gc10.2/agent10g/sysman/admin/scripts/emdprocesschars.pl line 210.

 

I was unable to find anything on Google or Metalink to help until I sent a message to my fellow DBAs and got two responses back. Both fixes achieve effectively the same thing, but the view was that it was better to change the emdprocesschars.pl file, as that was guaranteed to fix the issue whereas a profile setting might not be picked up.

 

Stop the agent before the fix and restart it afterwards

 

FIX 1

 

Amend the .profile to add:

 export UNIX95=""
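
If you go the profile route, it helps to make the change repeatable across servers. The following is a minimal sketch, assuming a POSIX shell. The name ensure_unix95 is made up for this example, and the default of $HOME/.profile is an assumption about where the agent owner's profile lives.

```shell
# ensure_unix95: add the UNIX95 export to a profile file once only.
# Hypothetical helper; pass the profile path, default is $HOME/.profile.
ensure_unix95() {
    profile="${1:-$HOME/.profile}"
    # Leave the file alone if some line already sets UNIX95.
    if [ -f "$profile" ] && grep -q 'UNIX95=' "$profile"; then
        return 0
    fi
    printf '%s\n' 'export UNIX95=""' >> "$profile"
}

# Example: ensure_unix95 /home/oracle/.profile
```

The agent only sees the variable if it is restarted from a session that has read the profile, which is the weakness of this fix compared with FIX 2.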

FIX 2

 

You need to edit /app/oracle/product/gc10.2/agent10g/sysman/admin/scripts/emdprocesschars.pl, changing:

 # aaitghez 05/29/02 - Creation
#

$stat_offset=2;

sub countFileHandlesOfType {

 

To

 

# aaitghez 05/29/02 - Creation
#

$stat_offset=2;
$ENV{UNIX95} = "XPG4";

sub countFileHandlesOfType {

 


Removing a grid target from the OMS

January 7, 2009

The problem

 We have a data (normal usage) and a management connection to each server.

In one case the /etc/hosts file had been set up incorrectly so that the entry looked like

 

10.1.2.3 server.unix.companyname.net server-data server

 

When the agent was started it registered the targets on that server as

 

listener_server.server_data.unix.companyname.net

 

whereas we wanted it to match all the other servers and use the full name and not the alias

 

Listener_server.server.unix.companyname.net

 

I deleted all the targets I could from grid control but could not remove the host and the agent.

I tried every permutation of emctl remove target but still with no success so I decided to do it from the management server side.

  

SQL> select distinct target_name, target_type from SYSMAN.MGMT$TARGET where target_name like '%server%';

TARGET_NAME                             TARGET_TYPE
--------------------------------------------------------------------------------
server-data.unix.companyname.net:3872   oracle_emd
server-data.unix.companyname.net        host

 

SQL> exec sysman.mgmt_admin.cleanup_agent('server-data.unix.companyname.net:3872');

SQL> exec sysman.mgmt_admin.cleanup_agent('server-data.unix.companyname.net');

 

PL/SQL procedure successfully completed.

 

SQL> select distinct target_name, target_type from SYSMAN.MGMT$TARGET where target_name like '%server%';

 

no rows selected

 

 

I was then in a position to run a discovery and re-register the server with the OMS:

 

agentca -d


Clearing an Enterprise manager agent that fails to upload

December 10, 2008

This is the standard set of actions that I go through when I have problems with an EM agent that a stop/start/upload does not resolve.

The two types of errors I generally see are:

EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet..

Starting agent …………………………… started but not ready.

Follow the steps below, which can be cut and pasted

export AGENT_HOME=/app/oracle/product/gc10.2/agent10g/

Stop the agent

$AGENT_HOME/bin/emctl stop agent

Remove the old log files from $AGENT_HOME/sysman/log

Delete any pending upload files from the agent home

rm -r $AGENT_HOME/sysman/emd/state/*

rm -r $AGENT_HOME/sysman/emd/upload/*

rm $AGENT_HOME/sysman/emd/lastupld.xml

rm $AGENT_HOME/sysman/emd/agntstmp.txt

rm $AGENT_HOME/sysman/emd/protocol.ini

Start the agent

$AGENT_HOME/bin/emctl start agent

Issue an agent clearstate from the agent home

$AGENT_HOME/bin/emctl clearstate agent

Force an upload to the OMS

$AGENT_HOME/bin/emctl upload agent

Finally I sometimes need to re-secure the agent

$AGENT_HOME/bin/emctl secure agent
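
The steps above can be strung together into one function. This is only a sketch under a few assumptions: reset_agent is a name invented for this example, it expects the agent home as its argument, and it leaves the log clean-up and the optional re-secure as manual steps. It refuses to run unless it finds emctl under the given home, so the rm commands can never fire against an empty path.

```shell
# reset_agent: stop the agent, clear its local state and pending uploads,
# then start it and force an upload. Hypothetical helper name.
reset_agent() {
    home="${1:?usage: reset_agent AGENT_HOME}"
    home="${home%/}"                      # drop a trailing slash
    emctl="$home/bin/emctl"
    [ -x "$emctl" ] || { echo "emctl not found under $home" >&2; return 1; }

    "$emctl" stop agent

    # Delete state and pending upload files; -f so missing files are ignored.
    rm -rf "$home"/sysman/emd/state/* "$home"/sysman/emd/upload/*
    rm -f "$home/sysman/emd/lastupld.xml" \
          "$home/sysman/emd/agntstmp.txt" \
          "$home/sysman/emd/protocol.ini"

    "$emctl" start agent || return 1
    "$emctl" clearstate agent
    "$emctl" upload agent
}

# Example: reset_agent /app/oracle/product/gc10.2/agent10g
```

If the upload still fails after this, the re-secure step is the next thing I would try by hand.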


emctl agent fails to upload – ERROR-400|ORA-01461: can bind a LONG value only for insert into a LONG column

March 20, 2008

This blog is about a problem I had with an Enterprise Management agent and how I resolved it.

The symptoms are that an agent shows as not started and yet I can stop and start it successfully. Looking at the trace files in $AGENT_HOME/sysman/RACnode/log (or $AGENT_HOME/sysman/log if the node is not part of a RAC cluster) I see the following error messages, and emctl status agent shows lots of files waiting to upload.

Agent Version : 10.2.0.2.0
OMS Version : 10.2.0.3.0
Protocol Version : 10.2.0.2.0
Agent Home : /u00/app/oracle/OracleHomes/agent10g/mat051.myco.co.uk
Agent binaries : /u00/app/oracle/OracleHomes/agent10g
Agent Process ID : 20241
Parent Process ID : 20220
Agent URL : https://mat051.myco.co.uk:3872/emd/main
Repository URL : https://mat019.myco.co.uk:1159/em/upload
Started at : 2008-03-19 14:19:16
Started by user : oracle
Last Reload : 2008-03-19 14:19:16
Last successful upload : 2008-03-19 14:19:29
Total Megabytes of XML files uploaded so far : 0.07
Number of XML files pending upload : 625
Size of XML files pending upload(MB) : 35.23
Available disk space on upload filesystem : 35.86%
Last successful heartbeat to OMS : 2008-03-20 09:37:26
—————————————————————
Agent is Running and Ready

2008-03-20 09:42:39 Thread-4097833872 ERROR upload: Failed to upload file B0000002.xml, ret = -2
2008-03-20 09:42:39 Thread-4097833872 WARN upload: FxferSend: received http error in header from repository: https://URL/em/upload ERROR-400|ORA-01461: can bind a LONG value only for insert into a LONG column
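
The pending count in that status output is the quickest sign of this problem, so it can be worth checking it from a script. Here is a small sketch that reads the output of emctl status agent on standard input. The function names and the default threshold of 100 files are my own assumptions, not anything Oracle ships.

```shell
# pending_uploads: print the "Number of XML files pending upload" value
# from `emctl status agent` output read on stdin. Hypothetical helper.
pending_uploads() {
    awk -F' *: *' '/^Number of XML files pending upload/ { print $2 + 0; exit }'
}

# check_agent_backlog: warn when the pending count exceeds a limit.
# The default limit of 100 is an arbitrary example value.
check_agent_backlog() {
    limit="${1:-100}"
    count=$(pending_uploads)
    [ -n "$count" ] || { echo "could not read pending upload count" >&2; return 2; }
    if [ "$count" -gt "$limit" ]; then
        echo "WARNING: $count XML files pending upload (limit $limit)"
        return 1
    fi
    echo "OK: $count XML files pending upload"
}

# Example: $AGENT_HOME/bin/emctl status agent | check_agent_backlog 100
```

Run against the status output above, this would report the 625 pending files as a warning.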

All sorts of emctl actions failed to help including

emctl clearstate agent

emctl secure agent

I then discovered that several agents all had the same issue, which pointed to the management server.

I restarted that with

opmnctl stopall

opmnctl startall

with no success.

I finally shut down the EM database and rebooted the server, and everything sprang back into life. I suspect it was the restart of the database that fixed the problem but I don’t know for certain.

Update 27 March, 2008

Oracle have confirmed that it is an RDBMS issue (although mine is 10.1.0.4) and that a restart should fix the problem. Full details are below.

We have Note 469077.1 (still under edit, therefore unpublished) which states:
“New and intermittent errors have started occurring after going to 10.2.0.3 from an older patch set level or Oracle RDBMS version.

These symptoms can eventually be tracked to issues with shared cursors improperly evaluated lengths, which break Oracle in various ways. The resultant different errors seen are dependent on the datatype or connectivity used. Some of the errors encountered include:

ORA-932 – inconsistent datatypes expected %s got %s
ORA-1008 – not all variables bound
ORA-1460 – unimplemented or unreasonable conversion requested
ORA-1461 – can bind a LONG value only for insert into a LONG column
ORA-1483 – ORA 1483 invalid length for DATE or NUMBER bind variable
….
While there is more than one potential source for all of the above errors in 10.2.0.3, most will be associated with a Shared Cursor problem. This note addresses some of the most common sources found when using 10.2.0.3. If you are not using 10.2.0.3, there are other known bugs which have similar shared cursor issues which will be covered in other notes to be provided in the future.”