Oracle DBA – A lifelong learning experience

Valuable BAU OEM reports – some examples

Posted by John Hallas on July 18, 2014

A well configured OEM 12c (or any previous version of your choice) is a fantastic enterprise tool and the reporting capabilities are a major asset.

I am going to share three examples of reports we use and find valuable; if anybody has any other good ideas I am happy to hear about them.

I don’t think I wrote any of them but I almost certainly identified the requirement and instigated the creation of each report. I have masked out obvious details so the screen shots do look messy.

For anybody new to OEM reports, I wrote a post about creating a new report some time ago:

http://jhdba.wordpress.com/2011/03/30/using-oem-reports-to-show-psu-levels-across-the-estate/

Location of Primary and Standby databases

Like many organisations we have two data centres and we use DataGuard extensively. We have a naming convention whereby the 8th character of the database name is either A or B, signifying a primary or a standby database. Over time, as we do failovers and move things around, you cannot always assume that A is the primary database, so I wanted a way of telling quickly what was impacted if we had issues at one site or the other.

The report is produced daily and emailed out to the DBA team. In the event of a DR it is our first port of call to see what work is required. The name of each site shows against the server name. This can be achieved because we have a file on a Unix server that contains all the server names and their site locations. We create a small table (server_sites) in the OEM repository database containing the production server names and their sites, and then we can join it to the mgmt$target table. We don’t add new production servers that often, so it is not too much of an overhead to maintain the table.
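
The lookup table itself is trivial. A sketch of the DDL, with column names taken from the report query below and sizes that are my assumption:

-- hypothetical DDL for the site lookup table; the report query below
-- joins on SERVER (= mgmt$target.host_name) and selects SITE
create table morrdba.server_sites (
  server varchar2(64),   -- production server name as known to OEM
  site   varchar2(32)    -- data centre name
);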

select a.target_name, a.target_type, a.type_qualifier3, a.type_qualifier4, a.host_name, b.value, c.site
from (select *
      from mgmt$target
      where -- type_qualifier2 in ('Primary','Physical Standby')
            type_qualifier4 in ('Primary','Physical Standby')) a,
     mgmt$db_init_params b,
     MORRDBA.SERVER_SITES c
where a.target_name = b.target_name
and a.target_name like ('%PRD%')
and b.name = 'dg_broker_start'
and c.server = a.host_name
order by target_name;


Guaranteed Restore Points

The availability of guaranteed restore points is a great idea and we use them a lot, especially before data fixes and new code releases. We will create a GRP, apply the code pack, update data and then smoke test. If it all fails miserably then we have a quick and easy way to get back, as well as the normal backup and recovery methods. A true belt and braces approach. The downside is that it is easy to forget to remove the GRP, and then we have problems with the FRA running out of space.
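
For reference, the basic lifecycle of a GRP looks like this (the restore point name is illustrative):

-- before the release: create the guaranteed restore point
create restore point before_release_grp guarantee flashback database;

-- if the smoke test fails, flash back (the database must be mounted, not open)
shutdown immediate
startup mount
flashback database to restore point before_release_grp;
alter database open resetlogs;

-- if all is well, drop the GRP so the FRA space can be reclaimed
drop restore point before_release_grp;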

select fo.target_name "Database",
       fo.value "FB On",
       to_date(oft.value,'YYYY-MM-DD HH24:MI:SS') "Oldest Flashback Time",
       fo.collection_timestamp "Flashback Time Collected",
       dip.collection_timestamp "Retention Target Collected",
       to_number(dip.value)/60 "Retention (Hours)",
       round(to_number(fo.collection_timestamp-to_date(oft.value,'YYYY-MM-DD HH24:MI:SS'))*24) "FB Available (Hours)",
       round((to_number(fo.collection_timestamp-to_date(oft.value,'YYYY-MM-DD HH24:MI:SS'))*24)-(to_number(dip.value)/60)) "Difference (Hours)",
       -- condition reconstructed: target met when flashback available >= retention target
       case when round((to_number(fo.collection_timestamp-to_date(oft.value,'YYYY-MM-DD HH24:MI:SS'))*24)-(to_number(dip.value)/60)) >= 0 then 'Yes' else 'No' end "Target Met"
from (select target_name, target_guid, value, collection_timestamp from SYSMAN.MGMT$METRIC_CURRENT where metric_label = 'Flash Recovery' and column_label = 'Flashback On') fo,
     (select target_guid, value from SYSMAN.MGMT$METRIC_CURRENT where metric_label = 'Flash Recovery' and column_label = 'Oldest Flashback Time') oft,
     (select target_guid, value, collection_timestamp from MGMT$DB_INIT_PARAMS where name = 'db_flashback_retention_target') dip
where fo.target_guid = oft.target_guid
and fo.target_guid = dip.target_guid
and fo.target_guid in (select distinct member_target_guid from MGMT$TARGET_FLAT_MEMBERS where aggregate_target_name in ('Production_Systems','Non_Production_Systems'));


Databases with no flashback enabled

Reasonably self-evident: all databases should have flashback enabled, so it is good to pick up on those that do not have it enabled.

SELECT
m.target_name,
t.type_qualifier4  AS Role,
m.column_label     AS Flashback,
m.value            AS Status
FROM
mgmt$metric_current m,
mgmt$target t
WHERE m.metric_label = 'Flash Recovery'
AND m.column_label = 'Flashback On'
AND m.value = 'NO'
AND m.target_name like '%PRD%'
AND t.type_qualifier4 in('Primary','Physical Standby')
AND t.target_name=m.target_name
AND t.target_guid=m.target_guid
order by t.type_qualifier4 ,
m.value
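
Once the laggards are identified, turning flashback on is a one-liner (from 11GR2 it can be done with the database open; earlier versions need it mounted, and the FRA must already be configured):

-- requires db_recovery_file_dest / db_recovery_file_dest_size to be set
alter database flashback on;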



The value of audit_trail=DB,EXTENDED

Posted by John Hallas on July 15, 2014

I was recently reading a blog entry by Dominic Brooks regarding auditing and I was intrigued by the line referring to the audit_trail parameter being set to DB, EXTENDED

Behaves the same as AUDIT_TRAIL=DB, but also populates the SQL bind and SQL text CLOB-type columns of the SYS.AUD$ table, when available.

Nothing wrong with that, and straight from the manual, but I was surprised that they were CLOBs. However, on looking at them they are NVARCHAR2(2000), so "CLOB-type". No issues there.
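
Easy enough to verify against your own version from the dictionary:

select column_name, data_type, data_length
from dba_tab_columns
where table_name = 'DBA_AUDIT_TRAIL'
and column_name in ('SQL_BIND','SQL_TEXT');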

I have long thought that there were 3 benefits of having the extended value set in audit_trail

1) Adds the DBID to the syslog file when using OS auditing for SYSDBA and SYSOPER activities. No real issue if you only have a single database on the server, but if running several databases it is really mandatory, otherwise you cannot tell the entries apart.

My research shows that this is no longer true from 11GR2 onwards; the DBID is always written to the syslog file now.

2) To capture the SQL binds and text of any statement run as SYSDBA. This is fundamentally the purpose of using a syslog file; otherwise the DBA can do bad things and delete the entries from the AUD$ table. Having the syslog file set to capture that information, and not editable by anyone other than the root user, means that a record of all activity is kept.

Doing some testing, again with the jolt provided by Dom’s blog, I found that it did not matter whether the audit_trail was DB or DB,EXTENDED; all the activity and values carried out as SYSDBA were written to that file.

From the documentation

Auditing User SYS and Users Who Connect as SYSDBA and SYSOPER
#############################################################
You can fully audit sessions for users who connect as SYS, including all users connecting using the SYSDBA or SYSOPER privileges. This enables you to write the actions of administrative users to an operating system file, even if the AUDIT_TRAIL parameter is set to NONE, DB, or DB, EXTENDED. Writing the actions of administrator users to an operating system audit file is safer than writing to the SYS.AUD$ table, because administrative users can remove rows from this table that indicate their bad behavior.

 

3) To catch the activities of ‘ordinary’ users and to record the values of their SQL statements, provided sufficient auditing has been enabled.

A simple example

User AUDIT_TEST owns a table TEST1

audit update table, delete table, insert table by AUDIT_TEST by access;

begin
  for i in 1..5 loop
    insert into test1 values ('Peter', i);
  end loop;
  commit;
end;
/

select action_name, sql_bind, sql_text from dba_audit_trail where username = 'AUDIT_TEST';

ACTION_NAME  SQL_BIND   SQL_TEXT
------------ ---------- ----------------------------------------
INSERT       #1(1):1   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):2   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):3   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):4   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):5   INSERT INTO TEST1 VALUES ('Peter',:B1 )

Setting the audit_trail parameter to just DB, we do not see the values used, which really makes the whole exercise of auditing rather pointless.

INSERT
INSERT
INSERT
INSERT
INSERT
INSERT

 

So in summary, two out of my three reasons for enabling EXTENDED auditing have been blown out of the water. However, the remaining one is justification in itself and I see no reason why every production system should not have it enabled.

At my site, EXTENDED auditing and writing to a SYSLOG file are enabled by default as part of the build on every database we have. We have gone a little too far, as the DBAs do not have access even to read the file, never mind delete entries, but as I sit down with external auditors a couple of times a year I do know they are always impressed that I can demonstrate that even with SYSDBA privileges all my activities are securely audited.
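
For completeness, the parameter settings behind that build standard look something like this (the syslog facility/priority is a site-specific example, and all three are static parameters so a restart is needed):

alter system set audit_trail = 'DB,EXTENDED' scope = spfile;
alter system set audit_sys_operations = TRUE scope = spfile;
-- facility.priority must match an entry in the server's syslog configuration
alter system set audit_syslog_level = 'LOCAL1.WARNING' scope = spfile;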


Tidying up SYSAUX – removing old snapshots which you didn’t know existed

Posted by John Hallas on July 9, 2014

This post runs in parallel with a post I made 5 years ago, which is still the most read one on this blog:

http://jhdba.wordpress.com/2009/05/19/purging-statistics-from-the-sysaux-tablespace/

It will show how to reduce the space taken up in the SYSAUX tablespace when many more AWR snapshots are being retained than you think.

Firstly let’s take an example database, where we can see that we are using 92Gb of space.

 

set linesize 120
set pagesize 100
col ash form a30
col retention form a30
col snap form a30

COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40

SELECT  occupant_name "Item",
space_usage_kbytes/1048576 "Space Used (GB)",
schema_name "Schema",
move_procedure "Move Procedure"
FROM v$sysaux_occupants
WHERE occupant_name = 'SM/AWR'
ORDER BY 1
/

 

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              91.88 SYS

Now let us see how long we should be retaining AWR snapshots for, and what we actually have:

select sysdate - a.sample_time ash,
sysdate - s.begin_interval_time snap,
c.RETENTION
from sys.wrm$_wr_control c,
(
select db.dbid,
min(w.sample_time) sample_time
from sys.v_$database db,
sys.Wrh$_active_session_history w
where w.dbid = db.dbid group by db.dbid
) a,
(
select db.dbid,
min(r.begin_interval_time) begin_interval_time
from sys.v_$database db,
sys.wrm$_snapshot r
where r.dbid = db.dbid
group by db.dbid
) s
where a.dbid = s.dbid
and c.dbid = a.dbid;

 

ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000875 22:22:41.045        +000000030 12:22:28.323        +00030 00:00:00.0

We want to keep 30 days’ worth of snapshots – we have set the retention period to that – but we have 875 days’ worth of active session history.

I rather smugly said let’s show an example, and then I produce a schema with nearly 3 years’ worth of snapshots being maintained, but there is no trickery involved. Looking around at other systems I can find a few similar examples, and it is always down to the same issue. Let’s look further.

select table_name, count(*)
from dba_tab_partitions
where table_name like 'WRH$%'
and table_owner = 'SYS'
group by table_name
order by 1;

 

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             2
WRH$_DB_CACHE_ADVICE                    2
WRH$_DLM_MISC                           2
WRH$_EVENT_HISTOGRAM                    2
WRH$_FILESTATXS                         2
WRH$_INST_CACHE_TRANSFER                2
WRH$_INTERCONNECT_PINGS                 2
WRH$_LATCH                              2
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               2
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             2
WRH$_PARAMETER                          2
WRH$_ROWCACHE_SUMMARY                   2
WRH$_SEG_STAT                           2
WRH$_SERVICE_STAT                       2
WRH$_SERVICE_WAIT_CLASS                 2
WRH$_SGASTAT                            2
WRH$_SQLSTAT                            2
WRH$_SYSSTAT                            2
WRH$_SYSTEM_EVENT                       2
WRH$_SYS_TIME_MODEL                     2
WRH$_TABLESPACE_STAT                    2
WRH$_WAITSTAT                           2

24 rows selected.

There is the problem: all the WRH$ data is held in 2 partitions, and the overnight house-keeping job does not have time to finish its tidy-up before its 15 minutes of fame is over again for another 24 hours.
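
You can see how lopsided the split is by looking at the partition boundaries for one of the big tables:

select partition_name, high_value
from dba_tab_partitions
where table_owner = 'SYS'
and table_name = 'WRH$_ACTIVE_SESSION_HISTORY';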

Two components of Server Manageability (SM) that reside in the SYSAUX tablespace cause the problem: the Automatic Workload Repository (AWR) and Optimizer Statistics History (OPTSTAT). Both of these components have retention periods associated with their data, and the MMON process should run nightly, as part of the scheduled maintenance tasks, to purge data that exceeds these retention periods. From 11g onwards, the MMON purging process has been constrained to a time-limited window for each of the purges; if this window is exceeded the purging stops and an ORA-12751 error is written to an m000 trace file.

For the AWR data, held in tables with names commencing with WRH$, the probable cause is the fact that a number of the tables are partitioned. New partitions are created for these tables as part of the MMON process. Unfortunately, it seems that the partition splitting is the final task in the purge process. As the later partitions are not split they end up containing more data, and partition pruning within the purge process becomes less effective.

For the OPTSTAT data, held in tables with names commencing with WRI$, the cause is more likely to be related to the volume of data held in the tables. WRI$ tables hold historical statistical data for all segments in the database for as long as specified by the stats history retention period. Thus, if the database contains a large number of tables and has a long retention period – say 30 days – then the purge process will struggle to remove all of the old statistics within the specified window.

I also think that two scenarios lead to this problem:

1) SYSAUX running out of space, which I know has happened on at least one occasion; the volume of data left in a partition then becomes too much to handle in the constrained time-window and so continues to grow in the same partition.

2) The database being shut down over the period of the maintenance task; again the volume in the partition becomes too large to handle in a short window.

 

The resolution to this problem is split into two stages.

Firstly, a temporary manual fix is required to rectify the partition splitting failure. A job needs to be scheduled on the database, run as the SYS user, which executes the following code. This job will force the splitting of partitions for the WRH$ tables, and should be scheduled to run on a daily basis until the number of partitions per table exceeds the AWR retention period.

begin
  execute immediate 'alter session set "_swrf_test_action" = 72';
end;
/

What I do is drop the retention period to 8 days (or maybe less) and then run this procedure once a day. It can take up to an hour, but afterwards you can see that additional partitions have been created, as in the output below.
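
The retention change itself, with named parameters for clarity (11520 minutes = 8 days; the hourly interval is an assumption, adjust to your own setting):

exec dbms_workload_repository.modify_snapshot_settings(retention => 11520, interval => 60)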

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             3
WRH$_DB_CACHE_ADVICE                    3
WRH$_DLM_MISC                           3
WRH$_EVENT_HISTOGRAM                    3
WRH$_FILESTATXS                         3
WRH$_INST_CACHE_TRANSFER                3
WRH$_INTERCONNECT_PINGS                 3
WRH$_LATCH                              3
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               3
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             3
WRH$_PARAMETER                          3
WRH$_ROWCACHE_SUMMARY                   3
WRH$_SEG_STAT                           3
WRH$_SERVICE_STAT                       3
WRH$_SERVICE_WAIT_CLASS                 3
WRH$_SGASTAT                            3
WRH$_SQLSTAT                            3
WRH$_SYSSTAT                            3
WRH$_SYSTEM_EVENT                       3
WRH$_SYS_TIME_MODEL                     3
WRH$_TABLESPACE_STAT                    3


Once the correct number of partitions has been created, the old data may be automatically purged by the MMON process. If this does not occur, it will be necessary to manually purge the data from the AWR tables.

Firstly turn off the production of snapshots for the duration of this activity:

 

-- get the dbid for this database
select dbid from v$database;

-- get current snap interval (in minutes)
select extract( day from snap_interval) *24 *60
     + extract( hour from snap_interval) *60
     + extract( minute from snap_interval) snap_interval
from wrm$_wr_control where dbid = <dbid>;

-- disable snaps by setting interval to 0
exec dbms_workload_repository.modify_snapshot_settings(interval => 0, dbid => <dbid>)

Then determine the maximum snapshot to be purged from the database by querying the maximum snap-id outside of the retention period:

select max(snap_id) max_snap
from wrm$_snapshot
where begin_interval_time < (sysdate - (select retention
                                        from wrm$_wr_control
                                        where dbid = <dbid>));

This will return the maximum snapshot to be purged:

exec dbms_workload_repository.drop_snapshot_range(1, <max_snap>, <dbid>)

The AWR tables will not purge if baselines exist, so check and remove as required. The following should return only 1 row called SYSTEM_MOVING_WINDOW:

select baseline_name, creation_time
from dba_hist_baseline;

If other baselines are found they will need to be removed:

exec dbms_workload_repository.drop_baseline(baseline_name => <baseline>);

If we look at the current retention period on this database it is set to 15 days, so we drop it to 8 days:

select d.dbid, w.snap_interval, w.retention
from DBA_HIST_WR_CONTROL w, v$database d
where w.dbid = d.dbid;

begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(11520,0); end;   -- 11520 minutes = 8 days
begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(129600,0); end;  -- 129600 minutes = 90 days

Then we purge the snapshots (in ranges if there are a lot, as in this example). This can take several hours.

EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>100000, high_snap_id=>114159);

EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>116261, high_snap_id=>116265);

Once all that is complete the retention looks right, but the space has not come down, so it is just a matter of tidying up the indexes and tables.

ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000008 23:47:01.421        +000000008 23:46:55.865        +00008 00:00:00.0

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              93.70 SYS

select 'alter index '||segment_name||' rebuild online parallel (degree 4);'
from dba_segments
where tablespace_name = 'SYSAUX'
and segment_name like 'WRH$_%'
and segment_type = 'INDEX'
order by segment_name;

Double-check for unusable indexes:

 

select 'alter index '||s.segment_name||' rebuild online parallel (degree 4);'
from dba_segments s, dba_indexes i
where s.tablespace_name = 'SYSAUX'
and s.segment_name like 'WRH$_%'
and s.segment_type = 'INDEX'
and i.status = 'UNUSABLE'
and i.index_name = s.segment_name
order by s.segment_name;


select 'alter table '||segment_name||' move tablespace sysaux;'
from dba_segments
where tablespace_name = 'SYSAUX'
and segment_name like 'WRH$_%'
and segment_type = 'TABLE'
order by segment_name;

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              10.05 SYS

 

I hope that has been useful, and I would be pleased to hear if others see the same issue on some databases, whereby the actual data retained is much longer than the snapshot retention period would suggest.


Bugs with rm -rf from within asmcmd

Posted by John Hallas on July 8, 2014

Performing an rm -rf is normally a simple operation, albeit a risky one if you are in the wrong folder. However, within ASMCMD there are a couple of bugs associated with it.

Here are two examples and workarounds.

FRA full - warning in the alert log  - action needs to be taken - this is an 11.2.0.4 database on OEL (5.8)
ORA-19815: WARNING: db_recovery_file_dest_size of 10484711424 bytes is 100.00% used, and has 0 remaining bytes available.
 ARCH: Error 19809 Creating archive log file to '+FRA'
Errors in file /app/oracle/diag/rdbms/tst11204/TST11204/trace/TST11204_ora_13209.trc:
ORA-16038: log 1 sequence# 148 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1: '+DATA/tst11204/redo01.log'
USER (ospid: 13209): terminating the instance due to error 16038

 

Let’s look at the disk from asmcmd and we see a lot of old folders that we do not need

ASMCMD [+FRA/TST11204/ARCHIVELOG] > ls
2013_12_08/
2013_12_09/
2013_12_10/
2013_12_11/
2013_12_12/
2013_12_13/
2013_12_14/
2013_12_15/
2013_12_16/
2013_12_17/
2013_12_18/
2013_12_19/
2013_12_20/
2013_12_21/
2013_12_22/
2013_12_23/
2013_12_24/

Let’s remove them all:

 rm -r 2013*

You may delete multiple files and/or directories.
Are you sure? (y/n) y
ORA-29261: bad argument
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR], [600], [ORA-00600: internal error code, arguments: [KFCHK_ERRRET], [kfa.c], [3025], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute)

rm -r 2013_12_08
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ORA-29261: bad argument
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR], [600], [ORA-00600: internal error code, arguments: [KFCHK_ERRRET], [kfa.c], [3025], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute)

We have hit a known bug. It only applies to 11.2.0.4, but I am sure I have seen manifestations of this issue on previous versions of 11GR2.
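
As a generic fallback when asmcmd rm misbehaves (not necessarily the workaround for this particular bug), RMAN will remove the archive logs and keep its repository consistent at the same time:

RMAN> delete noprompt archivelog all completed before 'sysdate-7';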


Converting an SCN to a timestamp

Posted by John Hallas on June 27, 2014

Something nice and simple

It is easy to get the current SCN from the database and to map between SCN and timestamp because, from 10g onwards, there are two functions to convert between SCN and TIMESTAMP.

In 9i there was also a means of mapping an SCN to a TIMESTAMP. This involves the system view SYS.SMON_SCN_TIME (the view no longer exists in 12c). One limitation is that it only holds 1440 rows (well, it does at 9i, but I can see 2711 rows in an 11GR1 database and 2500 in an 11GR2 one) and therefore cannot hold more than about 5 days’ worth of history. There is a good blog entry explaining this at http://orainternals.wordpress.com/2012/01/19/scn-what-why-and-how/

I will demonstrate the methods below

 

col current_scn form 999999999999999999
col TIMESTAMP_TO_SCN form 9999999999999999
col NOW10G form 99999999999999999999
col NOW9I form 99999999999999999999
col SCN form 99999999999999999999
col TIMESTAMP form 99999999999999999999
alter session set NLS_DATE_FORMAT='DD-MON-YY HH:MI:SS';

 

The normal 10g-onwards syntax:

 

select current_scn from v$database;

        CURRENT_SCN
-------------------
     10900098356623

 

To convert an SCN to a TIMESTAMP, the SCN_TO_TIMESTAMP function (10g onwards) is used:

 

select SCN_TO_TIMESTAMP(10900098356623) from dual;

SCN_TO_TIMESTAMP(10900098356623)
---------------------------------------------------------------------------
27-JUN-14 07.31.31.000000000 AM

 

 

To get an SCN from a timestamp:

 

SELECT TIMESTAMP_TO_SCN(order_date) FROM orders
   WHERE order_id = 5000;

TIMESTAMP_TO_SCN(ORDER_DATE)
----------------------------
                      574107

 

 

To get the current SCN in 9i only

select max(scn_bas) SCN_BASE from smon_scn_time;

  SCN_BASE
----------
3766338378

select time_dp TIMESTAMP, scn_wrp*4294967296+scn_bas SCN
from sys.smon_scn_time
where scn_bas = '3766338378';

TIMESTAMP                            SCN
------------------ ---------------------
27-JUN-14 06:44:26        10900098368330


Another really enjoyable UKOUG Database SIG

Posted by John Hallas on May 8, 2014

Today’s venue was the Metropole Hotel in the centre of Leeds and there was a good attendance, encouraged by a strong agenda.

After introductions, health and safety and UKOUG information it was straight into techie talk with Phil Davies from Oracle doing his normal support update (although he does share duties with Owen Ireland). Invariably I make more notes from this session than most others I hear because of the wealth of information it contains. Snippets I jotted down that are worthy of sharing are :-

12.1.0.2 RDBMS is due out Jul-Dec 2014 and is likely to contain new functionality that did not get into the first 12c release – I think in-memory will be one such feature. A straw poll of members saw very little uptake of 12c in production and some but not much take-up in non-production.

Oracle Cloud Control (OEM) 12.1.0.4 is due very soon – Q2 2014.

I was intrigued enough by MoS note 1557478.1, about a bug with transparent huge pages, to make a note of it for further investigation, and the same goes for 17761775.8 – Bug 17761775, ORA-600 [kclchkblkdma_3], ORA-600 [3020] or ORA-600 [kcbchg1_16]: join of temp and permanent table in RAC might lead to corruption.

Next up was Phil Brown of E-DBA who ran an open forum where the audience picked from a selection of topics.

The three topics covered in about 70 minutes were

1)      Consolidation v Virtualisation

2)      The worst SQL statement in the world

3)      DBA interviews

 

All three allowed plenty of audience participation and competing views, and I think almost everyone in the room provided input. My input regarding DBA interviews was a tip given to me several years ago which I think is very valid: even if you are not thinking of moving on, keep updating your CV so it is an accurate record of what you have been doing over the last couple of years.


In praise of bloggers (along with sql_profiles and RAC interconnect pings)

Posted by John Hallas on March 12, 2014

Today I am going to mention two articles which I came across in the last few days whilst investigating problems, and talk about the background to the problems but also praise the articles. I am sure many of us have run the SQL Tuning Advisor from within OEM, or using the DBMS_SQLTUNE package from the command line, and often it recommends a sql_profile that should be applied (invariably claiming a perceived benefit of 99.98%). Now I see this as both a good thing and a bad thing. Yes, it is nice to apply the profile and hopefully fix the problem, but I would also like to know what the profile is doing, and that is well-hidden. I agree that there is the option to look at the current and new explain plans, but sometimes it is difficult to really understand what hints have been used to get the new plan, especially on a statement that has many lines in the explain plan. So why is this important, you might ask; just go and select the apply button and forget about it. Well, I have two very good reasons why it is important.



Don’t forget the emctl reload agent command

Posted by John Hallas on March 11, 2014

Just a quick entry to show the use of a command that I had forgotten existed but which worked nicely.

Problem – the OEM 12c agent had been down for a few days on a non-production server, and more than the maximum number of files had been created ready to be uploaded.

emctl status agent

Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent is Not Running
emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
Starting agent ........................................................................................................................... started but not ready.

emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 12.1.0.2.0
OMS Version       : 12.1.0.2.0
Protocol Version  : 12.1.0.1.0
Agent Home        : /app/oracle/product/gc12.1/agent12c
Agent Binaries    : /app/oracle/product/gc12.1/core/12.1.0.2.0
Agent Process ID  : 2930
Parent Process ID : 2806
Agent URL         ::1830/emd/main/
Repository URL   ::4900/empbs/upload
Started at        : 2014-03-11 12:45:03
Started by user   : oracle
Last Reload       : (none)
Last successful upload                       : 2014-03-11 12:48:27
Last attempted upload                        : 2014-03-11 12:48:27
Total Megabytes of XML files uploaded so far : 1.07
Number of XML files pending upload           : 3,118
Size of XML files pending upload(MB)         : 8.51
Available disk space on upload filesystem    : 26.60%
Collection Status                            : [COLLECTIONS_HALTED(
  UPLOAD_SYSTEM Threshold (UploadMaxNumberXML: 5000) exceeded with 5001 files)]
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2014-03-11 12:48:11
Last successful heartbeat to OMS             : 2014-03-11 12:48:11
Next scheduled heartbeat to OMS              : 2014-03-11 12:49:11

Edit $ORACLE_HOME/sysman/config/emd.properties and change the entry to 6000 from 5000 for the line

UploadMaxNumberXML=5000

Then instead of the time-consuming process of stopping and starting the agent

emctl reload agent

That works very quickly; the files were processed in a few minutes, and then I just edited the properties file back again and performed another reload.


aud$ and ASSM – a good combination

Posted by John Hallas on March 6, 2014

We had an 8 node RAC cluster that was showing evidence of connections timing out. The following AWR report sections indicate a high level of login activity and blocking on the AUD$ table and its segments.

[Three AWR report screenshots]

None of it was conclusive, but the kicker was the fact that the aud$ table was still in the SYSTEM tablespace, which is manually managed, and therefore automatic segment space management was not coming into play.

Over a 24 hour period there were over 50K connections, all being audited, with the aud$ table in that manually managed SYSTEM tablespace and not using ASSM.

On all our systems we migrate the aud$ table to an ASSM managed tablespace (normally SYSAUX) after the build but this database had been delivered by a 3rd party and was not under our direct control.
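
For reference, DBMS_AUDIT_MGMT (11GR2 onwards) provides a supported way to do that move (the target tablespace here matches our SYSAUX standard):

-- relocate the standard audit trail (AUD$) out of SYSTEM
begin
  dbms_audit_mgmt.set_audit_trail_location(
    audit_trail_type           => dbms_audit_mgmt.audit_trail_aud_std,
    audit_trail_location_value => 'SYSAUX');
end;
/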

I was pretty certain that moving that table would relieve the symptoms of contention on the aud$ segments, which were being exaggerated by the 8 RAC nodes.

The following test case indicates the differences. I have taken the code used from a site which discusses freelist contention and ASSM

Create a test table in TEST, a manually managed tablespace, run some load through it and monitor through AWR.

Create a new table in an auto managed tablespace, repeat the tests and compare results.

drop table test;
 create table test (
   x date,
   y char(255) default 'x'
 )
 storage (freelists 1)
 tablespace test;
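
The do_insert procedure that the jobs call is not shown here; a minimal version consistent with the test would be something like:

-- hypothetical sketch of do_insert; the original source does not show it
create or replace procedure do_insert as
begin
  for i in 1 .. 100000 loop
    insert into test (x) values (sysdate);
    commit;   -- commit per row to maximise block contention
  end loop;
end;
/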

exec dbms_workload_repository.create_snapshot();

declare
     l_job number;
 begin
     for i in 1 .. 15
     loop
         dbms_job.submit( l_job, 'do_insert;' );
     end loop;
     commit;
 end;
 /

select substr(job,1,4) "job",
        substr(schema_user,1,10) "user",
        substr(to_char(last_date,'dd.mm.yyyy hh24:mi'),1,16) "last date",
        substr(to_char(next_date,'dd.mm.yyyy hh24:mi'),1,16) "next date",
        substr(broken,1,2) "b",
        substr(failures,1,6) "failed",
        substr(what,1,32) "command"
   from dba_jobs;

The AWR top events from the first run were

Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
buffer busy waits                 1,876,080       2,873      2   63.7 Concurrency
enq: HW - contention                227,951       1,091      5   24.2 Configuration
DB CPU                                              469          10.4
db file sequential read             118,194         108      1    2.4 User I/O
log file switch (checkpoint in           60          10    166     .2 Configuration

followed by the second run using the tablespace that was ASSM managed:

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                            1,696          24.4
log file switch (checkpoint in        1,775         938    528   13.5 Configuration
db file sequential read             724,504         630      1    9.1 User I/O
log file switch completion            2,344         434    185    6.3 Configuration
buffer busy waits                   987,073         260      0    3.7 Concurrency

The overall result was that we had a similar number of logins, no busy segments on the aud$ table and no timeouts. Job done

Posted in 11g new features, Oracle | Tagged: , , | 6 Comments »

The ‘rfs writes’ event on a DataGuard standby database

Posted by John Hallas on March 5, 2014

It is well known that poor performance on the standby server of a DataGuard pair can affect the performance of the primary database. This post shows an example and how to use the view GV$EVENT_HISTOGRAM to track down an issue.

The databases were 11.2.0.1 on HPUX. I had been seeing alerts from OEM stating that the standby was seeing lag_apply delays when applying redo. Looking at the primary database alert log I could see the entries:

ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file /app/oracle/diag/rdbms/xxxprd1a/BSMPRD1A/trace/xxxPRD1A_lgwr_24722.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
Error 16198 for archive log file 1 to 'xxxPRD1B'
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED

That seemed to correct itself later on, but the timeout error was indicative of a network problem – well, at least that was my original hypothesis.

However, I have a script which I call rfs_writes.sql which I use on the standby database quite often, and once I had run it I was sent in a different direction.
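
As a sketch of the sort of thing rfs_writes.sql does (the script itself is not shown here, so this is my assumption of its flavour), GV$EVENT_HISTOGRAM shows the wait-time distribution for the RFS write event on the standby:

-- run on the standby: distribution of RFS write times per instance
select inst_id, event, wait_time_milli, wait_count
from gv$event_histogram
where event = 'RFS write'
order by inst_id, wait_time_milli;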


 