Oracle DBA – A lifelong learning experience

How does statistic gathering fit in with ITIL Change Management processes

Posted by John Hallas on September 16, 2014

In the (good) old days there was only one optimizer and plans were developed based on the precedence of the rules. For some reason I have always thought there were 11 rules, but looking at the manual it tells me there are 15.

Nowadays the vast majority of databases will be using the CBO and gathering statistics either via the overnight maintenance tasks or manually as required. I suppose there is also a third option, which is to load pre-saved statistics onto a newly created partition and then lock them.
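For that third option, here is a minimal sketch of what I mean (the owner, table and partition names are made up, not from any system described here): copy representative statistics onto the newly added partition and then lock them.

-- Hypothetical example: SALES_OWNER.SALES and its partitions are invented names
begin
  -- copy stats from an existing, representative partition to the new one
  dbms_stats.copy_table_stats(
    ownname     => 'SALES_OWNER',
    tabname     => 'SALES',
    srcpartname => 'SALES_2014_08',
    dstpartname => 'SALES_2014_09');

  -- lock the new partition's stats so the overnight job does not overwrite them
  dbms_stats.lock_partition_stats(
    ownname  => 'SALES_OWNER',
    tabname  => 'SALES',
    partname => 'SALES_2014_09');
end;
/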

The other day I identified some code that would perform better if we applied a SQL profile to it. Coincidentally the profile would also work around a statistics problem, as the table constantly had stale stats. I debated with myself whether that needed a change control and in the end decided it did not. That got me thinking about how we constantly allow changes to happen that can seriously alter performance, for better or worse, with no change control at all. That, of course, is what happens when we gather statistics. I doubt many sites raise an ITIL change record when they manually gather stats, and I would have thought nobody does so for the scheduled maintenance activities, and yet the first thing we ask as DBAs when a job suddenly goes horribly wrong is: have the statistics changed?

If the issue gets escalated to a problem record then there is nothing in the ITIL recording system to show that any change has been applied. I am not actually suggesting that we raise a change to gather stats, as it would be well-nigh impossible to manage on some systems, and we would also need to lock statistics to have a point to return to, which again is possible but a massive overhead.

So there is no real point to this post, just a general observation and a chance to air something that has crossed my mind a few times. I am happy to hear views on whether creating a profile should be flagged as a change. I suspect most will say yes, as that is a definite action you are taking to change the current operational behaviour, whereas gathering statistics is more like an engine fine-tuning its air-to-petrol ratio as it runs.

Posted in Oracle | 2 Comments »

My misunderstandings regarding when indexes would not be used

Posted by John Hallas on August 7, 2014

It is normal for bloggers, myself included, to post about the great things they have done. In this case, however, I am blogging about why I was surprised that adding a specific index had the substantial benefit it did.

The table contains around 32M rows, is not partitioned, and the database is 11.2.0.3 on Linux.


Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
SELL_UNIT_ID                              NOT NULL VARCHAR2(20)
PRD_CD                                    NOT NULL VARCHAR2(75)
PRD_TYP_CD                                NOT NULL CHAR(3)
ITEM_LOCATION_STATUS                               CHAR(1)
LAUNCH_DATE                                        DATE
OFF_SALE_DT                                        DATE
LST_MTC_TS                                NOT NULL DATE
LST_MTC_USR_ID                            NOT NULL NUMBER
BATCHNUMBER                               NOT NULL NUMBER
ACTION                                    NOT NULL CHAR(1)
CRT_TS                                    NOT NULL DATE
CRT_USR_ID                                NOT NULL NUMBER
PROC_TS                                            DATE
S_UOM                                              CHAR(255)
FORCE_ENTER_PRICE                                  CHAR(255)
M_F_PROD_NUMBER                                    CHAR(255)

The two interesting columns are ACTION and M_F_PROD_NUMBER. ACTION can contain one of four values but currently holds only three, and until recently did not contain any 'D' values at all, which may be quite pertinent.
Statistics are up to date and gathered nightly across tables and indexes.

select /*+ parallel (8) */ action, count(*) from TABLENAME  group by action;
A   COUNT(*)
- ----------
U    2019249
I   27663840
D    2030224

There were 2 indexes on the table

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------
EXTSTRPRDRNG_X1                SELL_UNIT_ID                                 1
                               PRD_CD                                       2
                               PRD_TYP_CD                                   3
EXTSTRPRDRNG_X2                LST_MTC_TS                                   1
                               LAUNCH_DATE                                  2

The problem query was:

SELECT DISTINCT PRD_CD FROM TABLE_NAME
WHERE SELL_UNIT_ID = :1 AND ACTION != 'D' AND TRIM(M_F_PROD_NUMBER) = :2;

So I will expose myself to the whole world now and say that I thought:

1) the TRIM function would negate any use of an index on the column M_F_PROD_NUMBER

2) the predicate ACTION != 'D' would stop any index being used on the ACTION column, and such an index would be of little use anyway, as the non-'D' rows were 100% of the table at that time.

The plan was as expected:

--------------------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |   416 |   110K|   992K  (1)| 03:18:31 |
|   1 |  HASH UNIQUE       |                 |   416 |   110K|   992K  (1)| 03:18:31 |
|*  2 |   TABLE ACCESS FULL| TABLE_NAME      |   417 |   110K|   992K  (1)| 03:18:31 |
--------------------------------------------------------------------------------------

So I figured that an index on a combination of SELL_UNIT_ID and a function-based index on TRIM(M_F_PROD_NUMBER) would be useful – maybe with ACTION added in just to see if that made a difference.

The existing index (also the PK) on SELL_UNIT_ID and PRD_CD would also be used in combination with my new index. I did work out how to create a function covering the ACTION != 'D' predicate and then create an FBI based on that, but that is a separate matter, to follow in my next post.
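As a taster, the usual trick (a sketch only, and not necessarily what the next post will describe) is to wrap the predicate in a CASE expression so that only the rows of interest are indexed, and then use the identical expression in the query:

-- Hypothetical index name; TABLE_NAME as used throughout this post
create index extstrprdrng_fbi
  on table_name (case when action <> 'D' then sell_unit_id end);

-- the query must use exactly the same expression for the index to be considered
select distinct prd_cd
from   table_name
where  (case when action <> 'D' then sell_unit_id end) = :1
and    trim(m_f_prod_number) = :2;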

 

To my surprise the following index made a significant difference to performance and I am struggling to really understand why. The original two indexes were still in place.

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------

EXTSTRPRDRNG_X3                SELL_UNIT_ID                                 1
                               M_F_PROD_NUMBER                              2

 


The index I really wanted to create was the one below, but I never got the chance to use it: the index we did create did the job and saved a lot of agony, the benefit of trying a different one would have been minimal, and I had to take the pragmatic approach of not pushing for a further test.

CREATE INDEX EXTSTRPRDRNG_X4 ON TABLE_NAME
   (
   SELL_UNIT_ID ASC,
   ACTION ASC,
   TRIM(M_F_PROD_NUMBER) ASC
   )
   PARALLEL 8;

The difference in performance was quite significant. Just taking a one-hour time slice shows the difference, and later in the day we are at 6K executions an hour.

BEGIN_TIME               END_T      EXECS   ELA_EXEC   CPU_EXEC
------------------------ ----- ---------- ---------- ----------
04-AUG-14 06:00          07:00       4812     6.9738      .3663
05-AUG-14 06:00          07:00       2265    11.9742      .3955
06-AUG-14 06:00          07:00       1264      .2523      .0832
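For reference, figures like those above can be pulled from AWR with a query along these lines (a sketch only; the column aliases are my own and the sql_id substitution is whatever statement you are tracking):

-- Hedged sketch: per-snapshot executions and average elapsed/CPU seconds for one statement
select to_char(s.begin_interval_time,'DD-MON-YY HH24:MI')        begin_time,
       to_char(s.end_interval_time,'HH24:MI')                    end_t,
       sum(q.executions_delta)                                   execs,
       round(sum(q.elapsed_time_delta)/1e6
             / nullif(sum(q.executions_delta),0), 4)             ela_exec,
       round(sum(q.cpu_time_delta)/1e6
             / nullif(sum(q.executions_delta),0), 4)             cpu_exec
from   dba_hist_sqlstat  q,
       dba_hist_snapshot s
where  q.snap_id         = s.snap_id
and    q.dbid            = s.dbid
and    q.instance_number = s.instance_number
and    q.sql_id          = '&sql_id'
group  by s.begin_interval_time, s.end_interval_time
order  by s.begin_interval_time;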

So we have a good result, but arrived at in a different manner to the way I thought we would get it.

Putting a 10053 trace on the statement shows only one join order being considered, and the base statistics on the table and indexes are as follows:

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: EXT_STR_PRD_RNG  Alias: EXT_STR_PRD_RNG
    #Rows: 31719095  #Blks:  5976448  AvgRowLen:  814.00  ChainCnt:  0.00
Index Stats::
  Index: EXTSTRPRDRNG_X1  Col#: 1 2 3
    LVLS: 2  #LB: 114050  #DK: 29355425  LB/K: 1.00  DB/K: 1.00  CLUF: 29124675.00
  Index: EXTSTRPRDRNG_X2  Col#: 7 5
    LVLS: 3  #LB: 121445  #DK: 160563  LB/K: 1.00  DB/K: 185.00  CLUF: 29806195.00
  Index: EXTSTRPRDRNG_X3  Col#: 1 16
    LVLS: 3  #LB: 1174735  #DK: 30493285  LB/K: 1.00  DB/K: 1.00  CLUF: 30586340.00
  Column (#2):
    NewDensity:0.000008, OldDensity:0.000013 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:132798
  Column (#2): PRD_CD(
    AvgLen: 10 NDV: 132798 Nulls: 0 Density: 0.000008
Final cost for query block SEL$1 (#1) - All Rows Plan:
  Best join order: 1
  Cost: 1936.3367  Degree: 1  Card: 387.0000  Bytes: 104877
  Resc: 1936.3367  Resc_io: 1934.0000  Resc_cpu: 60545244
  Resp: 1936.3367  Resp_io: 1934.0000  Resc_cpu: 60545244


============
Plan Table
============
-------------------------------------------------------+-----------------------------------+
| Id  | Operation                     | Name           | Rows  | Bytes | Cost  | Time      |
-------------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT              |                |       |       |  1936 |           |
| 1   |  HASH UNIQUE                  |                |   386 |  102K |  1936 |  00:00:24 |
| 2   |   TABLE ACCESS BY INDEX ROWID | EXT_STR_PRD_RNG|   387 |  102K |  1935 |  00:00:24 |
| 3   |    INDEX RANGE SCAN           | EXTSTRPRDRNG_X3|   414 |       |  1536 |  00:00:19 |
-------------------------------------------------------+-----------------------------------+

 

What I want to do now is see whether the ACTION != predicate can be changed to ACTION IN ('A','B','C'), understand what is in the M_F_PROD_NUMBER column (a character column that only contains numbers) and why it needs a TRIM on it rather than getting the data into a properly designed column, and do more research on why this index worked well despite my two reasons why it should not have.

Richard Foote's blog on indexing a 'not equal to' predicate is useful and gives me the idea of amending the query from ACTION <> 'D' to ACTION > 'D', adding an index on ACTION, and seeing how that works out. It looks like my invisible index blog entry will come in useful for that; a sketch of what that test might look like follows.
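This is only a hypothetical sketch (the index name is invented), using an invisible index so that nothing else on the system is affected while testing:

create index extstrprdrng_x5 on table_name (action) invisible;

alter session set optimizer_use_invisible_indexes = true;

select distinct prd_cd
from   table_name
where  sell_unit_id = :1
and    action > 'D'            -- equivalent to <> 'D' here, as the other values sort above 'D'
and    trim(m_f_prod_number) = :2;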

Posted in Oracle | Leave a Comment »

Preventing standby databases opening in Active DataGuard mode + chopt !!!

Posted by John Hallas on August 6, 2014

It has long been a concern of mine that it is relatively easy to open a standby database inadvertently and then be potentially liable for a licensing cost, as usage is shown in the DBA_FEATURE_USAGE_STATISTICS view.

In fact we logged an SR on the subject, based on our experiences of failing over using DataGuard Broker.

On dgmgrl switchover the Oracle Restart database 'start option' property is not modified to 'mount' even though dgmgrl modifies the Oracle Restart database role property to PHYSICAL_STANDBY. Nor does a dgmgrl switchover modify the Oracle Restart database 'start option' property to 'open' even though dgmgrl modifies the Oracle Restart database role property to PRIMARY. The potential is there to accidentally open a standby database and invoke Active Data Guard on any reboot/restart of services, which is exactly what we do not want.

After a fair bit of discussion we got this rather unhelpful response

Starting in 11, broker no longer changes the state of the database, i.e., from mount to open.  It is up to the user what state they want their database in and to setup start_option appropriately.  
In the course of role changes or other operations that involve database restarts, the broker will restart the database based on the role of the database, i.e., open for PRIMARY database and mounted for PHYSICAL STANDBY.  
If the user wants the new standby to only be in the mounted mode for “srvctl start database …”, they must manually change the start_options for the databases appropriately after the role change – they will have to change it to open for the new primary as well.
In other words, what you are seeing is as designed!
We recommend that you leave the start option to OPEN for both databases, so that regardless of who the primary is, you can be sure it will be opened after a role change. This of course means the standby will be opened, which is okay as long as the user has paid for the Active DG option.
If the user has not paid for the Active DG option, you should modify the start option for both databases prior to the role change, so that after the role change the new primary will open and the new standby will be mounted.
The broker does not modify the startup options during a role change in the Clusterware Repository. It only modifies the role of the database.
Set the start options to OPEN for both databases and you won’t run into this issue.
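For reference, the start option Oracle are referring to is set with srvctl; a hedged sketch (database names are made up, 11.2 syntax assumed):

# set the new standby to mount only, and the new primary to open, after a role change
srvctl modify database -d STDBYDB -s mount
srvctl modify database -d PRIMDB -s open

# confirm the change
srvctl config database -d STDBYDB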

They did provide a script, but we ended up using a startup trigger, something like this:

CREATE OR REPLACE TRIGGER SYS.DB_ROLE
AFTER STARTUP ON DATABASE
DECLARE
  ctl VARCHAR2(10);
BEGIN
  -- if the instance has started with a standby controlfile, disable the broker in memory
  -- so it cannot take the database beyond the mounted state
  SELECT controlfile_type INTO ctl FROM v$database;
  IF ctl = 'STANDBY' THEN
    EXECUTE IMMEDIATE 'alter system set dg_broker_start=FALSE scope=memory';
  END IF;
END db_role;
/

However that is not the purpose of this post. It is more about coming across an undocumented 11gR2 parameter which ensures that a standby cannot be opened (and so cannot trigger the need for an ADG licence) provided the MRP process is running. I think that came from a post by Frits Hoogland on the Oracle-L list. This is exactly what we wanted, and a colleague, Hamid Ansari, prepared a summary and test of the parameter. Read the rest of this entry »

Posted in 11g new features, Oracle | Leave a Comment »

Valuable BAU OEM reports – some examples

Posted by John Hallas on July 18, 2014

A well configured OEM 12c (or any previous version of your choice) is a fantastic enterprise tool and the reporting capabilities are a major asset.

I am going to share three examples of reports we use and find valuable, and if anybody has any other good ideas I am happy to hear about them.

I don't think I wrote any of them, but I almost certainly identified the requirement and instigated the creation of each report. I have masked out obvious details, so the screenshots do look a little messy.

For anybody new to OEM reports, I blogged about creating a new report some time ago:

http://jhdba.wordpress.com/2011/03/30/using-oem-reports-to-show-psu-levels-across-the-estate/

Location of Primary and Standby databases

Like many organisations we have two data centres and we use DataGuard extensively. We have a naming convention in which the 8th character of the database name is either A or B, signifying the primary or the standby database. Over time, as we do failovers and move things around, you cannot always assume that A is the primary database. So I wanted a way of telling quickly what was impacted if we had issues at one site or the other.

The report is produced daily and emailed out to the DBA team. In the event of a DR it is our first port of call to see what work is required. The name of each site shows against the server name. This can be achieved because we have a file on a Unix server that contains all the server names and their site locations. We created a small table (SERVER_SITES) in the OEM repository database containing the production server names and their sites, which we can then join to the MGMT$TARGET table; a sketch of the table is below. We don't add new production servers that often, so it is not too much of an overhead to maintain the table.
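A minimal sketch of that lookup table (the column names are taken from the query below; the datatypes and the sample row are invented):

-- Hypothetical DDL for the site lookup table joined to MGMT$TARGET below
create table morrdba.server_sites (
  server  varchar2(64),   -- host name as it appears in mgmt$target.host_name
  site    varchar2(32)    -- data centre name
);

insert into morrdba.server_sites values ('prodserver01.example.com', 'Data Centre A');
commit;

The report query then joins MGMT$TARGET and MGMT$DB_INIT_PARAMS to this table: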

select a.target_name, a.target_type, a.type_qualifier3, a.type_qualifier4,
       a.host_name, b.value, c.site
from   (select *
        from   mgmt$target
        where  -- type_qualifier2 in ('Primary','Physical Standby')
               type_qualifier4 in ('Primary','Physical Standby')) a,
       mgmt$db_init_params b,
       MORRDBA.SERVER_SITES c
where  a.target_name = b.target_name
and    a.target_name like ('%PRD%')
and    b.name = 'dg_broker_start'
and    c.server = a.host_name
order  by target_name;


Guaranteed Restore Points

The availability of guaranteed restore points is a great feature and we use it a lot, especially before data fixes and new code releases. We will create a GRP, apply the code pack, update data and then smoke test; a sketch of the workflow is below. If it all fails miserably then we have a quick and easy way to get back, as well as the normal backup and recovery methods: a true belt-and-braces approach. The downside is that it is easy to forget to remove the GRP, and then we have problems with the FRA running out of space.
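A sketch of that workflow (the restore point name is made up):

-- before the release
create restore point before_code_pack_42 guarantee flashback database;

-- ... apply the code pack, run the data fixes, smoke test ...

-- if it all goes wrong (database in mount mode):
--   flashback database to restore point before_code_pack_42;
--   alter database open resetlogs;

-- and the step that is easy to forget once testing passes:
drop restore point before_code_pack_42;

The report we run against the OEM repository to keep an eye on this is: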

select fo.target_name "Database",
       fo.value "FB On",
       to_date(oft.value,'YYYY-MM-DD HH24:MI:SS') "Oldest Flashback Time",
       fo.collection_timestamp "Flashback Time Collected",
       dip.collection_timestamp "Retention Target Collected",
       to_number(dip.value)/60 "Retention (Hours)",
       round(to_number(fo.collection_timestamp-to_date(oft.value,'YYYY-MM-DD HH24:MI:SS'))*24) "FB Available (Hours)",
       round((to_number(fo.collection_timestamp-to_date(oft.value,'YYYY-MM-DD HH24:MI:SS'))*24)-(to_number(dip.value)/60)) "Difference (Hours)",
       -- comparison reconstructed: the target is met when the flashback available covers the retention target
       case when round((to_number(fo.collection_timestamp-to_date(oft.value,'YYYY-MM-DD HH24:MI:SS'))*24)-(to_number(dip.value)/60)) >= 0
            then 'Yes' else 'No' end "Target Met"
from (select target_name, target_guid, value, collection_timestamp from SYSMAN.MGMT$METRIC_CURRENT where metric_label = 'Flash Recovery' and column_label = 'Flashback On') fo,
     (select target_guid, value from SYSMAN.MGMT$METRIC_CURRENT where metric_label = 'Flash Recovery' and column_label = 'Oldest Flashback Time') oft,
     (select target_guid, value, collection_timestamp from MGMT$DB_INIT_PARAMS where name = 'db_flashback_retention_target') dip
where fo.target_guid = oft.target_guid
and   fo.target_guid = dip.target_guid
and   fo.target_guid in (select distinct member_target_guid from MGMT$TARGET_FLAT_MEMBERS where aggregate_target_name in ('Production_Systems','Non_Production_Systems'));


Databases with no flashback enabled

Reasonably self-evident: all our databases should have flashback enabled, so it is good to pick up on those that do not have it switched on.

SELECT
m.target_name,
t.type_qualifier4  AS Role,
m.column_label     AS Flashback,
m.value            AS Status
FROM
mgmt$metric_current m,
mgmt$target t
WHERE m.metric_label = 'Flash Recovery'
AND m.column_label = 'Flashback On'
AND m.value = 'NO'
AND m.target_name like '%PRD%'
AND t.type_qualifier4 in('Primary','Physical Standby')
AND t.target_name=m.target_name
AND t.target_guid=m.target_guid
order by t.type_qualifier4 ,
m.value


Posted in Grid control and agents, Oracle | 1 Comment »

The value of audit_trail=DB,EXTENDED

Posted by John Hallas on July 15, 2014

I was recently reading a blog entry by Dominic Brooks regarding auditing, and I was intrigued by the line referring to the audit_trail parameter being set to DB, EXTENDED:

Behaves the same as AUDIT_TRAIL=DB, but also populates the SQL bind and SQL text CLOB-type columns of the SYS.AUD$ table, when available.

Nothing wrong with that, and it is straight from the manual, but I was surprised that they were CLOBs. However, on looking at them they are NVARCHAR2(2000), so 'CLOB-type' rather than actual CLOBs. No issues there.

I have long thought that there were three benefits of having the EXTENDED value set in audit_trail:

1)      Adds the DBID to the syslog file when using OS auditing for SYSDBA and SYSOPER activities. This is no real issue if you only have a single database on the server, but if you are running several databases it is really mandatory, otherwise you cannot tell the entries apart.

My research shows that this is no longer true from 11gR2 onwards: the DBID is always written to the syslog file now.

2)      To capture the SQL binds and text of any statement run as SYSDBA. This is fundamentally the purpose of using a syslog file; otherwise the DBA can do bad things and then delete the entries from the AUD$ table. Having the syslog file set to capture that information, and not be editable by anyone other than the root user, means that a record of all activity is kept.

Doing some testing, again with the jolt provided by Dom's blog, I found that it did not matter whether audit_trail was DB or DB,EXTENDED: all the activity and values carried out as SYSDBA were written to that file.

From the documentation

Auditing User SYS and Users Who Connect as SYSDBA and SYSOPER
#############################################################
You can fully audit sessions for users who connect as SYS, including all users connecting using the SYSDBA or SYSOPER privileges. This enables you to write the actions of administrative users to an operating system file, even if the AUDIT_TRAIL parameter is set to NONE, DB, or DB, EXTENDED. Writing the actions of administrator users to an operating system audit file is safer than writing to the SYS.AUD$ table, because administrative users can remove rows from this table that indicate their bad behavior.

 

3)      To catch the activities of 'ordinary' users and to record the values of their SQL statements, provided sufficient auditing has been enabled.

A simple example: user AUDIT_TEST owns a table TEST1.

audit update table, delete table, insert table by audit_test by access;

declare
begin
  for i in 1..5 loop
    insert into test1 values ('Peter', i);
  end loop;
  commit;
end;
/

select action_name, sql_bind, sql_text from dba_audit_trail where username = 'AUDIT_TEST';
 ACTION_NAME SQL_BIND   SQL_TEXT
------------ ---------- ----------------------------------------
INSERT       #1(1):1   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):2   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):3   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):4   INSERT INTO TEST1 VALUES ('Peter',:B1 )
INSERT       #1(1):5   INSERT INTO TEST1 VALUES ('Peter',:B1 )

With the audit_trail parameter set to just DB we do not see the values used, which really makes the whole exercise of auditing rather pointless.

INSERT
INSERT
INSERT
INSERT
INSERT
INSERT

 

So, in summary, two out of my three reasons for enabling EXTENDED auditing have been blown out of the water. However, the remaining one is justification in itself, and I see no reason why every production system should not have it enabled.

At my site extended auditing, writing to a syslog file, is enabled by default as part of the build on every database we have. We have perhaps gone a little too far, as the DBAs do not have access even to read the file, never mind delete entries, but as I sit down with external auditors a couple of times a year I do know they are always impressed that I can demonstrate that, even with SYSDBA privileges, all my activities are securely audited.
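For anyone wanting to do the same, a hedged sketch of the sort of settings such a build applies (the syslog facility and level are examples only; all three parameters need an instance restart to take effect):

alter system set audit_sys_operations = TRUE scope=spfile;
alter system set audit_syslog_level = 'LOCAL1.WARNING' scope=spfile;
alter system set audit_trail = DB,EXTENDED scope=spfile;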

Posted in 11g new features, security | 2 Comments »

Tidying up SYSAUX – removing old snapshots which you didn’t know existed

Posted by John Hallas on July 9, 2014

This post runs in parallel with a post I made five years ago, which is still the most read one on this blog:

http://jhdba.wordpress.com/2009/05/19/purging-statistics-from-the-sysaux-tablespace/

It will show how to reduce the space taken up in the SYSAUX tablespace when you have many more AWR snapshots being retained than you think.

Firstly, let's take an example database; we can see that we are using 92GB of space.

 

set linesize 120
set pagesize 100
col ash form a30
col retention form a30
col snap form a30

COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40

SELECT  occupant_name "Item",
space_usage_kbytes/1048576 "Space Used (GB)",
schema_name "Schema",
move_procedure "Move Procedure"
FROM v$sysaux_occupants
WHERE occupant_name = 'SM/AWR'
ORDER BY 1
/

 

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              91.88 SYS

Now let's see how long we should be retaining AWR snapshots for, and what we actually have:

select sysdate - a.sample_time ash,
sysdate - s.begin_interval_time snap,
c.RETENTION
from sys.wrm$_wr_control c,
(
select db.dbid,
min(w.sample_time) sample_time
from sys.v_$database db,
sys.Wrh$_active_session_history w
where w.dbid = db.dbid group by db.dbid
) a,
(
select db.dbid,
min(r.begin_interval_time) begin_interval_time
from sys.v_$database db,
sys.wrm$_snapshot r
where r.dbid = db.dbid
group by db.dbid
) s
where a.dbid = s.dbid
and c.dbid = a.dbid;

 

ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000875 22:22:41.045        +000000030 12:22:28.323        +00030 00:00:00.0

We want to keep 30 days' worth of snapshots – we have set the retention period to that – but we have 875 days' worth of active session history.

I rather smugly said let's show an example, and then I produce a database with nearly three years' worth of snapshots being retained, but there is no trickery involved. Looking around at other systems I can find a few similar examples, and it is always down to the same issue. Let's look further.

select table_name, count(*)
from dba_tab_partitions
where table_name like 'WRH$%'
and table_owner = 'SYS'
group by table_name
order by 1;

 

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             2
WRH$_DB_CACHE_ADVICE                    2
WRH$_DLM_MISC                           2
WRH$_EVENT_HISTOGRAM                    2
WRH$_FILESTATXS                         2
WRH$_INST_CACHE_TRANSFER                2
WRH$_INTERCONNECT_PINGS                 2
WRH$_LATCH                              2
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               2
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             2
WRH$_PARAMETER                          2
WRH$_ROWCACHE_SUMMARY                   2
WRH$_SEG_STAT                           2
WRH$_SERVICE_STAT                       2
WRH$_SERVICE_WAIT_CLASS                 2
WRH$_SGASTAT                            2
WRH$_SQLSTAT                            2
WRH$_SYSSTAT                            2
WRH$_SYSTEM_EVENT                       2
WRH$_SYS_TIME_MODEL                     2
WRH$_TABLESPACE_STAT                    2
WRH$_WAITSTAT                           2

24 rows selected.

There is the problem: all the WRH$ data is held in two partitions, and the overnight housekeeping job does not have time to finish its tidy-up before its 15 minutes of fame is over again for another 24 hours.

Two components of Server Manageability (SM) that reside in the SYSAUX tablespace cause the problem: the Automatic Workload Repository (AWR) and Optimizer Statistics History (OPTSTAT). Both of these components have retention periods associated with their data, and the MMON process should run nightly, as part of the scheduled maintenance tasks, to purge data that exceeds those retention periods. From 11g onwards the MMON purging process has been constrained to a time-limited window for each of the purges; if this window is exceeded the purging stops and an ORA-12751 error is written to an m000 trace file.

For the AWR data, held in tables with names commencing WRH$, the probable cause is the fact that a number of the tables are partitioned. New partitions are created for these tables as part of the MMON process. Unfortunately, it seems that the partition splitting is the final task in the purge process; if the later partitions are not split they end up containing more data, and partition pruning within the purge process becomes less effective.

For the OPTSTAT data, held in tables with names commencing WRI$, the cause is more likely to be related to the volume of data held in the tables. The WRI$ tables hold historical statistical data for all segments in the database for as long as the stats history retention period specifies. Thus, if the database contains a large number of tables and has a long retention period, say 30 days, the purge process will struggle to purge all of the old statistics within the specified window.
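The OPTSTAT side can be checked, and the retention reduced, with DBMS_STATS (the 10-day value below is just an example):

-- current retention (days) and the oldest stats history actually available
select dbms_stats.get_stats_history_retention    from dual;
select dbms_stats.get_stats_history_availability from dual;

-- example only: reduce the history retention to 10 days
exec dbms_stats.alter_stats_history_retention(10)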

I also think that two scenarios lead to this problem:

1) SYSAUX running out of space, which I know has happened on at least one occasion; the volume of data left in a partition then becomes too much to handle in the constrained time window and so continues to grow in the same partition.

2) The database being shut down over the period of the maintenance task, so that again the volume in the partition becomes too large to handle in a short time.

 

The resolution to this problem is split into two stages.

Firstly, a temporary manual fix is required to rectify the partition splitting failure. A job needs to be scheduled on the database, run as the SYS user, which executes the following code. This job will force the splitting of partitions for the WRH$ tables, and should be scheduled to run on a daily basis until the number of partitions per table exceeds the AWR retention period.

begin
  execute immediate 'alter session set "_swrf_test_action" = 72';
end;
/
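If you want to automate that daily run, one way (a sketch only; the job name and run time are made up) is a DBMS_SCHEDULER job owned by SYS:

begin
  dbms_scheduler.create_job(
    job_name        => 'FORCE_AWR_PARTITION_SPLIT',   -- hypothetical name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin execute immediate ''alter session set "_swrf_test_action" = 72''; end;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=6',
    enabled         => TRUE);
end;
/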

What I do is drop the retention period to 8 days (or maybe less) and then run this procedure once a day. It can take up to an hour, but afterwards you can see that additional partitions have been created.

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             3
WRH$_DB_CACHE_ADVICE                    3
WRH$_DLM_MISC                           3
WRH$_EVENT_HISTOGRAM                    3
WRH$_FILESTATXS                         3
WRH$_INST_CACHE_TRANSFER                3
WRH$_INTERCONNECT_PINGS                 3
WRH$_LATCH                              3
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               3
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             3
WRH$_PARAMETER                          3
WRH$_ROWCACHE_SUMMARY                   3
WRH$_SEG_STAT                           3
WRH$_SERVICE_STAT                       3
WRH$_SERVICE_WAIT_CLASS                 3
WRH$_SGASTAT                            3
WRH$_SQLSTAT                            3
WRH$_SYSSTAT                            3
WRH$_SYSTEM_EVENT                       3
WRH$_SYS_TIME_MODEL                     3
WRH$_TABLESPACE_STAT                    3


Once the correct number of partitions has been created, the old data may be automatically purged by the MMON process. If this does not occur, it will be necessary to manually purge the data from the AWR tables.

Firstly turn off the production of snapshots for the duration of this activity:

 

-- get the dbid for this database
select dbid from v$database;

-- get current snap interval (in minutes)
select extract(day from snap_interval) *24 *60
     + extract(hour from snap_interval) *60
     + extract(minute from snap_interval) snap_interval
from   wrm$_wr_control
where  dbid = <dbid>;

-- disable snaps by setting interval to 0
exec dbms_workload_repository.modify_snapshot_settings(interval => 0, dbid => <dbid>)

Then determine the maximum snapshot to be purged from the database by querying the maximum snap_id outside of the retention period:

select max(snap_id) max_snap
from   wrm$_snapshot
where  begin_interval_time < (sysdate - (select retention
                                         from   wrm$_wr_control
                                         where  dbid = <dbid>));

This will return the maximum snapshot to be purged:

exec dbms_workload_repository.drop_snapshot_range(1, <max_snap>, <dbid>)

The AWR tables will not purge if baselines exist, so check for and remove them as required. The following should return only one row, called SYSTEM_MOVING_WINDOW:

select baseline_name, creation_time from dba_hist_baseline;

If other baselines are found they will need to be removed:

exec dbms_workload_repository.drop_baseline(baseline_name => <baseline>);

If we look at the current retention period it is set to 15 days, so we set it to 8 days. The first argument to MODIFY_SNAPSHOT_SETTINGS is the retention in minutes:

select d.dbid, w.snap_interval, w.retention
from   DBA_HIST_WR_CONTROL w, v$database d
where  w.dbid = d.dbid;

begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(11520,0); end;    -- 11520 minutes = 8 days retention
begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(129600,0); end;   -- 129600 minutes = 90 days retention

Then we purge the snapshots (in ranges, if there are a lot, as in this example). This can take several hours.

EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>100000, high_snap_id=>114159);

EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>116261, high_snap_id=>116265);

Once all that is complete the retained data matches the retention period, and it is just a matter of tidying up the indexes and tables to get the space back.

ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000008 23:47:01.421        +000000008 23:46:55.865        +00008 00:00:00.0

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              93.70 SYS

SM/AWR still reports 93.70GB because the space has not yet been released from the underlying segments, so we generate the index rebuild statements:

select 'alter index '||segment_name||' rebuild online parallel (degree 4);'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$_%'
and    segment_type = 'INDEX'
order  by segment_name;

double check for unusable indexes

 

select 'alter index '||s.segment_name||' rebuild online parallel (degree 4);'
from   dba_segments s, dba_indexes i
where  s.tablespace_name = 'SYSAUX'
and    s.segment_name like 'WRH$_%'
and    s.segment_type = 'INDEX'
and    i.status = 'UNUSABLE'
and    i.index_name = s.segment_name
order  by s.segment_name;

select 'alter table '||segment_name||' move tablespace sysaux;'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$_%'
and    segment_type = 'TABLE'
order  by segment_name;

Once the tables have been moved and the indexes rebuilt, the space used by SM/AWR is much more reasonable:

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              10.05 SYS

 

I hope that has been useful, and I would be pleased to hear whether others see the same issue on some databases, whereby the actual data retained covers a much longer period than the snapshot retention setting would suggest.

Posted in Oracle | 11 Comments »

Bugs with rm -rf from within asmcmd

Posted by John Hallas on July 8, 2014

Performing an rm -rf operation is normally simple, albeit risky if you are in the wrong folder. However, within ASMCMD there are a couple of bugs associated with it.

Here are two examples and workarounds.

FRA full - warning in the alert log  - action needs to be taken - this is an 11.2.0.4 database on OEL (5.8)
ORA-19815: WARNING: db_recovery_file_dest_size of 10484711424 bytes is 100.00% used, and has 0 remaining bytes available.
 ARCH: Error 19809 Creating archive log file to '+FRA'
Errors in file /app/oracle/diag/rdbms/tst11204/TST11204/trace/TST11204_ora_13209.trc:
ORA-16038: log 1 sequence# 148 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1: '+DATA/tst11204/redo01.log'
USER (ospid: 13209): terminating the instance due to error 16038

 

Let’s look at the disk from asmcmd and we see a lot of old folders that we do not need

ASMCMD [+FRA/TST11204/ARCHIVELOG] > ls
2013_12_08/
2013_12_09/
2013_12_10/
2013_12_11/
2013_12_12/
2013_12_13/
2013_12_14/
2013_12_15/
2013_12_16/
2013_12_17/
2013_12_18/
2013_12_19/
2013_12_20/
2013_12_21/
2013_12_22/
2013_12_23/
2013_12_24/

Let's remove them all:

 rm -r 2013*

You may delete multiple files and/or directories.
Are you sure? (y/n) y
ORA-29261: bad argument
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR], [600], [ORA-00600: internal error code, arguments: [KFCHK_ERRRET], [kfa.c], [3025], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute)

rm -r 2013_12_08
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ORA-29261: bad argument
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR], [600], [ORA-00600: internal error code, arguments: [KFCHK_ERRRET], [kfa.c], [3025], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
ORA-06512: at line 4 (DBD ERROR: OCIStmtExecute)

We have hit a known bug. It only applies to 11.2.0.4, but I am sure I have seen manifestations of this issue on previous versions of 11gR2. Read the rest of this entry »
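As a general aside (not the workaround from the bug note), removing archive logs directly from ASM leaves the recovery area bookkeeping out of step, so wherever possible it is better to let RMAN do the deletion; a sketch:

RMAN> crosscheck archivelog all;
RMAN> delete noprompt expired archivelog all;
RMAN> delete noprompt archivelog all completed before 'sysdate-7';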

Posted in Oracle | Leave a Comment »

Converting an SCN to a timestamp

Posted by John Hallas on June 27, 2014

Something nice and simple

It is easy to get the current SCN from the database and to map between SCN and timestamp, because from 10g onwards there are two functions to convert between SCN and TIMESTAMP.

In 9i there was also a means of mapping an SCN to a TIMESTAMP, using the system view SYS.SMON_SCN_TIME. The view no longer exists in 12c. One limitation is that it only holds 1440 rows (well, it does in 9i, but I can see 2711 rows in an 11gR1 database and 2500 in an 11gR2 one) and therefore cannot hold more than about 5 days' worth of history. There is a good blog entry explaining this at http://orainternals.wordpress.com/2012/01/19/scn-what-why-and-how/

I will demonstrate the methods below

 

col current_scn form 999999999999999999
col TIMESTAMP_TO_SCN form 9999999999999999
col NOW10G form 99999999999999999999
col NOW9I form 99999999999999999999
col SCN form 99999999999999999999
col TIMESTAMP form 99999999999999999999
alter session set NLS_DATE_FORMAT='DD-MON-YY HH:MI:SS';

 

Normal 10g-onwards syntax:

select current_scn from v$database;

        CURRENT_SCN
-------------------
     10900098356623

 

To convert an SCN to a TIMESTAMP, the SCN_TO_TIMESTAMP function (10g onwards) is used:

select SCN_TO_TIMESTAMP(10900098356623) from dual;

SCN_TO_TIMESTAMP(10900098356623)
---------------------------------------------------------------------------
27-JUN-14 07.31.31.000000000 AM

 

 

To get an SCN from a timestamp:

SELECT TIMESTAMP_TO_SCN(order_date) FROM orders
 WHERE order_id = 5000;

TIMESTAMP_TO_SCN(ORDER_DATE)
----------------------------
                      574107

 

 

To get the current SCN in 9i only:

select max(scn_bas) SCN_BASE from smon_scn_time;

  SCN_BASE
----------
3766338378

select time_dp TIMESTAMP, scn_wrp*4294967296+scn_bas SCN
from   sys.smon_scn_time
where  scn_bas = 3766338378;

TIMESTAMP                            SCN
------------------ ---------------------
27-JUN-14 06:44:26        10900098368330

Posted in Oracle | Leave a Comment »

Another really enjoyable UKOUG Database SIG

Posted by John Hallas on May 8, 2014

Today’s venue was the Metropole Hotel in the centre of Leeds and there was a good attendance, encouraged by a strong agenda.

After introductions, health and safety and UKOUG information it was straight into techie talk, with Phil Davies from Oracle doing his normal support update (although he does share duties with Owen Ireland). Invariably I make more notes from this session than from most others I hear, because of the wealth of information it contains. Snippets I jotted down that are worthy of sharing are:

12.1.0.2 RDBMS is due out between July and December 2014 and is likely to contain new functionality that did not make it into the first 12c release – I think in-memory will be one such feature. A straw poll of members saw very little uptake of 12c in production, and some, but not much, take-up in non-production.

Oracle Cloud Control (OEM) 12.1.0.4 is due very soon – Q2 2014.

I was intrigued enough about MoS note 1557478.1, concerning a bug with transparent huge pages, to make a note of it for further investigation, and the same goes for 17761775.8 – Bug 17761775: ORA-600 [kclchkblkdma_3], ORA-600 [3020] or ORA-600 [kcbchg1_16]; a join of temp and permanent tables in RAC might lead to corruption.

Next up was Phil Brown of E-DBA who ran an open forum where the audience picked from a selection of topics.

The three topics covered in about 70 minutes were

1)      Consolidation v Virtualisation

2)      The worst SQL statement in the world

3)      DBA interviews

 

All three allowed plenty of audience participation and competing views, and I think almost everyone in the room provided input. My input regarding DBA interviews was a tip given to me several years ago which I think is very valid: even if you are not thinking of moving on, keep updating your CV so it is an accurate record of what you have been doing over the last couple of years. Read the rest of this entry »

Posted in 12c new features, Oracle, UKOUG | Leave a Comment »

In praise of bloggers (along with sql_profiles and RAC interconnect pings)

Posted by John Hallas on March 12, 2014

Today I am going to mention two articles which I came across in the last few days whilst investigating problems, talk about the background to the problems, and also praise the articles. I am sure many of us have run the SQL Tuning Advisor from within OEM, or using the DBMS_SQLTUNE package from the command line, and often it recommends a sql_profile that should be applied (invariably claiming a 99.98% perceived benefit). Now I see this as both a good thing and a bad thing. Yes, it is nice to apply the profile and hopefully fix the problem, but I would also like to know what the profile is doing, and that is well hidden. I agree that there is the option to look at the current and new explain plans, but sometimes it is difficult to really understand what hints have been used to get the new plan, especially on a statement that has many lines in the explain plan. So why is this important, you might ask; just go and select the apply button and forget about it. Well, I have two very good reasons why it is important.

Read the rest of this entry »
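As an aside, one way of seeing exactly which hints sit behind an accepted profile (a sketch only, assuming the 11g dictionary layout rather than anything from the articles referred to above) is:

-- Hedged sketch: list the hints stored for a named SQL profile (11g layout assumed)
select extractvalue(value(h), '.') as hint
from   dba_sql_profiles p,
       sys.sqlobj$data  d,
       table(xmlsequence(extract(xmltype(d.comp_data), '/outline_data/hint'))) h
where  p.name      = '&profile_name'
and    d.signature = p.signature
and    d.category  = p.category
and    d.obj_type  = 1;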

Posted in Oracle | 1 Comment »

 