Oracle DBA – A lifelong learning experience

Optimizer statistics do not get automatically purged

Posted by John Hallas on August 23, 2011

You run a manual ADDM report for the most recent snapshot and it never finishes.

You notice that in the performance tab (historical data view) for the same database there are no ticks in the box that shows an ADDM report is available.

Creating a snapshot report takes an inordinately long time, even when the system is not that busy.

What do you do?

Well, here is one approach. This blog entry uses many of the techniques covered in another of my blog entries, which is all about managing optimizer stats.

My first step was to try to run an ADDM report and see what it was waiting for via a 10046 trace.
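For reference, this is roughly how one might do that (a minimal sketch, assuming the 11g DBMS_ADDM package; the task name and snapshot IDs below are placeholders, not the values used at the time):

-- trace my own session, binds and waits included
alter session set events '10046 trace name context forever, level 12';

declare
  l_task varchar2(60) := 'addm_manual_check';  -- placeholder task name
begin
  -- 1000/1001 stand in for the most recent pair of snapshot ids
  dbms_addm.analyze_db(l_task, 1000, 1001);
end;
/

alter session set events '10046 trace name context off';

The trace showed nearly all of the elapsed time going into this recursive statement: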

SELECT cnt , EVENT_ID
FROM
( SELECT  count(*) as cnt , a.EVENT_ID FROM  WRH$_ACTIVE_SESSION_HISTORY a
WHERE  a.dbid = :dbid  AND a.instance_number = :inst  AND a.snap_id > :bid
AND a.snap_id <= :eid  AND a.session_type = 1  AND a.SQL_ID = :sqlid  AND
a.EVENT_ID > 0  AND a.WAIT_TIME = 0        GROUP BY EVENT_ID       HAVING
count(*) >= :rowcnt       ORDER BY 1 DESC, 2 ASC       )  WHERE rownum <=
:ret
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.00          0          0          0           0
Fetch        1     10.23     137.32     196522     732969          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     10.24     137.33     196522     732969          0           0
Misses in library cache during parse: 1

Misses in library cache during execute: 1

Optimizer mode: ALL_ROWS
Parsing user id: SYS   (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
0  COUNT STOPKEY (cr=732969 pr=196522 pw=0 time=0 us)
0   VIEW  (cr=732969 pr=196522 pw=0 time=0 us cost=1370 size=26 card=1)
0    SORT ORDER BY STOPKEY (cr=732969 pr=196522 pw=0 time=0 us cost=1370 size=35 card=1)
0     FILTER  (cr=732969 pr=196522 pw=0 time=0 us)
3      HASH GROUP BY (cr=732969 pr=196522 pw=0 time=0 us cost=1370 size=35 card=1)
4       FILTER  (cr=732969 pr=196522 pw=0 time=0 us)
4        TABLE ACCESS FULL WRH$_ACTIVE_SESSION_HISTORY PARTITION: KEY KEY (cr=732969 pr=196522 pw=0 time=0 us cost=1368 size=35 card=1)

At least we now know that a full table scan on WRH$_ACTIVE_SESSION_HISTORY is causing the problem. Why would that be an issue on this database when it is not elsewhere? There seems to be a lot of activity on WRH$_ACTIVE_SESSION_HISTORY, yet it only has one partition to scan. The retention period is set to 31 days but we seem to have almost 400 days' worth of history, and there only seems to be one partition on this database compared with many on other databases. What is going on?
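The partition listing comes from DBA_TAB_PARTITIONS, and the AWR retention can be checked at the same time in DBA_HIST_WR_CONTROL (a sketch rather than the exact statements used at the time):

select partition_name, high_value
from   dba_tab_partitions
where  table_owner = 'SYS'
and    table_name  = 'WRH$_ACTIVE_SESSION_HISTORY'
order  by partition_position;

select snap_interval, retention
from   dba_hist_wr_control;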

PARTITION_NAME                 HIGH_VALUE
------------------------------ --------------------------------
WRH$_ACTIVE_SES_MXDB_MXSN      MAXVALUE, MAXVALUE
WRH$_ACTIVE_3035595574_22385   3035595574, MAXVALUE
This is odd compared with another system, where the table has multiple partitions split by snapshot id:
PARTITION_NAME                 HIGH_VALUE
------------------------------ --------------------------------
WRH$_ACTIVE_SES_MXDB_MXSN      MAXVALUE, MAXVALUE
WRH$_ACTIVE_952858567_23062    952858567, 23447
WRH$_ACTIVE_952858567_23447    952858567, 23829
WRH$_ACTIVE_952858567_23829    952858567, 24213
WRH$_ACTIVE_952858567_24213    952858567, 24598
WRH$_ACTIVE_952858567_24598    952858567, 24982
WRH$_ACTIVE_952858567_24982    952858567, 25366
WRH$_ACTIVE_952858567_25366    952858567, 25750
WRH$_ACTIVE_952858567_25750    952858567, 26134
WRH$_ACTIVE_952858567_26134    952858567, 26519
WRH$_ACTIVE_952858567_26519    952858567, 26903
WRH$_ACTIVE_952858567_26903    952858567, 27287
WRH$_ACTIVE_952858567_27287    952858567, 27670
WRH$_ACTIVE_952858567_27670    952858567, MAXVALUE

Very odd. Looking at the trace dumps there seemed to be a lot of MMON trace files, and they led to the conclusion that a process was timing out and not completing.

DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 300 seconds
Time limit violation detected at:
<-ksedsts()+800<-kspol_12751_dump()+224<-dbgdaExecuteAction()+768<-dbgerRunAction()+288<-dbgerRunActions()+2368<-dbgexPhaseII()+1584<-dbgexProcessError()+1920<-dbgeExecuteForError()+288<-dbgePostErrorKGE()+1936<-dbkePostKGE_kgsf()+128<-kgeade()+656<-kgesev()+112
<-ksesec0()+160<-ksucin()+320<-kdst_fetch()+3040<-kdstf1100101000km()+912<-kdsttgr()+2480<-qertbFetch()+992<-qergiFetch()+640<-qerflFetch()+352<-qerghFetch()+2416<-qerflFetchOutside()+256<-qersoProcessULS()+448<-qersoFetch()+2080<-$cold_qervwFetch()+2352<-qercoFetch()+576
<-$cold_qervwFetch()+2352<-qerjotFetch()+928<-qerjotFetch()+928<-qersoProcessULS()+448<-qersoFetch()+2080<-opifch2()+9472<-opiall0()+3824<-opikpr()+912<-opiodr()+2528<-rpidrus()+400<-skgmstack()+224<-rpidru()+240<-rpiswu2()+976<-kprball()+2240<-keasqlFetch()+256
<-kehssqlFetch()+32<-kehsGetSqlStatRows()+784<-kehsGetTssCLW()+160<-kehrwclSql()+352<-kehRunNode()+624<-kehRunNode()+1152<-kehRunNode()+1152<-kehmain()+784<-kehExecute()+32<-spefcmpa()+560<-spefmccallstd()+368<-pextproc()+128<-peftrusted()+288<-psdexsp()+448
<-rpiswu2()+976<-psdextp()+1296<-pefccal()+1120<-pefcal()+432<-pevm_FCAL()+256<-pfrinstr_FCAL()+176<-pfrrun_no_tool()+192<-pfrrun()+1568<-plsql_run()+1344<-peicnt()+528<-kkxexe()+1104<-opiexe()+15152<-opiall0()+2640<-opikpr()+912<-opiodr()+2528<-rpidrus()+400

*** 2011-08-22 11:08:29.136
<-skgmstack()+224<-rpidru()+240<-rpiswu2()+976<-kprball()+2240<-keasqlExecute()+176<-keh_slave_main_cb()+608<-kebm_slave_main()+896<-ksvrdp()+5312<-opirip()+1664<-opidrv()+1152<-sou2o()+240<-opimai_real()+304<-main()+224<-main_opd_entry()+80
Current Wait Stack:
Not in wait; last wait ended 1.048072 sec ago
Wait State:
auto_close=0 flags=0x21 boundary=0x0000000000000000/-1
Session Wait History:
0: waited for 'db file scattered read'
file#=2, block#=101483, blocks=80
wait_id=7451 seq_num=7454 snap_id=1
wait times: snap=0.022362 sec, exc=0.022362 sec, total=0.022362 sec
wait times: max=infinite
wait counts: calls=0 os=0

The next thought was to see what data was in the database and why we had 400 days of it when we thought we were only keeping 31 days' history.

The query we used was

select trunc(SAVTIME), count(*) from WRI$_OPTSTAT_HISTGRM_HISTORY group by trunc(SAVTIME) order by 1;

and the graph looked very odd for the first few days in July 2010.

The data, sorted by volume, shows that we had several very big days in July and August 2010 compared with an average of fewer than 1M rows a day:

08-Aug-10 6309491
03-Jul-10 4752810
09-Aug-10 2097012
06-Jul-10 1853661
07-Aug-10 1580404
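(The volume-sorted figures above presumably come from the same query, simply ordered by the count, along these lines:)

select trunc(savtime), count(*)
from   wri$_optstat_histgrm_history
group  by trunc(savtime)
order  by 2 desc;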

I do recall that we had a problem when some code was being worked on and repeatedly gathered stats. That was about a year ago, so it does fit in with this data. We have made a bit of progress: we have a lot more data than we thought, the MMON process seems to be timing out, and there do not appear to be the multiple partitions for the history data that I would have expected.

I do know that the MMON process is responsible for managing optimizer stats history as well as AWR snapshots and, as I recently found out, for purging ADR data. Investigating other MMON trace files it is clear that both the WRI$ (stats) and WRH$ (AWR) tables are affected. I wonder why it is timing out.

Research on MoS provided me with two very good reasons.

1) The MMON process is only given 5 minutes to complete the automatic purging of optimizer statistics data. If it has not finished in that time it aborts and does not produce any sort of error message or trace information. Bingo, we have the answer. Data has built up because excessive amounts were created by a rogue programme, and the MMON process has spent 5 minutes every day for a year trying to remove that excess but has failed miserably and not told anybody about it.

2) The second reason is that the MMON process is also responsible for creating new partitions for the WRH$ (ASH) data, including WRH$_ACTIVE_SESSION_HISTORY. It first tries to drop old partitions whose data is all marked as expired. As we never get that far in the nightly purge, all data continues going into the single partition and eventually it becomes too big to be manageable. A quick way to see how big these components have become in SYSAUX is shown below.
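This check was not part of my original notes, but V$SYSAUX_OCCUPANTS shows how much SYSAUX space the stats history (SM/OPTSTAT) and AWR (SM/AWR) components are consuming, which is a handy confirmation before starting any clean-up:

select occupant_name, occupant_desc, space_usage_kbytes
from   v$sysaux_occupants
order  by space_usage_kbytes desc;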

Now to fix the problem.

Firstly, I might as well stop the automatic MMON purging until I have reduced the data volume:

exec dbms_stats.alter_stats_history_retention(-1);
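Setting the retention to -1 tells Oracle never to purge the stats history automatically, which stops the nightly purge attempt. Once the backlog has been cleared I would put it back to the normal value (31 days on this system) and confirm it:

exec dbms_stats.alter_stats_history_retention(31);
select dbms_stats.get_stats_history_retention from dual;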

Then I want to purge the optimizer stats history on a gradual basis, as a single large purge would take forever and make the tables unusable for that period while they are locked. It would also generate large amounts of archived redo.

The following script achieves all that


--show the date of the oldest stats history data
select dbms_stats.get_stats_history_availability from dual;
-- drop indexes to speed up the work
DROP INDEX I_WRI$_OPTSTAT_TAB_OBJ#_ST;
DROP INDEX I_WRI$_OPTSTAT_TAB_ST;
DROP INDEX I_WRI$_OPTSTAT_IND_OBJ#_ST;
DROP INDEX I_WRI$_OPTSTAT_IND_ST;
DROP INDEX I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST;
DROP INDEX I_WRI$_OPTSTAT_HH_ST;
DROP INDEX I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST;
DROP INDEX I_WRI$_OPTSTAT_H_ST;

-- set the tables to use parallelism
ALTER TABLE WRI$_OPTSTAT_HISTGRM_HISTORY PARALLEL;
ALTER TABLE WRI$_OPTSTAT_HISTHEAD_HISTORY PARALLEL;

-- purge data - be very selective here
-- The aim is to remove only the oldest data that is causing the problem in the first run
-- The second run might use SYSDATE - 300 and each run can be progressively bigger

EXEC DBMS_STATS.PURGE_STATS(SYSDATE - 365);

-- rebuild the indexes and then remove parallelism from the tables
CREATE UNIQUE INDEX SYS.I_WRI$_OPTSTAT_TAB_OBJ#_ST
ON SYS.WRI$_OPTSTAT_TAB_HISTORY
(
OBJ#,
SYS_EXTRACT_UTC(SAVTIME)
)
TABLESPACE SYSAUX
PARALLEL;

CREATE INDEX SYS.I_WRI$_OPTSTAT_TAB_ST
ON SYS.WRI$_OPTSTAT_TAB_HISTORY
(
SYS_EXTRACT_UTC(SAVTIME)
)
TABLESPACE SYSAUX
PARALLEL;

CREATE UNIQUE INDEX SYS.I_WRI$_OPTSTAT_IND_OBJ#_ST
ON SYS.WRI$_OPTSTAT_IND_HISTORY
(
OBJ#,
SYS_EXTRACT_UTC(SAVTIME)
)
TABLESPACE SYSAUX
PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_IND_ST
ON SYS.WRI$_OPTSTAT_IND_HISTORY
(
SYS_EXTRACT_UTC(SAVTIME)
)
TABLESPACE SYSAUX
PARALLEL;

CREATE UNIQUE INDEX SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST
ON SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY
(
OBJ#,
INTCOL#,
SYS_EXTRACT_UTC(SAVTIME),
COLNAME
)
TABLESPACE SYSAUX
PARALLEL;

CREATE INDEX SYS.I_WRI$_OPTSTAT_HH_ST
ON SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY
(
SYS_EXTRACT_UTC(SAVTIME)
)
TABLESPACE SYSAUX
PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST
ON SYS.WRI$_OPTSTAT_HISTGRM_HISTORY
(
OBJ#,
INTCOL#,
SYS_EXTRACT_UTC(SAVTIME),
COLNAME
)
TABLESPACE SYSAUX
PARALLEL;

CREATE INDEX SYS.I_WRI$_OPTSTAT_H_ST
ON SYS.WRI$_OPTSTAT_HISTGRM_HISTORY
(
SYS_EXTRACT_UTC(SAVTIME)
)
TABLESPACE SYSAUX
PARALLEL;
ALTER TABLE WRI$_OPTSTAT_HISTGRM_HISTORY NOPARALLEL;
ALTER TABLE WRI$_OPTSTAT_HISTHEAD_HISTORY NOPARALLEL;

select dbms_stats.get_stats_history_availability from dual;
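Rather than editing and re-running the PURGE_STATS line by hand each time, the same gradual approach can be wrapped in a small PL/SQL loop. This is only a sketch of the idea, not part of the script we actually ran, and the 7-day step and 100-day target are illustrative values:

declare
  l_oldest timestamp with time zone;
begin
  l_oldest := dbms_stats.get_stats_history_availability;
  -- keep purging a week at a time until roughly 100 days of history remain
  while l_oldest < systimestamp - interval '100' day loop
    dbms_stats.purge_stats(l_oldest + interval '7' day);
    l_oldest := dbms_stats.get_stats_history_availability;
  end loop;
end;
/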

Hopefully this has been of some value; I know I have learned from the experience. I certainly know more about what the MMON process does, but I still feel I have a lot more research to undertake on it.


7 Responses to “Optimizer statistics do not get automatically purged”

  1. […] a database (ASM to ASM)How to move the central Inventory (oraInventory) to another location. Optimizer statistics do not get automatically purged « Purging ADR log and trace files automatically HPUX_SCHED_NOAGE parameter, improve […]

  2. Anand said

    Nice Post!!! Thanks for sharing 🙂

  3. Chuck said

    Great post John. Thanks for sharing this. We encountered a very similar issue, and your note was very helpful in explaining/validating our observations.

    I have found a related bug: Bug 10279045 – Slow Statistics purging (SYSAUX grows) [ID 10279045.8]. It’s fixed in 11.2.0.3, but there is a one-off patch for earlier versions.

    There is also this note about rebuilding AWR: How to Recreate The AWR ( AUTOMATIC WORKLOAD ) Repository ? [ID 782974.1]. Depending on the situation, this may be useful in addition to the OPSTAT table cleanup steps.

  4. […] INDEX 41 I_WRI$_OPTSTAT_HH_ST INDEX 96 I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST INDEX I have also added another post on a similar theme. It may be useful (Posted by John Hallas on https://jhdba.wordpress.com/) Like […]

    • John Hallas said

      That post is just a copy of much of my original post and then you have the cheek to link it to my post.
      If I knew how to complain to WordPress about this I would do so.

  5. […] can follow steps mentioned in https://jhdba.wordpress.com/2011/08/23/939/ to purge the table. I tried the same, stopping the automatic mmon purging , dropping the indexes […]

  6. […] https://jhdba.wordpress.com/2011/08/23/939/ […]
