Optimizer statistics do not get automatically purged
Posted by John Hallas on August 23, 2011
You run a manual ADDM report for the most recent snapshot and it never finishes.
You notice that in the performance tab (historical data view) for the same database there are no ticks in the box that shows an ADDM report is available.
Creating a snapshot report takes an inordinately long time, even when the system is not that busy.
What do you do?
Well, here is one approach. This blog entry uses many of the techniques covered in another of my blog entries, which is all about managing optimizer stats.
My first step was to try to run the ADDM report again and see what it was waiting for via a 10046 trace.
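A minimal sketch of how that tracing can be set up (the tracefile identifier is just a hypothetical label to make the trace file easy to find; run addmrpt.sql from the traced session):

alter session set tracefile_identifier = 'addm_10046';
alter session set events '10046 trace name context forever, level 12';
@?/rdbms/admin/addmrpt.sql
alter session set events '10046 trace name context off';

Formatting the resulting trace file with tkprof showed one recursive statement doing nearly all of the work: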
SELECT cnt, EVENT_ID
FROM  (SELECT count(*) as cnt, a.EVENT_ID
       FROM   WRH$_ACTIVE_SESSION_HISTORY a
       WHERE  a.dbid = :dbid
       AND    a.instance_number = :inst
       AND    a.snap_id > :bid AND a.snap_id <= :eid
       AND    a.session_type = 1
       AND    a.SQL_ID = :sqlid
       AND    a.EVENT_ID > 0
       AND    a.WAIT_TIME = 0
       GROUP  BY EVENT_ID
       HAVING count(*) >= :rowcnt
       ORDER  BY 1 DESC, 2 ASC)
WHERE rownum <= :ret

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.00          0          0          0           0
Fetch        1     10.23     137.32     196522     732969          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     10.24     137.33     196522     732969          0           0

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS   (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  COUNT STOPKEY (cr=732969 pr=196522 pw=0 time=0 us)
      0   VIEW (cr=732969 pr=196522 pw=0 time=0 us cost=1370 size=26 card=1)
      0    SORT ORDER BY STOPKEY (cr=732969 pr=196522 pw=0 time=0 us cost=1370 size=35 card=1)
      0     FILTER (cr=732969 pr=196522 pw=0 time=0 us)
      3      HASH GROUP BY (cr=732969 pr=196522 pw=0 time=0 us cost=1370 size=35 card=1)
      4       FILTER (cr=732969 pr=196522 pw=0 time=0 us)
      4        TABLE ACCESS FULL WRH$_ACTIVE_SESSION_HISTORY PARTITION: KEY KEY (cr=732969 pr=196522 pw=0 time=0 us cost=1368 size=35 card=1)
At least we now know that a full table scan on the AWR table WRH$_ACTIVE_SESSION_HISTORY is causing the problem. Why would that be an issue on this database when it is not elsewhere? There seems to be a lot of activity against the WRH$_ACTIVE_SESSION_HISTORY partition, but the table only has one partition. The retention period is set to 31 days, yet we seem to have almost 400 days' worth of history. There also appears to be only one partition on this database compared with many on other databases. What is going on?
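The partition layout of the ASH table is easy to check; a quick sketch using the documented DBA_TAB_PARTITIONS view (the HIGH_VALUE column holds the DBID and snap_id boundaries):

select partition_name, high_value
from   dba_tab_partitions
where  table_name = 'WRH$_ACTIVE_SESSION_HISTORY'
order  by partition_position;

On the problem database that returned just two partitions: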
WRH$_ACTIVE_SES_MXDB_MXSN MAXVALUE, MAXVALUE
WRH$_ACTIVE_3035595574_22385 3035595574, MAXVALUE
That is odd compared with another system, which has multiple partitions split by snapshot id:
WRH$_ACTIVE_SES_MXDB_MXSN MAXVALUE, MAXVALUE
WRH$_ACTIVE_952858567_23062 952858567, 23447
WRH$_ACTIVE_952858567_23447 952858567, 23829
WRH$_ACTIVE_952858567_23829 952858567, 24213
WRH$_ACTIVE_952858567_24213 952858567, 24598
WRH$_ACTIVE_952858567_24598 952858567, 24982
WRH$_ACTIVE_952858567_24982 952858567, 25366
WRH$_ACTIVE_952858567_25366 952858567, 25750
WRH$_ACTIVE_952858567_25750 952858567, 26134
WRH$_ACTIVE_952858567_26134 952858567, 26519
WRH$_ACTIVE_952858567_26519 952858567, 26903
WRH$_ACTIVE_952858567_26903 952858567, 27287
WRH$_ACTIVE_952858567_27287 952858567, 27670
WRH$_ACTIVE_952858567_27670 952858567, MAXVALUE
Very odd. Looking at the trace dumps there seemed to be a lot of MMON trace files, and they led to the conclusion that a process was timing out and not completing.
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump -----
----- END Event Driven Actions Dump -----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 300 seconds
Time limit violation detected at:
*** 2011-08-22 11:08:29.136
<-skgmstack()+224<-rpidru()+240<-rpiswu2()+976<-kprball()+2240<-keasqlExecute()+176<-keh_slave_main_cb()+608<-kebm_slave_main()+896<-ksvrdp()+5312<-opirip()+1664<-opidrv()+1152<-sou2o()+240<-opimai_real()+304<-main()+224<-main_opd_entry()+80
Current Wait Stack:
Not in wait; last wait ended 1.048072 sec ago
auto_close=0 flags=0x21 boundary=0x0000000000000000/-1
Session Wait History:
0: waited for 'db file scattered read'
file#=2, block#=101483, blocks=80
wait_id=7451 seq_num=7454 snap_id=1
wait times: snap=0.022362 sec, exc=0.022362 sec, total=0.022362 sec
wait times: max=infinite
wait counts: calls=0 os=0
The next thought was to see what data was in the database, and why we had 400 days of it when we thought we were only keeping 31 days' history.
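The configured retention is worth confirming first; a quick check using the documented DBMS_STATS calls:

-- how many days of stats history are meant to be kept (default is 31)
select dbms_stats.get_stats_history_retention from dual;
-- the oldest point for which history is actually still available
select dbms_stats.get_stats_history_availability from dual;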
The query we used was
select trunc(SAVTIME), count(*) from WRI$_OPTSTAT_HISTGRM_HISTORY group by trunc(SAVTIME) order by 1;
and the graph looked very odd for the first days in July 2010.
I do recall that we had a problem when some code was being worked on and repeatedly gathered stats. That was about a year ago, so it does fit in with this data. We have made a bit of progress: we have a lot more data than we thought, the MMON process seems to be timing out, and there are not the multiple partitions for the ASH data that I would have expected.
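One quick way to quantify the excess is the documented V$SYSAUX_OCCUPANTS view; if this is the problem, you would expect the SM/OPTSTAT occupant (the optimizer stats history) to dominate the space usage:

select occupant_name, space_usage_kbytes
from   v$sysaux_occupants
order  by space_usage_kbytes desc;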
I do know that the MMON process is responsible for managing optimizer stats as well as AWR snapshots and, as I recently found out, for purging ADR data. Investigating other MMON trace files it is clear that both the WRI$ (stats) and WRH$ (AWR) tables are affected. I wondered why it was timing out.
Research on MoS provided me with two very good reasons.
1) The MMON process is only given 5 minutes to complete the automatic purging of optimizer statistics data. If it has not finished in that time it aborts, leaving nothing more visible than the ORA-12751 dumps seen above (note the 'Runtime exceeded 300 seconds' line: 300 seconds is exactly that 5 minute limit). Bingo, we have the answer. Data has built up because excessive amounts were created by a rogue programme, and MMON has spent 5 minutes every day for a year trying to remove that excess, failing miserably each time and not telling anybody about it.
2) The second reason is that MMON is also responsible for creating new partitions for the WRH$ (ASH) data, including WRH$_ACTIVE_SESSION_HISTORY. It first tries to drop old partitions whose data is all marked as expired. As we never get that far in the nightly purge, all data keeps going into the single partition and eventually it becomes too big to be manageable.
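As an aside, once the purge backlog is cleared there is a widely quoted but undocumented way of forcing that WRH$ partition split manually; treat this as an assumption to be verified against MoS for your version, and test it somewhere safe first:

-- undocumented: asks MMON to split the WRH$ partitions immediately
alter session set "_swrf_test_action" = 72;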
With that understood, the solution to the problem is at hand.
Firstly, I might as well stop the automatic MMON purging until I have reduced the data volume.
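A minimal way of doing that with the documented API is to set the stats history retention to -1, which tells the automatic purge to leave the history alone (remember to set it back to 31 once the cleanup is finished):

exec dbms_stats.alter_stats_history_retention(-1);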
Then I want to purge the optimizer stats history on a gradual basis, as one large purge would take forever and make the tables unusable for the duration while they are locked. It would also generate large amounts of archived redo.
The following script achieves all of that:
-- show the date of the oldest stats history data
select dbms_stats.get_stats_history_availability from dual;

-- drop the indexes to speed up the purge
DROP INDEX I_WRI$_OPTSTAT_TAB_OBJ#_ST;
DROP INDEX I_WRI$_OPTSTAT_TAB_ST;
DROP INDEX I_WRI$_OPTSTAT_IND_OBJ#_ST;
DROP INDEX I_WRI$_OPTSTAT_IND_ST;
DROP INDEX I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST;
DROP INDEX I_WRI$_OPTSTAT_HH_ST;
DROP INDEX I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST;
DROP INDEX I_WRI$_OPTSTAT_H_ST;

-- set the tables to use parallelism
ALTER TABLE WRI$_OPTSTAT_HISTGRM_HISTORY PARALLEL;
ALTER TABLE WRI$_OPTSTAT_HISTHEAD_HISTORY PARALLEL;

-- purge data - be very selective with the date here
-- The aim is to remove only the oldest problem data in the first run.
-- The second run might use SYSDATE - 300, and each run can be progressively bigger.
EXEC DBMS_STATS.PURGE_STATS(SYSDATE - 365);

-- rebuild the indexes and remove parallelism
CREATE UNIQUE INDEX SYS.I_WRI$_OPTSTAT_TAB_OBJ#_ST ON SYS.WRI$_OPTSTAT_TAB_HISTORY
  (OBJ#, SYS_EXTRACT_UTC(SAVTIME)) TABLESPACE SYSAUX PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_TAB_ST ON SYS.WRI$_OPTSTAT_TAB_HISTORY
  (SYS_EXTRACT_UTC(SAVTIME)) TABLESPACE SYSAUX PARALLEL;
CREATE UNIQUE INDEX SYS.I_WRI$_OPTSTAT_IND_OBJ#_ST ON SYS.WRI$_OPTSTAT_IND_HISTORY
  (OBJ#, SYS_EXTRACT_UTC(SAVTIME)) TABLESPACE SYSAUX PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_IND_ST ON SYS.WRI$_OPTSTAT_IND_HISTORY
  (SYS_EXTRACT_UTC(SAVTIME)) TABLESPACE SYSAUX PARALLEL;
CREATE UNIQUE INDEX SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST ON SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY
  (OBJ#, INTCOL#, SYS_EXTRACT_UTC(SAVTIME), COLNAME) TABLESPACE SYSAUX PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_HH_ST ON SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY
  (SYS_EXTRACT_UTC(SAVTIME)) TABLESPACE SYSAUX PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST ON SYS.WRI$_OPTSTAT_HISTGRM_HISTORY
  (OBJ#, INTCOL#, SYS_EXTRACT_UTC(SAVTIME), COLNAME) TABLESPACE SYSAUX PARALLEL;
CREATE INDEX SYS.I_WRI$_OPTSTAT_H_ST ON SYS.WRI$_OPTSTAT_HISTGRM_HISTORY
  (SYS_EXTRACT_UTC(SAVTIME)) TABLESPACE SYSAUX PARALLEL;

ALTER TABLE WRI$_OPTSTAT_HISTGRM_HISTORY NOPARALLEL;
ALTER TABLE WRI$_OPTSTAT_HISTHEAD_HISTORY NOPARALLEL;

-- confirm the oldest remaining history
select dbms_stats.get_stats_history_availability from dual;
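Once the first chunk has gone, the purge can simply be repeated with progressively more recent dates. A sketch of how those staged runs might be scripted; the 30-day step is an arbitrary choice of mine, not an Oracle recommendation:

BEGIN
   -- purge history older than 330, 300, ... down to 60 days, oldest first
   FOR days_back IN REVERSE 2 .. 11 LOOP
      DBMS_STATS.PURGE_STATS(SYSDATE - days_back * 30);
   END LOOP;
   -- hand anything older than the normal 31 days back to the automatic purge
   DBMS_STATS.ALTER_STATS_HISTORY_RETENTION(31);
END;
/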
Hopefully this has been of some value; I know I have learned from the experience. I certainly know more about what the MMON process does, but I still feel I have a lot more research to undertake on it.