Oracle DBA – A lifelong learning experience

Performance problems with OEM AWR warehouse

Posted by John Hallas on December 20, 2016

The Enterprise Manager AWR Warehouse is designed to hold performance data from multiple databases for long-term analysis. It promises to save storage and improve performance on your production systems, and on that score it delivers. However, the warehouse itself does not seem to be performant when taking in multiple sources and retaining them long-term – 400 days in our case. Why 400 days is an obvious question that might be asked. Primarily because we are a Retail organisation and Easter falls on a different date each year, so we need a little over a full year of history to compare like with like.

 

The AWR repository database is performing poorly during the insert stage of the upload process.
Just to quickly summarise the process:
  • A dmp file is extracted on the source database and transferred across to the AWR server
  • The dmp file is then imported into a temporary schema called AWR$XXXXXX (this loads quickly) – these staging schemas can be listed as shown below
  • This data is then inserted into the main AWR tables inside the SYS schema. It is this stage that is slow.
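
For reference, the AWR$ staging schemas from the second step can be listed with a simple dictionary query (dba_users is standard; the AWR$ prefix is the naming convention noted above):

-- list the temporary AWR$ staging schemas created by the load process
SELECT username, created
FROM   dba_users
WHERE  username LIKE 'AWR$%'
ORDER  BY created;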

In order to completely isolate the issue, we altered a property so that only one AWR file is loaded at a time, taking any contention/locking issues out of the equation:

UPDATE dbsnmp.caw_properties
SET    property_value = 1
WHERE  property_name = 'max_parallel_loads';
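
The change is easy to verify against the same property table (and remember to commit the update so the loader picks it up):

-- confirm the loader is restricted to a single concurrent load
SELECT property_name, property_value
FROM   dbsnmp.caw_properties
WHERE  property_name = 'max_parallel_loads';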
We can now see two SQL statements consuming large amounts of I/O. Over a 30-minute trace, these two account for 22 minutes of the time, mainly in I/O reads. They are:
 
SQL ID: 130dvvr5s8bgn Plan Hash: 1160622595
 
select obj#, dataobj#, part#, hiboundlen, hiboundval, ts#, file#, block#, 
  pctfree$, pctused$, initrans, maxtrans, flags, analyzetime, samplesize, 
  rowcnt, blkcnt, empcnt, avgspc, chncnt, avgrln, length(bhiboundval), 
  bhiboundval 
from
 tabpart$ where bo# = :1 order by part#
 
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        5     85.94     681.98      71875     260313        817      131255
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11     85.94     681.99      71875     260313        817      131255
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 4)
Number of plan statistics captured: 2
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
     43847      43776      43847  TABLE ACCESS BY INDEX ROWID TABPART$ (cr=13860 pr=2 pw=0 time=1689938 us cost=4779 size=2879167 card=15907)
     43847      43776      43847   INDEX RANGE SCAN I_TABPART_BOPART$ (cr=400 pr=0 pw=0 time=311605 us cost=145 size=0 card=15907)(object id 571)
 
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                     69961        0.22        591.85
  latch: shared pool                              8        0.00          0.00
  latch: row cache objects                        1        0.02          0.02
********************************************************************************
 
SQL ID: c3zymn7x3k6wy Plan Hash: 3446064519
 
select obj#, dataobj#, part#, hiboundlen, hiboundval, flags, ts#, file#, 
  block#, pctfree$, initrans, maxtrans, analyzetime, samplesize, rowcnt, 
  blevel, leafcnt, distkey, lblkkey, dblkkey, clufac, pctthres$, 
  length(bhiboundval), bhiboundval 
from
 indpart$ where bo# = :1 order by part#
 
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.02       0.02          0          0          0           0
Fetch        5     65.12     578.58      62301     174812          0       87553
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        9     65.14     578.61      62301     174812          0       87553
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 4)
Number of plan statistics captured: 2
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
     43847      43776      43847  TABLE ACCESS BY INDEX ROWID INDPART$ (cr=13241 pr=762 pw=0 time=6313753 us cost=4024 size=2224953 card=12861)
     43847      43776      43847   INDEX RANGE SCAN I_INDPART_BOPART$ (cr=424 pr=421 pw=0 time=4089911 us cost=128 size=0 card=12861)(object id 576)
 
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                     78826        0.28        655.63
  latch: shared pool                             16        0.00          0.00
********************************************************************************
 
Our guess is that AWR, during the inserts, tries to create a new partition for each snapshot, and that it is querying its partition metadata (tabpart$ and indpart$) during this operation.
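One way to test that theory is to see how many partitions the warehouse has accumulated on the AWR history (WRH$) tables – a sketch using the standard dictionary view:

-- count partitions per AWR history table; a very high count would explain
-- the heavy metadata lookups against tabpart$ and indpart$
SELECT table_name, COUNT(*) AS partition_count
FROM   dba_tab_partitions
WHERE  table_owner = 'SYS'
AND    table_name LIKE 'WRH$%'
GROUP  BY table_name
ORDER  BY partition_count DESC;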
We have an SR open with Oracle Support, with no joy so far. There doesn’t appear to be much information available on the web, so it may be that not many sites use this with the number of source databases that we do – we upload and retain data from over 80 source databases.
Any help or advice regarding the above SQL IDs would be appreciated.

3 Responses to “Performance problems with OEM AWR warehouse”

  1. Hello,

    Do you have any notion of how big tabpart$ and indpart$ are and when they were last analyzed? Do you have any notion of what the new partitions might be called, and could you create them in advance, cutting that part out of the process?
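
    Something along these lines should show the size and freshness (standard dictionary views, so just a sketch):

    -- row counts and last analyzed dates for the partition metadata tables
    SELECT table_name, num_rows, last_analyzed
    FROM   dba_tab_statistics
    WHERE  owner = 'SYS'
    AND    table_name IN ('TABPART$', 'INDPART$');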

    Regards,
    Frank

    • John Hallas said

      Neither table is that big – ~1.5M records – and they have been analyzed recently, and that is ongoing. We will look at pre-creating the partitions, but it is not something that Oracle have suggested as yet.
      Oracle are now trying to force the queries to perform a FTS rather than an index lookup. We have a performance guy on it now rather than an AWR guy from Oracle, who was only interested in the flow of data and not in how long it was taking.
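
      For anyone wanting to try the same, one way to attach a FULL hint to a recursive statement is a SQL patch. This is only a sketch – dbms_sqldiag_internal.i_create_patch is an undocumented interface (available up to 12.1), and the sql_text must match the statement exactly as captured in the trace – so test it somewhere safe first:

      -- force a full scan for the tabpart$ lookup (SQL ID 130dvvr5s8bgn)
      BEGIN
        sys.dbms_sqldiag_internal.i_create_patch(
          sql_text  => '...exact text of the tabpart$ statement from the trace...',
          hint_text => 'FULL(TABPART$)',
          name      => 'awr_wh_tabpart_full');
      END;
      /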

  2. Ian Greenwood said

    Hi John,

    Couple of things worth investigating:
    1. Sequential reads are slow – over 8ms per read on average (591.85 s across 69,961 waits ≈ 8.5 ms in the first trace). This is much slower than expected – I remember investigating an I/O issue on one of your DBs a couple of years ago when, I believe, the average sequential read was around 7ms. I think that turned out to be a SAN configuration issue.

    2. If the sequential reads cannot be sped up, then try gathering system stats under a representative load to allow the optimiser to use more realistic I/O statistics. This may help tip it towards a FTS (see the sketch after this list).

    3. Check whether there are histograms on the BO# column of the tabpart$ and indpart$ tables. Because this code is likely to be looking only at the WRH$ tables, there is a high likelihood that the object IDs for these tables will be contiguous, as they are all generated from the same script; and as the WRH$ partitioned tables are the only ‘real’ data in the AWR warehouse, the partition details are going to be heavily skewed towards those tables. The optimiser could be choosing the index simply because it is not aware that the data is skewed.

    4. As a last resort, you could look at altering the _small_table_threshold parameter to increase the likelihood of direct path reads. Generally direct reads will be cheaper than scattered reads, as they bypass the buffer cache. The parameter adjusts the threshold at which a table is regarded as too big to fit into the cache.
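
    A sketch of points 2 and 3 – these are standard DBMS_STATS calls, but the interval and histogram bucket values are illustrative only:

    -- point 2: gather workload system stats over a representative 30-minute window
    EXEC DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'INTERVAL', interval => 30);

    -- point 3: build histograms on BO# so the optimiser can see the skew
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SYS', tabname => 'TABPART$',
                                    method_opt => 'FOR COLUMNS BO# SIZE 254');
      DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SYS', tabname => 'INDPART$',
                                    method_opt => 'FOR COLUMNS BO# SIZE 254');
    END;
    /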

    Regards and Happy New Year

    Ian
