My colleague John Evans sits next to me, and after hearing another grumble about the 12c OEM AWR Repository I asked him to write up some of the issues he had come across and their fixes / workarounds. It seems to me that the idea is fantastic, but it has not been given sufficient exposure at sites that want to put a lot of data into it from various sources, and as a result much of the necessary functionality is not yet robust enough. This is John’s summary.
Background – What is it and why you might want to make use of it
The AWR contains a wealth of useful information. Primarily used for troubleshooting, it can also be used for trend analysis of system usage, capacity planning and forecasting, year-on-year comparisons… and probably many more tasks that the DBA usually ends up being assigned. It helps to be able to keep this information around for a considerable period of time, perhaps a year or more.
However, with a great wealth of information comes a not inconsiderable volume of disk usage. So how do you balance retaining data for extended periods against overburdening your databases? Use a repository! Keeping the data centrally reduces the overhead on the source databases, means AWR reports can be run from a single location for your entire estate, and, for the slightly more adventurous, provides a single database from which to draw information to feed an Apex dashboard.
The before
Previously at this site there was an in-house solution which used a combination of PL/SQL packages, DBMS_SCHEDULER jobs, UTL_FILE and database links. Our home-brewed version held 2.5TB of data covering 400 days per database. While it worked well, a number of issues cropped up as the number of source databases increased:
1) Work required to restart failed extracts (Updating control tables in the source database and repository then restarting the extracts)
2) Fetching extract files from remote sites which failed during transfer
3) All databases were experiencing accelerated SCN growth due to SCN propagation over the DB links (there’s an Exadata data warehouse on site driving SCNs up…)
The after
As of Enterprise Manager 12.1.0.4 there is built-in AWR warehouse functionality; all you need do is donate a database in which to store the AWR data. Even better, if something breaks, it’s Oracle-supported.
What it provides:
1) Job scheduling / retry mechanism using OEM’s job scheduler
2) A dashboard for viewing the volume of uploads, as well as a central screen from which to run AWR reports
3) Integration with OEM: it utilises existing credential sets, you can offload running AWR reports to a dedicated system, etc.
The caveats
Well, there had to be a few as it is a new solution… but truth be told there are numerous issues, which tend to result in the new solution taking more effort to maintain than the old one.
1) If you add a source database whose AWR contains snapshots older than the retention period, they will not be purged.
-> This is fixed using dbms_workload_repository.drop_snapshot_range (and then updating dbsnmp.caw_load_metadata)
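A rough sketch of that cleanup on the source database follows; the snapshot IDs and the 30-day retention are illustrative only, and the exact caw_load_metadata housekeeping depends on that table's layout, so inspect it before updating anything:

```sql
-- Find the snapshot range that predates the retention period
SELECT MIN(snap_id) AS low_snap, MAX(snap_id) AS high_snap
  FROM dba_hist_snapshot
 WHERE end_interval_time < SYSTIMESTAMP - INTERVAL '30' DAY;

-- Drop that range (substitute the IDs returned above)
BEGIN
  dbms_workload_repository.drop_snapshot_range(
    low_snap_id  => 100,
    high_snap_id => 250);
END;
/
```

After the drop, bring dbsnmp.caw_load_metadata back in line with the snapshots that actually remain.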
2) During the loading operation a schema is created to stage the snapshot data. This site uses a strong password policy; as a result the password generated by the AWR warehouse is of insufficient strength, so the schema creation fails and the load as a whole fails. A fix is pending as enhancement request 21291950.
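Until the enhancement lands, one possible workaround is to relax the password verify function for the duration of the load. A sketch, assuming the staging schema is created under the DEFAULT profile (my_verify_fn is a stand-in for whatever function the site normally enforces):

```sql
-- See which verify function each profile enforces
SELECT profile, limit
  FROM dba_profiles
 WHERE resource_name = 'PASSWORD_VERIFY_FUNCTION';

-- Temporarily disable it, retry the load, then put it back
ALTER PROFILE default LIMIT password_verify_function NULL;
-- ... retry the snapshot upload ...
ALTER PROFILE default LIMIT password_verify_function my_verify_fn;
```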
3) You cannot specify an alternative location for the dump file extract. While there is a property in DBSNMP.CAW_EXTRACT_PROPERTIES named ‘dump_dir_1’ (where 1 is the instance number, so there are additional rows for RAC databases), this cannot be modified without breaking the process. This is currently raised as bug 21826657. The short of it is that the process expects the dump file in the agent instance directory; if it is written out anywhere else then it cannot be zipped up and transferred… Problematic, as some systems at this site generate 4GB+ dump files and fill up the local file system.
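You can at least see what the extract believes its dump location to be; a read-only peek (the exact column layout of the table is an assumption based on this site's installation, so check it first):

```sql
-- One dump_dir_<n> row per instance on RAC source databases
SELECT *
  FROM dbsnmp.caw_extract_properties;
```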
4) Dump files will not be retrieved in a RAC installation if the agent executing the collection job is on a different node from the one where the extract was written.
e.g.
4.1) Extract job runs on dbserver1, dump file is written to dbserver1:/u01/app/oracle/product/gc12.1/agent12c/1_27CBB7FC057C24F0E0533104580AC38A_4108517940_11518_11603.dmp
4.2) Collection job runs a number of hours later on dbserver2… DBSNMP.CAW_EXTRACT_METADATA has a row for the file, but the file does not exist on dbserver2! Fail! This mainly tends to happen with the ‘Upload Snapshots Now’ option.
(Note: you cannot use a shared file system for the dump files – see point 3. Running multiple agents from the same directory is not going to work either.)
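To spot stranded extracts, compare what the metadata table says has been extracted against what is actually on the node the collecting agent runs from. A sketch (the table's columns may vary by version, so describe it first):

```sql
-- Extract files the collection job still expects to pick up
SELECT *
  FROM dbsnmp.caw_extract_metadata;
```

Any row whose dump file is missing from the local agent instance directory on the node running the collection is a candidate for the failure described in point 4.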
5) The retention period and the extract schedule cannot be modified once the AWR warehouse is configured; the Oracle-provided workaround is:
From Oracle Support:
“As a work around, you can execute the following steps to Reconfigure:
– Please remove all the source databases from the AWR Warehouse console UI .
– After removing _all_ the source databases, the ‘Configure’ button on ‘Settings’ page will be enabled.
– Click it to re-configure the warehouse.
– After re-configuring, add all the source databases”
That’s not particularly user-friendly when there are 100+ databases with a combined total of nearly 1TB of snapshot data. The alternative is to modify the relevant values in the tables below:
DBSNMP.CAW_PROPERTIES – Repository DB
DBSNMP.CAW_EXTRACT_PROPERTIES – Each source database
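A sketch of that direct route; the name/value column names and the property name below are hypothetical, so DESC the tables and back up the rows before changing anything:

```sql
-- Repository database: inspect first, then adjust
SELECT * FROM dbsnmp.caw_properties;

UPDATE dbsnmp.caw_properties
   SET property_value = '400'             -- hypothetical columns and value
 WHERE property_name  = 'retention_days'; -- hypothetical property name
COMMIT;
```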
6) The AWR dashboard – The pane that displays the number of incidents also displays loading errors… but /only/ if there is an incident! The errors are inserted into DBSNMP.CAW_LOAD_ERRORS
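Since the dashboard only surfaces them alongside an incident, it is worth querying the table directly on the repository database:

```sql
-- Review load errors straight from the source table
SELECT *
  FROM dbsnmp.caw_load_errors;
```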
After all that… if something breaks and your snapshots aren’t loading, you have to trace the problem through the OEM job log step by step, then the import log, then the repository error log… and you still have to update all the control tables at both ends if dump files get corrupted (because the repository dump area filled up). More complexity, more to break, more to fix – when it works properly, though, it is rather nice!
Once you have fiddled with the parallel load parameter, however, you are firmly on your own.
7) The alerting. After all the above issues, including one where the AWR warehouse stopped loading snapshots entirely (point 2), there is no out-of-the-box alerting for problems with loading data, and you cannot readily review the errors via the dashboard (point 6).
Nice to haves in a future release
1) Automated cleanup of the expdp/impdp logs on the source and repository database servers
2) Ability to track switchovers – at the moment you need to add the new primary to the AWR warehouse when you switch over – and you cannot disable the OEM CAW_TRANSFER job once the target is switched over (so it runs against the now-standby database and fails silently).
The details are stored in the repository table dbsnmp.caw_dbid_mapping but are not used (yet).
As for the CAW_TRANSFER job, the outstanding runs can be deleted from the OEM job scheduler.
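The raw material for proper switchover tracking is already visible on the repository side:

```sql
-- One row per source database DBID the warehouse knows about
SELECT * FROM dbsnmp.caw_dbid_mapping;
```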
Quirks
For source databases older than 11.2.0.1, the capture time of the snapshots is not listed in the ‘Search and Select’ popup window (though if your data source is the database itself and not the AWR repository it /does/ populate the times).
The worrisome bits
The documentation states that the DBA role must be granted to the user which runs the extract (which will be SYSTEM):
From: http://docs.oracle.com/database/121/TDPPT/tdppt_awr_warehouse.htm#CACJBEBG
Adding and Removing Source Databases
A source database whose AWR data you want to upload to the warehouse must be the same or earlier (to 10.2.0.4) as the version of the warehouse database. You can add and remove source databases provided you have access to the database target and to database credentials with execute privileges on the sys.dbms_swrf_internal package and the DBA role.
Since giving DBA to any user is not a good idea, this is the alternative:
grant execute on sys.dbms_swrf_internal to system;
grant execute on utl_file to dbsnmp;
grant execute on dbms_random to dbsnmp;
For further information see AWR Master Note: 1907335.1