Oracle DBA – A lifelong learning experience

Managing the WINDOW_ID in Goldengate V11.2.1.0.33

Posted by John Hallas on August 16, 2017

When we import data into the landing schema on a Data Warehouse via Goldengate, we add 3 fields to each record detailing when and how the record got loaded. These mappings can be found in the *.inc files under $GG_HOME/dirinc on the target GG installation. An example is:
map xxx.DBA_GGCUTOVER_TEST, TARGET YYY.DBA_GGCUTOVER_TEST,  INSERTALLRECORDS, IGNOREDELETES
COLMAP (
USEDEFAULTS,
WINDOW_ID = @STRCAT(@GETENV("RECORD", "FILESEQNO"), @STRNUM(@GETENV("RECORD", "FILERBA"), RIGHTZERO, 10)),
OPER_TYPE = @GETENV ("GGHEADER", "OPTYPE"),
CDC_LOAD_TIMESTAMP= @DATENOW());
map xxx.DBA_GGCUTOVER_TEST #EXCEPTION_HANDLER();
OPER_TYPE is the operation type (insert, update, delete, etc.).
CDC_LOAD_TIMESTAMP is self-explanatory.
WINDOW_ID is more interesting: it is made up of the sequence number of the pump trail file that holds the record, followed by the RBA (relative byte address) padded with leading zeros to 10 digits.
So, if the record is in file $GG_HOME/dirdat/R1MOP101/rm077196, at RBA 78954, then the WINDOW_ID value in the staging table would be:
771960000078954
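The WINDOW_ID arithmetic can be reproduced in the shell, which is handy for predicting the value a given trail file and RBA will produce. A minimal sketch; the function name is hypothetical, and it assumes FILESEQNO comes through without the leading zeros of the file name, as the rm077196 example implies.

```shell
# Hypothetical helper: rebuild the WINDOW_ID produced by the COLMAP above.
# $1 = trail file sequence number (e.g. 077196 from rm077196)
# $2 = RBA within that trail file
window_id() {
  # Strip leading zeros so printf does not read the number as octal
  seq=${1#"${1%%[!0]*}"}
  # Sequence number as-is, then the RBA left-padded with zeros to 10 digits
  printf '%d%010d\n' "${seq:-0}" "$2"
}

# window_id 077196 78954   prints 771960000078954
```

Note that the result has no fixed width: a smaller trail sequence number produces a shorter, and therefore numerically smaller, WINDOW_ID.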
BI batch issue due to the above
We found out yesterday that the BI team (rightly or wrongly, discuss!!) use the WINDOW_ID field of each record in the landing schema to check whether that record has already been loaded by the batch:
  • For each table, there is a control record that stores the max WINDOW_ID value for that table.
  • The next time the batch is run, it looks at the control record, and then only loads data with a WINDOW_ID greater than the control value.
  • After the batch has completed, the control value is updated.
  • And so on each night.
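The control-record logic is a classic high-water-mark filter. The sketch below is a hypothetical illustration, not the BI team's code: WINDOW_IDs arrive one per line on stdin, only values above the stored control value are loaded, and the new control value is printed at the end. It relies on shell integer comparison, which assumes a shell with 64-bit integers (bash and most modern shells qualify); 15-digit WINDOW_IDs fit comfortably.

```shell
# Hypothetical sketch of the nightly batch's high-water-mark logic.
# $1 = control value (max WINDOW_ID loaded so far); WINDOW_IDs arrive on stdin.
# Prints "load <id>" for each row that would be loaded, then "control=<new max>".
incremental_load() {
  control=$1
  new_max=$control
  while read -r wid; do
    [ -z "$wid" ] && continue
    if [ "$wid" -gt "$control" ]; then
      echo "load $wid"
      if [ "$wid" -gt "$new_max" ]; then
        new_max=$wid
      fi
    fi
  done
  echo "control=$new_max"
}
```

Feeding it a post-replatform value such as 960000078954 with a control value of 771960000078954 shows the failure: the new row is simply skipped, with no error raised.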
BUT – what happens if this WINDOW_ID is reset? The above logic fails!
After we had replatformed the source system, new data coming in from a fresh GG install had a low WINDOW_ID:
Old source WINDOW_ID in region of:
771960000078954
New source WINDOW_ID in region of:
   960000078954
To resolve this, we did the following:
  • Stopped the pump process.
  • Waited till replicat caught up, then stopped replicat.
  • Increased the GG pump sequence number via a script like this (the example shows 2 increments of the sequence; we ran tens of thousands of these to get the value to exactly 90,000!):
ggsci <<EOF
alter extract P1MOP101 etrollover
alter extract P1MOP101 etrollover
EOF
 
** Note: I didn't know you could run ggsci commands from a shell script!
  • Started the pump process
  • Checked that a file called $GG_HOME/dirdat/R1MOP101/rm090000 was created on xxx
  • Altered the replicat to point to this new file with “alter replicat R1MOP101, extseqno 90000, extrba 0”, then started the replicat.
  • After this, any records that came through had a new, higher WINDOW_ID.
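Typing tens of thousands of identical commands is not practical, so a loop is the natural way to build the ggsci input. A minimal sketch, assuming ggsci is on the PATH and is run from $GG_HOME; the function name is hypothetical. It only prints the commands, so they can be reviewed or counted before being piped into ggsci. Check the current trail sequence first (for example with info extract P1MOP101, detail), since the number of rollovers needed depends on where the pump currently is.

```shell
# Hypothetical helper: print N "etrollover" commands for a pump extract.
# $1 = extract group name, $2 = number of rollovers required
rollover_cmds() {
  i=0
  while [ "$i" -lt "$2" ]; do
    echo "alter extract $1 etrollover"
    i=$((i + 1))
  done
}

# Reproduce the two-command example above (pump must be stopped):
# rollover_cmds P1MOP101 2 | ggsci
```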
We still had an issue with the data that had come in between Sunday and the time we stopped the pump above.
We knew the old data was no higher than:
799990000000000
We knew the new data was no higher than:
  1990000000000
We also knew that we didn't want these records to be in the new 900000000000000 range.
So we added 800000000000000 to the new WINDOW_IDs, so that the data loaded between Sunday morning and Monday evening would now be in the region of
800000000000000
to
801990000000000
 
…These WINDOW_IDs would then be higher than the control table values, and would therefore be processed by the BI batch.
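The offset works only because the three ranges do not overlap. A quick check in the shell using the bounds quoted above (the function name is hypothetical; shell arithmetic needs 64-bit integers, which bash and most modern shells provide):

```shell
# Hypothetical check that the shifted interim rows sit between the old data
# and the post-rollover data.
# $1 = old max, $2 = new (interim) max, $3 = offset added, $4 = first post-rollover WINDOW_ID
check_ranges() {
  old_max=$1 new_max=$2 offset=$3 floor=$4
  shifted_max=$((new_max + offset))
  echo "interim rows: $offset to $shifted_max"
  # Interim rows start at >= offset, so offset must clear the old max,
  # and the shifted max must stay below the post-rollover range.
  if [ "$offset" -gt "$old_max" ] && [ "$shifted_max" -lt "$floor" ]; then
    echo "no overlap"
  else
    echo "overlap"
  fi
}

check_ranges 799990000000000 1990000000000 800000000000000 900000000000000
```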

Posted in Goldengate

HP Systems Management vacancy

Posted by John Hallas on August 8, 2017

I know I do not have the right readership on this blog for a Systems Management vacancy, but if any readers have colleagues with experience of HP OpenView, OpsBridge, HP Service Management or HP UCMDB, then I have a vacancy for a permanent position at our Head Office in Bradford.

Details below

https://apply.morrisons.jobs/vacancies/491/technology-specialist–system-management.html

Posted in Oracle

Using DataGuard broker to show apply lag and throughput

Posted by John Hallas on June 20, 2017

To determine how much lag there is, I normally run a script similar to this:

select sequence#, applied, to_date(to_char(first_time,'DD-MON-YY:HH24:MI:SS'),
'DD-MON-YY:HH24:MI:SS') "First Time" ,
to_char(next_time,'DD-MON-YY:HH24:MI:SS') "Next Time"
from v$archived_log
UNION
select NULL,database_role,NULL, db_unique_name from v$database
order by “First Time”;

However, there is another way, which I sometimes use, that gives a lot more information: the Data Guard broker command line. Use the show configuration command to determine the database name if you are not sure:

DGMGRL> show configuration

Configuration - DR

Protection Mode: MaxPerformance
 Databases:
 xxxxxx2a - Primary database
 xxxxxx2b - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Then show the details for a specific database:

DGMGRL> show database "xxxxxx2b"

Database - xxxxxx2b

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 4 minutes 24 seconds (computed 6 seconds ago)
 Apply Lag: 5 minutes 16 seconds (computed 0 seconds ago)
 Apply Rate: 191.08 MByte/s
 Real Time Query: ON
 Instance(s):
 xxxxxx2b1 (apply instance)
 xxxxxx2b2
 xxxxxx2b3
 xxxxxx2b4
 xxxxxx2b5
 xxxxxx2b6
 xxxxxx2b7
 xxxxxx2b8

Database Status:
SUCCESS

There is lots of good information there, including which node is hosting the MRP (managed recovery) process and the apply rate. In our case the apply rate is normally between 150 and 400 MB per second.
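If you want to track the lag over time rather than eyeball it, the broker output is easy to parse. The sketch below is an assumption-laden example, not a supported interface: it expects lines in the "Apply Lag: 5 minutes 16 seconds (computed 0 seconds ago)" format shown above (other versions may word this differently), and the function name is hypothetical. The output could be captured by passing the command to dgmgrl on the command line, as in the commented example, but check that syntax on your version.

```shell
# Hypothetical helper: convert a DGMGRL lag line to seconds.
# $1 = label to look for, e.g. "Apply Lag" or "Transport Lag"; DGMGRL output on stdin.
# Assumes the "N days/hours/minutes/seconds" wording shown in the post.
lag_to_seconds() {
  awk -v label="$1" '
    index($0, label ":") {
      line = substr($0, index($0, label ":") + length(label) + 1)
      sub(/\(computed.*$/, "", line)   # drop the "(computed ... ago)" suffix
      n = split(line, f, " ")
      total = 0
      for (i = 1; i < n; i++) {
        if (f[i] !~ /^[0-9]+$/) continue
        unit = f[i + 1]
        if (unit ~ /^day/)         total += f[i] * 86400
        else if (unit ~ /^hour/)   total += f[i] * 3600
        else if (unit ~ /^minute/) total += f[i] * 60
        else if (unit ~ /^second/) total += f[i]
      }
      print total
      exit
    }'
}

# Example (requires a broker configuration; syntax is an assumption):
# dgmgrl -silent / "show database xxxxxx2b" | lag_to_seconds "Apply Lag"
```

Against the output above this gives 316 for Apply Lag and 264 for Transport Lag.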

Posted in Oracle

Downgrading a RAC database from 11.2.0.4 to 11.2.0.3

Posted by John Hallas on May 4, 2017

It is not often that I see a database downgrade performed, so I thought it would be worthwhile noting how it was done.
We downgraded a 2-node RAC database from 11.2.0.4 to 11.2.0.3; only the database was downgraded, not the Grid home.
The downgrade took place on HP-UX. Downgrades on Windows have several additional steps and are not covered in this post.
This database does not use Database Vault, and the prerequisite compatibility checks were carried out beforehand.

Assume all commands are run on node 1; any commands that need to be run on node 2 are explicitly stated.

Set ORACLE_HOME to the current 11.2.0.4 environment:
export ORACLE_HOME=/app/oracle/product/11.2.0.4/dbhome_SOA1
 
Tail the alert log on both nodes in separate windows, for example:
tail -f /app/oracle/diag/rdbms/soapre2a/SOAPRE2A1/trace/alert_SOAPRE2A1.log
 
1. Stop the database using srvctl from the primary node
 
srvctl stop database -d SOAPRE2A
 
Monitor the alert logs to confirm that the database has shut down successfully.
2. Create a pfile from the spfile
 
sqlplus / as sysdba
SQL>create pfile='/home/oracle/SOAPRE2_downgrade/SOAPRE2_clusterdisable.ora' from spfile='+DATA/SOAPRE2A/spfilesoapre2a.ora';
 
3. In the pfile, set the CLUSTER_DATABASE parameter to FALSE. The relevant entries were:
*.cluster_database=FALSE
*.compatible='11.2.0.0.0'
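If you would rather script this edit than make it by hand, a minimal awk sketch follows. The helper name is hypothetical; it prints the edited pfile rather than changing it in place, and it assumes parameters are written in the *.name=value form shown above (instance-prefixed entries are left alone).

```shell
# Hypothetical helper: print a pfile with *.<param> set to <value>.
# $1 = pfile, $2 = parameter name (lower case), $3 = new value
set_pfile_param() {
  awk -v p="$2" -v v="$3" '
    BEGIN { done = 0 }
    # Replace an existing "*.param=" line (case-insensitive match)
    tolower($0) ~ ("^\\*\\." p "=") { print "*." p "=" v; done = 1; next }
    { print }
    # Append the parameter if it was not present at all
    END { if (!done) print "*." p "=" v }' "$1"
}

# set_pfile_param /home/oracle/SOAPRE2_downgrade/SOAPRE2_clusterdisable.ora cluster_database FALSE > new.ora
```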
 
4. Recreate the spfile from the edited pfile
 
sqlplus / as sysdba
SQL>create spfile='+DATA/SOAPRE2A/spfilesoapre2a.ora' from pfile='/home/oracle/SOAPRE2_downgrade/SOAPRE2_clusterdisable.ora';
 
5. Start the database in downgrade mode using the new spfile
 
cd $ORACLE_HOME/rdbms/admin
sqlplus / as sysdba
SQL>startup downgrade
 
6. Run the Oracle downgrade script from the original 11.2.0.4 ORACLE_HOME
SQL>spool /home/oracle/SOAPRE2_downgrade/downgrade.log
SQL>@catdwg.sql
SQL>spool off
SQL>shutdown immediate;
SQL>exit
 
This script can be run multiple times: if any errors are encountered, correct them and rerun it until it completes.
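Checking the spool file for errors after each run can be scripted. A minimal sketch with grep; the helper name is hypothetical, and not every ORA- line in a catalog script log is necessarily fatal, so review each match rather than treating the count as a verdict.

```shell
# Hypothetical helper: list ORA-, SP2- and PLS- errors in a spool file with line numbers.
# Returns non-zero (grep's status) when no errors are found.
log_errors() {
  grep -nE '(ORA|SP2|PLS)-[0-9]{4,5}' "$1"
}

# if log_errors /home/oracle/SOAPRE2_downgrade/downgrade.log; then
#   echo "errors found: correct them and rerun catdwg.sql"
# fi
```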
7. Change environment variables and restore config files
 
Execute these steps on both nodes.
 
Alter ORACLE_HOME and PATH environment variable to point to downgraded directories, in our case for example:
export ORACLE_HOME=/app/oracle/product/11.2.0.3/dbhome_SOA1_1
 
Ensure any entries in your oratab file are also altered to reference the downgraded directory.
Copy the password files and config files from the current ORACLE_HOME to the downgraded one.
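The environment changes in this step can be wrapped in a couple of helpers so both nodes are treated identically. A minimal sketch; all function names are hypothetical, the oratab helper prints the rewritten file rather than editing it in place, and it assumes the usual SID:ORACLE_HOME:Y|N oratab layout.

```shell
# Print a colon-separated PATH string with directory $2 removed
path_without() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -vxF -- "$2" | paste -s -d : -
}

# Point ORACLE_HOME at the downgraded home and swap the bin directory in PATH
switch_oracle_home() {
  if [ -n "${ORACLE_HOME:-}" ]; then
    PATH=$(path_without "$PATH" "$ORACLE_HOME/bin")
  fi
  ORACLE_HOME=$1
  PATH=$ORACLE_HOME/bin:$PATH
  export ORACLE_HOME PATH
}

# Print an oratab file with the home field changed from $1 to $2 ($3 = file; not edited)
rewrite_oratab() {
  awk -F: -v OFS=: -v old="$1" -v new="$2" '
    /^[ \t]*#/ || NF < 2 { print; next }   # keep comments and blank lines as-is
    $2 == old { $2 = new }
    { print }' "$3"
}

# Example for this downgrade (paths from the post):
# switch_oracle_home /app/oracle/product/11.2.0.3/dbhome_SOA1_1
```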
8. Reload version specific components
 
Change to the downgraded release home (11.2.0.3):
cd /app/oracle/product/11.2.0.3/dbhome_SOA1_1/rdbms/admin
sqlplus / as sysdba
SQL> startup upgrade
SQL>spool /home/oracle/SOAPRE2_downgrade/reload.log
SQL>@catrelod.sql
SQL>spool off
This step can take quite some time to complete; in our case it took ~2.5 hours.
9. Recompile invalid objects
 
SQL> shutdown immediate
SQL> startup
SQL> @utlprp.sql
SQL> exit
 
10. Downgrade cluster services
 
The final step was to downgrade cluster services to our old ORACLE_HOME and version, using the following srvctl command:
srvctl downgrade database -d db-unique-name -o old_ORACLE_HOME -t to_old_versnum
 
in our case this was the following:
 
srvctl downgrade database -d SOAPRE2A -o /app/oracle/product/11.2.0.3/dbhome_SOA1_1 -t 11.2.0.3

Posted in 11g new features, Oracle

What is the future for an Oracle DBA?

Posted by John Hallas on April 10, 2017

I have worked with Oracle databases for around 25 years now and during that time I have been very fortunate in that there has always been work for DBAs and it has been one of the higher paying disciplines within IT.

I am not prophesying the end of the Oracle database engine, but I do see the writing on the wall for some of the large corporate solutions sitting on physical equipment in a datacentre. I also have to criticise Oracle for their business practices, which I know are causing customers to move to other solutions.

Without any doubt there is pressure on those who wish to perform a purely Oracle DBA role. The growing use of Cloud does reduce the opportunities and, whilst databases always need to be built, the techniques used in the Cloud undoubtedly speed up that process and effectively de-skill it. The rise of SaaS-style applications, where the on-site DBA no longer performs upgrades, patching and similar work, also reduces the requirement.

In conjunction with that, there is a threat from the more established players in the market. I manage database teams that support a variety of databases, and a few years ago I undoubtedly had the view that Oracle was good for large databases (I might have considered 1TB to be the dividing line between large and medium) and SQL Server was suitable for smaller ones. I am aware that is a very basic dividing line and does not take into account functionality and advanced database requirements. I do not hold that view in the slightest now, and consider that Oracle is too expensive and does not offer value for money. SQL Server is much higher in my focus, and we now include MySQL (which I know is owned by Oracle) and also PostgreSQL and DynamoDB.

I referred to business practices as being a reason not to use Oracle. I am specifically referring to the change in licensing for databases in the Cloud but not in the Oracle Cloud. See this article by Tim Hall for more detail. The comments also support the theme of this blog: that there are many more alternatives to Oracle these days.

If I was starting out now I think I would be trying to go down the Data Architect road and also grabbing myself a good overview of the benefits and risks of the various types of database solutions that are now available. That skill set would also assist in becoming an infrastructure architect.

Saying all of the above – in my view there is nothing more satisfying than taking a query and improving its performance, no matter what the underlying database technology is.

Posted in Oracle

GoldenGate – Restarting a replicat with the command filterduptransactions

Posted by John Hallas on April 4, 2017

If a Goldengate replicat process fails, then on restart it occasionally skips past the correct RBA and 'loses its position'. The relative byte address (RBA) is the location within the trail file that marks the current transaction.

The old-school method was to calculate which RBA was the correct one and then restart the replicat. However, there is a new command on the block now (pun intended), and I will demonstrate how the two methods can be used to restart the replicat at the correct position.

Today, we saw the following in the GG log file:
PS: sorry if the format is a bit off. I normally spend as much time formatting this blog as I do writing it. However, in this case much of the work was done by Alex Priestley, a fellow DBA.

Read the rest of this entry »

Posted in Goldengate, Oracle

Problem with V$RECOVERY_AREA_USAGE view and FRA space not being reclaimed

Posted by John Hallas on March 16, 2017

We received the following alert from our alerting system:
 
Flash Recovery Area for FOLPRD1A has 9.97 percent of usable space left.
 
This is a standby database:
 
HOST       INSTANCE   STATUS     VERSION      STARTED                   UPTIME
---------- ---------- ---------- ------------ ------------------------- --------------------------------------------------
xxxxxxxx   INSTANCE   MOUNTED    11.2.0.3.0   18-JAN-2017 18:58:18      33 days(s) 13 hour(s) 4 minute(s) 1 seconds
 
DB_NAME     UNIQUE_NAME DB_ROLE            OPEN_MODE  PROTECTION_MODE      PROTECTION_LEVEL
----------- ----------- ------------------ ---------- -------------------- --------------------
xxxxxxxx    INSTANCE    PHYSICAL STANDBY   MOUNTED    MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
 
Usually, when under space pressure the standby will delete archivelogs and flashback logs that it no longer needs so this alert isn’t normal for a standby.  However, in this scenario, none of the space is reclaimable.  Therefore, without intervention the FRA would eventually hit 100% and stop logs being transported to the standby.
 
NAME                                     SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES
---------------------------------------- ----------- ---------- ----------------- ---------------
+FRA                                             400     379.48                 0           11268
 
FILE_TYPE               USED_GB RECLAIMABLE_GB PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ---------- -------------- ------------------ ------------------------- ---------------
CONTROL FILE                .12              0                .03                         0               1
REDO LOG                      0              0                  0                         0               0
ARCHIVED LOG             326.12              0              81.53                         0           11068
BACKUP PIECE                .12              0                .03                         0               1
IMAGE COPY                    0              0                  0                         0               0
FLASHBACK LOG             53.08              0              13.27                         0             197
FOREIGN ARCHIVED LOG          0              0                  0                         0               0
 
I checked the RMAN configuration, suspecting it hadn't been changed post switchover:
 
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
 
It looked right, in that it was not "NONE", but I did notice it specified 'ALL'. I thought we usually had it as follows:
 
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;
 
I assumed this wouldn’t make a difference as we only have one standby and all the logs had been applied.  However, I changed it anyway.  Straight away this had the desired effect and practically all the space became reclaimable.
 
NAME                                     SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES
---------------------------------------- ----------- ---------- ----------------- ---------------
+FRA                                             400     379.48            325.92           11268
 
FILE_TYPE               USED_GB RECLAIMABLE_GB PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ---------- -------------- ------------------ ------------------------- ---------------
CONTROL FILE                .12              0                .03                         0               1
REDO LOG                      0              0                  0                         0               0
ARCHIVED LOG             326.12         325.88              81.53                     81.47           11068
BACKUP PIECE                .12              0                .03                         0               1
IMAGE COPY                    0              0                  0                         0               0
FLASHBACK LOG             53.08              0              13.27                         0             197
FOREIGN ARCHIVED LOG          0              0                  0                         0               0
 
The alert subsequently cleared.
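The +FRA summary lines before and after the change show why the alert cleared. Assuming the alert treats free space plus reclaimable space as usable (an assumption about our alerting logic, not something taken from its configuration), the usable percentage can be sketched as follows; the function name is hypothetical and the arguments are in GB.

```shell
# Hypothetical helper: percentage of the FRA that is usable
# (free space plus space Oracle is allowed to reclaim).
# $1 = space limit, $2 = space used, $3 = space reclaimable (all in GB)
fra_usable_pct() {
  awk -v limit="$1" -v used="$2" -v recl="$3" \
    'BEGIN { printf "%.2f\n", (limit - used + recl) / limit * 100 }'
}

fra_usable_pct 400 379.48 0        # before the change
fra_usable_pct 400 379.48 325.92   # after the change
```

With the figures above this goes from about 5% usable to about 87% usable, purely because the reclaimable column was refreshed.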
 
Looking at the report “Archivelog Deletion Policy – Core Production Databases”, we have many databases configured with the ALL parameter. I checked another at random and it was fine. I suspected that it was the act of changing the parameter, rather than the parameter value itself, that fixed things, and thought that dbms_backup_restore.refreshagedfiles might also have done the job.
 
A colleague told me this alert had come out weeks ago, and that the trick of lowering db_recovery_file_dest_size to force the database under space pressure had cleared the old logs and the alert. The fact that this worked suggests that the space was always reclaimable, just not shown as such in the view the alert uses. I found a nice blog that shows the same issue and alludes to a bug.
 
 
“V$RECOVERY_AREA_USAGE is an aggregate view. If we check its definition, we see that the reclaimable size comes from x$kccagf.rectype.”  It directs you to a bug (for this version) that describes that the standby “does not refresh reclaimable space automatically”.
 
Bug 14227959 : STANDBY DID NOT RELEASE SPACE IN FRA
 
The workaround is to run exec dbms_backup_restore.refreshagedfiles; 
 
The blog also claims: “but I’ve found another workaround: just run the CONFIGURE ARCHIVELOG POLICY from RMAN as it refreshes the reclaimable flag – even when there is no change.”
 
This is effectively what I did. I have therefore put the original parameter back, switched logs on the primary numerous times, and confirmed that the reclaimable space is being updated. For now we shall keep an eye on this, as it is not an issue anywhere else.
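To keep an eye on it, the bug workaround and a usage check can be wrapped in a small script and run after log switches. A minimal sketch, assuming OS authentication on the standby with ORACLE_SID and ORACLE_HOME already set; the function name is hypothetical, and dbms_backup_restore is an internal package, so use it only as the bug note describes.

```shell
# Hypothetical helper: refresh the reclaimable flags on the standby, then
# report FRA usage in GB from v$recovery_file_dest.
refresh_fra() {
  # Quoted 'EOF' stops the shell expanding v$recovery_file_dest
  sqlplus -s / as sysdba <<'EOF'
whenever sqlerror exit failure
exec dbms_backup_restore.refreshagedfiles;
select name,
       round(space_limit/1024/1024/1024, 2)       limit_gb,
       round(space_used/1024/1024/1024, 2)        used_gb,
       round(space_reclaimable/1024/1024/1024, 2) reclaimable_gb
from   v$recovery_file_dest;
exit
EOF
}
```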

Posted in Oracle

Xmas Day: 150 hits. What is wrong with the world?

Posted by John Hallas on December 30, 2016

Yes, very tongue in cheek. I know not everyone celebrates Xmas.

I was still surprised, though. This is what was viewed:

Best wishes for 2017 to all my readers.

[Image: xmasday (posts viewed on Xmas Day)]

Posted in Oracle

RMAN checksyntax function

Posted by John Hallas on December 29, 2016

I was looking at the RMAN DEBUG options and came across the CHECKSYNTAX function which I had not used before.

First, a quick recap on the DEBUG option.

This can be called using the following syntax:

rman target / catalog rman12g1/xxxx@rmancat debug trace=rmantrace.log cmdfile=backup.rcv

or

rman target / catalog rman12g1/xxxx@rmancat debug trace=rmantrace.log
and then run RMAN> @backup.rcv (or just type in your run block of commands)

There are a number of options to DEBUG, and one of the error messages lists them out quite neatly:

RMAN-01009: syntax error: found "integer": expecting one of: "all, duplicate, recover, restore, resync, sql"

To be honest, if I was tracing I would just stick with the DEBUG=ALL format. DEBUG=SQL shows all the internal commands that RMAN calls and could be interesting if you were doing a deep dive into RMAN functionality.

Anyway, back to the CHECKSYNTAX option.

I ran it against an edited version of the command file used above:

rman target / catalog rman12g1/xxxx@rmancat checksyntax cmdfile=backup.rcv 

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Dec 28 10:22:20 2016
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
connected to target database: T12TEST (DBID=1543168240)
connected to recovery catalog database
RMAN> run {
2> sql "alter session set nls_date_format=''YYYY-MM-DD:HH24:MI:SS''";
3> allocate channel c1 device type disk format '/app/oracle/backup/backup_db_%d_S_%s_P_%p_T_%t';
4> allocate channel c2 device type disk format '/app/oracle/backup/backup_db_%d_S_%s_P_%p_T_%t';
5> backup database INCLUDE CURRENT CONTROLFILEs;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "identifier": expecting one of: "controlfile"
RMAN-01008: the bad identifier was: CONTROLFILEs
RMAN-01007: at line 6 column 33 file: backup.rcv

Note that backup.rcv has a blank first line, which is why RMAN reports line 6 even though the prompt shows 5>.

Pretty neat. I then edited the file to introduce a different, much more common error: a missing semicolon.

RMAN> run {
2> sql "alter session set nls_date_format=''YYYY-MM-DD:HH24:MI:SS''";
3> allocate channel c1 device type disk format '/app/oracle/backup/backup_db_%d_S_%s_P_%p_T_%t';
4> allocate channel c2 device type disk format '/app/oracle/backup/backup_db_%d_S_%s_P_%p_T_%t'
5> backup
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "backup": expecting one of: "auxiliary, connect, format, maxopenfiles, maxpiecesize, parms, rate, send, to, comma, ;"
RMAN-01007: at line 6 column 1 file: backup.rcv
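Since checksyntax only parses the file, it could serve as a cheap pre-flight check before a changed backup script goes live. A minimal sketch, assuming rman is on the PATH and the same target / connection used above; the function name is hypothetical, and detecting failure by grepping for the error-stack banner shown above is an assumption rather than a documented interface.

```shell
# Hypothetical helper: syntax-check each RMAN command file given as an argument.
# Prints "ok: <file>" or "syntax error: <file>"; returns 1 if any file failed.
check_rcv_files() {
  failed=0
  for f in "$@"; do
    if rman target / checksyntax cmdfile="$f" 2>&1 | grep -q 'ERROR MESSAGE STACK FOLLOWS'; then
      echo "syntax error: $f"
      failed=1
    else
      echo "ok: $f"
    fi
  done
  return "$failed"
}

# check_rcv_files /path/to/*.rcv
```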

Overall not a mind-shatteringly exciting find but something that might be of use one day

Posted in Oracle, RMAN

Performance problems with OEM AWR warehouse

Posted by John Hallas on December 20, 2016

The Enterprise Manager AWR Warehouse is designed to hold performance data from multiple databases for long-term analysis. It promises that it will save storage and improve performance on your production systems, and in that it is indeed correct. However, the warehouse itself does not seem to be performant when taking in multiple sources and retaining them long-term (400 days in our case). Why 400 days? Primarily because we are a retail organisation and Easter falls on a different date each year, so we need more than a year of history to compare like with like.

 

The AWR repository database is performing poorly during the insert stage of the upload process.
Just to quickly summarise the process:
  • A dmp file is extracted on the source database and transferred across to the AWR server
  • The dmp file is then imported into a temporary schema called AWR$XXXXXX (this loads quickly)
  • This data is then inserted into the main AWR tables inside the SYS schema. It is this stage that is slow.

In order to completely isolate the issue, we altered a parameter, so only one AWR file gets loaded at once, cutting any contention / locking issues out of the equation:

Read the rest of this entry »

Posted in 12c new features, Oracle