Oracle DBA – A lifelong learning experience

Archive for the ‘Oracle’ Category

ORA-01450: maximum key length (3800) exceeded

Posted by John Hallas on December 13, 2017

This is an oddity: essentially I have an index on a table but I cannot rebuild it, because the error suggests the block size needs to be bigger than 8K, yet both the database and tablespace block size are already 16K.

The maximum allowed index key length depends on your block size, so the minimum allowed size reported in ORA-01450 varies depending on which block size your index is using:

ORA-01450: maximum key length (758) exceeded  -> 2K block
ORA-01450: maximum key length (1578) exceeded -> 4K block
ORA-01450: maximum key length (3218) exceeded -> 8K block
ORA-01450: maximum key length (6498) exceeded -> 16K block

See MOSC Note 136158.1 and MOSC Note 236329.1 for more details on the ORA-01450 error and key length.

Even more odd was that I could drop the index and recreate it with no issues, yet the rebuild failed every time, even after the recreate. Studying the table and data, there did not seem to be a value much bigger than the others, and certainly nothing that would cause one or more rows to be much larger than the rest.
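To illustrate the symptom, a minimal sketch (the index and table names here are hypothetical, not the ones from the system in question):

-- Rebuilding the existing index fails...
ALTER INDEX scott.big_key_idx REBUILD;
-- ORA-01450: maximum key length (3800) exceeded

-- ...yet dropping and recreating the same index works.
DROP INDEX scott.big_key_idx;

CREATE INDEX scott.big_key_idx ON scott.wide_table (col1, col2, col3);
-- Index created.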

Posted in Oracle | Tagged: | Leave a Comment »

How to add a metric extension and use a ‘control’ table to test results

Posted by John Hallas on November 27, 2017

I have previously blogged about creating a User Defined Metric (UDM) (seven years ago!), but the method I am using today demonstrates how you can test out the alert using specific data.

The aim was to see if integration data is flowing through a table.  If there is a backlog we want to know about it.

Simon Ryan, a colleague, did all the preparatory work and created all 18 (!) screenshots, which I had to copy and paste into this entry.

select * from sm7.EVENTOUTM1; -- Transient table so data not usually present.

The data is processed quickly but occasionally you can see a handful of rows for a few seconds.

select DESCRIPTOR,
       EVTYPE,
       TO_CHAR(EVTIME, 'DD-MON-YYYY HH24:MI:SS') EVTIME,
       EVSYSSEQ,
       EVUSRSEQ
from sm7.EVENTOUTM1;

DESCRIPTOR  EVTYPE  EVTIME                EVSYSSEQ          EVUSRSEQ
----------  ------  --------------------  ----------------  --------
(BLOB)      email   08-NOV-2017 11:40:01  0000000005254328

A typical query for the ME would be very simple:

select count(*) from sm7.EVENTOUTM1;

However, we are going to plug in a 'control' table so that we can test the alert by artificially adjusting the results.
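As a rough sketch of the idea (the control table name and columns here are hypothetical, not the actual implementation):

-- Control table: when a row is present its value overrides the real count,
-- letting us test the alert on demand.
CREATE TABLE dba_admin.me_control (
  metric_name    VARCHAR2(30),
  override_value NUMBER
);

-- Metric extension query: use the override if one exists,
-- otherwise return the genuine backlog count.
SELECT NVL((SELECT override_value
              FROM dba_admin.me_control
             WHERE metric_name = 'EVENTOUT_BACKLOG'),
           (SELECT COUNT(*) FROM sm7.EVENTOUTM1)) AS backlog
  FROM dual;

Inserting a control row with a high override_value should then fire the alert without needing a real backlog.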

Posted in Oracle | Tagged: , , , | Leave a Comment »

DBA vacancy Bradford, West Yorkshire

Posted by John Hallas on November 24, 2017

I have a vacancy in my team and the details are in the link below

https://apply.morrisons.jobs/vacancies/2137/technology-specialist–database.html

Lots of interesting work going on, including an 85Tb Data Warehouse on Exadata and developing Cloud capability using AWS and Google.

Posted in Oracle | 3 Comments »

Baselines – session creating privs v session running privs

Posted by John Hallas on November 21, 2017

A colleague, Richard Wilkinson, was telling me about an issue he had come across with baselines, and I asked him to write it up as it was an interesting experience.

The following MERGE SQL runs once a day. It always uses the same plan and takes roughly between 30 and 80 minutes:

SQL_ID / PHV: 6zs5dk6t6pkfs / 3064471754

 MERGE INTO DWB_X_PLNGRM_ATTR_DAY t
 USING (
 SELECT ilv.bsns_unit_cd,
 ilv.sku_item_nbr,
 ilv.cntnt_cd ,
 ilv.sel_units_cntnt_status,
 ilv.plngrm_item_grp_cd,
 dwr.bay_cnt_key,
 dwr.bay_cnt
 FROM (
 SELECT DISTINCT bsns_unit_cd,
 sku_item_nbr,
 cntnt_cd ,
 sel_units_cntnt_status,
 plngrm_item_grp_cd,
 --bay_cnt_key,
 max(bay_cnt_key) bay_cnt_key
 --bay_cnt
 FROM dwb_x_sel_item_units_day
 WHERE (1=1)
 AND day_key >= :v_daywkey
 GROUP BY bsns_unit_cd, sku_item_nbr, cntnt_cd, sel_units_cntnt_status, plngrm_item_grp_cd
 ) ilv,
 dwr_x_plngrm_loc_bay_cnt dwr
 WHERE ilv.bay_cnt_key = dwr.bay_cnt_key
 ) s
 ON (
 t.CNTNT_CD = s.CNTNT_CD
 AND t.SKU_ITEM_NBR = s.SKU_ITEM_NBR
 AND t.BSNS_UNIT_CD = s.BSNS_UNIT_CD
 AND t.PLNGRM_CNTNT_STATUS = s.SEL_UNITS_CNTNT_STATUS
 AND t.PLNGRM_ITEM_GRP_CD = s.PLNGRM_ITEM_GRP_CD
 --and t.DAY_KEY >= :v_daywkey
 )
 WHEN MATCHED
 THEN UPDATE SET
 t.bay_cnt_key = s.bay_cnt_key,
 t.bay_cnt = s.bay_cnt;
Plan hash value: 3064471754 
 
---------------------------------------------------------------------------------- 
| Id | Operation | Name | 
---------------------------------------------------------------------------------- 
| 0 | MERGE STATEMENT | | 
| 1 | PX COORDINATOR | | 
| 2 | PX SEND QC (RANDOM) | :TQ10007 | 
| 3 | INDEX MAINTENANCE | DWB_X_PLNGRM_ATTR_DAY | 
| 4 | PX RECEIVE | | 
| 5 | PX SEND RANGE | :TQ10006 | 
| 6 | MERGE | DWB_X_PLNGRM_ATTR_DAY | 
| 7 | PX RECEIVE | | 
| 8 | PX SEND HYBRID (ROWID PKEY) | :TQ10005 | 
| 9 | VIEW | | 
| 10 | HASH JOIN BUFFERED | | 
| 11 | PX RECEIVE | | 
| 12 | PX SEND HASH | :TQ10003 | 
| 13 | HASH JOIN BUFFERED | | 
| 14 | PX RECEIVE | | 
| 15 | PX SEND HASH | :TQ10001 | 
| 16 | PX BLOCK ITERATOR | | 
| 17 | TABLE ACCESS STORAGE FULL | DWR_X_PLNGRM_LOC_BAY_CNT | 
| 18 | PX RECEIVE | | 
| 19 | PX SEND HASH | :TQ10002 | 
| 20 | VIEW | | 
| 21 | SORT GROUP BY | | 
| 22 | PX RECEIVE | | 
| 23 | PX SEND HASH | :TQ10000 | 
| 24 | PX BLOCK ITERATOR | | 
| 25 | TABLE ACCESS STORAGE FULL| DWB_X_SEL_ITEM_UNITS_DAY | 
| 26 | PX RECEIVE | | 
| 27 | PX SEND HASH | :TQ10004 | 
| 28 | PX BLOCK ITERATOR | | 
| 29 | TABLE ACCESS STORAGE FULL | DWB_X_PLNGRM_ATTR_DAY | 
----------------------------------------------------------------------------------

One day this SQL changes execution plan and runs for over 5 hours before being killed.

The new plan is 4185795484:

----------------------------------------------------------------------------- 
| Id | Operation | Name | 
----------------------------------------------------------------------------- 
| 0 | MERGE STATEMENT | | 
| 1 | MERGE | DWB_X_PLNGRM_ATTR_DAY | 
| 2 | PX COORDINATOR | | 
| 3 | PX SEND QC (RANDOM) | :TQ10005 | 
| 4 | VIEW | | 
| 5 | HASH JOIN BUFFERED | | 
| 6 | PX RECEIVE | | 
| 7 | PX SEND HASH | :TQ10003 | 
| 8 | PX BLOCK ITERATOR | | 
| 9 | TABLE ACCESS STORAGE FULL | DWR_X_PLNGRM_LOC_BAY_CNT | 
| 10 | PX RECEIVE | | 
| 11 | PX SEND HASH | :TQ10004 | 
| 12 | HASH JOIN BUFFERED | | 
| 13 | PX RECEIVE | | 
| 14 | PX SEND HASH | :TQ10001 | 
| 15 | VIEW | | 
| 16 | SORT GROUP BY | | 
| 17 | PX RECEIVE | | 
| 18 | PX SEND HASH | :TQ10000 | 
| 19 | PX BLOCK ITERATOR | | 
| 20 | TABLE ACCESS STORAGE FULL| DWB_X_SEL_ITEM_UNITS_DAY | 
| 21 | PX RECEIVE | | 
| 22 | PX SEND HASH | :TQ10002 | 
| 23 | PX BLOCK ITERATOR | | 
| 24 | TABLE ACCESS STORAGE FULL | DWB_X_PLNGRM_ATTR_DAY | 
-----------------------------------------------------------------------------

The main difference I can see is that the tables are being joined in a different order.

3064471754 (good): DWB_X_SEL_ITEM_UNITS_DAY joined to DWR_X_PLNGRM_LOC_BAY_CNT then on to the target of the MERGE, DWB_X_PLNGRM_ATTR_DAY.
4185795484 (bad): DWB_X_SEL_ITEM_UNITS_DAY joined to DWB_X_PLNGRM_ATTR_DAY then to DWR_X_PLNGRM_LOC_BAY_CNT.

The fix seems simple: fix the good plan that has always been used (3064471754) using a SQL plan baseline.

It looks like this statement already has a couple of baselined plans; neither is fixed.

sql handle: SQL_0e4f45d7832dd5ab
SQL_PLAN_0wmu5uy1kvpdb8c1d711e 10-NOV-17 10.38.07.000000000 MANUAL-LOAD
SQL_PLAN_0wmu5uy1kvpdb1eaf5716 29-SEP-17 23.16.31.000000000 AUTO-CAPTURE
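For reference, a listing like the one above can be pulled from DBA_SQL_PLAN_BASELINES along these lines:

select plan_name, created, origin, enabled, accepted, fixed
  from dba_sql_plan_baselines
 where sql_handle = 'SQL_0e4f45d7832dd5ab';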

Looking at the detailed view of the baseline shows that one of the plans is the bad plan and the other has no history of use in ASH:

select * 
 from table(dbms_xplan.display_sql_plan_baseline(sql_handle => 'SQL_0e4f45d7832dd5ab',
 format => 'basic')); --detailed view


PLAN_TABLE_OUTPUT 
----------------------------------------------------------------------------- 
SQL handle: SQL_0e4f45d7832dd5ab 
SQL text: MERGE INTO DWB_X_PLNGRM_ATTR_DAY T USING ( SELECT ILV.BSNS_UNIT_CD, 
ILV.SKU_ITEM_NBR, ILV.CNTNT_CD , ILV.SEL_UNITS_CNTNT_STATUS, 
ILV.PLNGRM_ITEM_GRP_CD, DWR.BAY_CNT_KEY, DWR.BAY_CNT FROM ( SELECT 
DISTINCT BSNS_UNIT_CD, SKU_ITEM_NBR, CNTNT_CD , SEL_UNITS_CNTNT_STATUS, 
PLNGRM_ITEM_GRP_CD, MAX(BAY_CNT_KEY) BAY_CNT_KEY FROM 
DWB_X_SEL_ITEM_UNITS_DAY WHERE (1=1) AND DAY_KEY >= :B1 GROUP BY 
BSNS_UNIT_CD, SKU_ITEM_NBR, CNTNT_CD, SEL_UNITS_CNTNT_STATUS, 
PLNGRM_ITEM_GRP_CD ) ILV, DWR_X_PLNGRM_LOC_BAY_CNT DWR WHERE 
ILV.BAY_CNT_KEY = DWR.BAY_CNT_KEY ) S ON ( T.CNTNT_CD = S.CNTNT_CD AND 
T.SKU_ITEM_NBR = S.SKU_ITEM_NBR AND T.BSNS_UNIT_CD = S.BSNS_UNIT_CD AND 
T.PLNGRM_CNTNT_STATUS = S.SEL_UNITS_CNTNT_STATUS AND 
T.PLNGRM_ITEM_GRP_CD = S.PLNGRM_ITEM_GRP_CD ) WHEN MATCHED THEN UPDATE 
SET T.BAY_CNT_KEY = S.BAY_CNT_KEY, T.BAY_CNT = S.BAY_CNT 
-------------------------------------------------------------------------------- 
 
-------------------------------------------------------------------------------- 
Plan name: SQL_PLAN_0wmu5uy1kvpdb1eaf5716 Plan id: 514807574 
Enabled: YES Fixed: NO Accepted: NO Origin: AUTO-CAPTURE 
-------------------------------------------------------------------------------- 
 
Plan hash value: 4185795484 /******BAD PLAN*******/ 
 
----------------------------------------------------------------------------- 
| Id | Operation | Name | 
----------------------------------------------------------------------------- 
| 0 | MERGE STATEMENT | | 
| 1 | MERGE | DWB_X_PLNGRM_ATTR_DAY | 
| 2 | PX COORDINATOR | | 
| 3 | PX SEND QC (RANDOM) | :TQ10005 | 
| 4 | VIEW | | 
| 5 | HASH JOIN BUFFERED | | 
| 6 | PX RECEIVE | | 
| 7 | PX SEND HASH | :TQ10003 | 
| 8 | PX BLOCK ITERATOR | | 
| 9 | TABLE ACCESS STORAGE FULL | DWR_X_PLNGRM_LOC_BAY_CNT | 
| 10 | PX RECEIVE | | 
| 11 | PX SEND HASH | :TQ10004 | 
| 12 | HASH JOIN BUFFERED | | 
| 13 | PX RECEIVE | | 
| 14 | PX SEND HASH | :TQ10001 | 
| 15 | VIEW | | 
| 16 | SORT GROUP BY | | 
| 17 | PX RECEIVE | | 
| 18 | PX SEND HASH | :TQ10000 | 
| 19 | PX BLOCK ITERATOR | | 
| 20 | TABLE ACCESS STORAGE FULL| DWB_X_SEL_ITEM_UNITS_DAY | 
| 21 | PX RECEIVE | | 
| 22 | PX SEND HASH | :TQ10002 | 
| 23 | PX BLOCK ITERATOR | | 
| 24 | TABLE ACCESS STORAGE FULL | DWB_X_PLNGRM_ATTR_DAY | 
----------------------------------------------------------------------------- 
 
-------------------------------------------------------------------------------- 
Plan name: SQL_PLAN_0wmu5uy1kvpdb8c1d711e Plan id: 2350739742 
Enabled: YES Fixed: NO Accepted: YES Origin: MANUAL-LOAD 
-------------------------------------------------------------------------------- 
 
Plan hash value: 1932271691 /**********no record of use in ASH*********/ 
 
----------------------------------------------------------------------------- 
| Id | Operation | Name | 
----------------------------------------------------------------------------- 
| 0 | MERGE STATEMENT | | 
| 1 | MERGE | DWB_X_PLNGRM_ATTR_DAY | 
| 2 | PX COORDINATOR | | 
| 3 | PX SEND QC (RANDOM) | :TQ10005 | 
| 4 | VIEW | | 
| 5 | HASH JOIN BUFFERED | | 
| 6 | PX RECEIVE | | 
| 7 | PX SEND HASH | :TQ10003 | 
| 8 | HASH JOIN BUFFERED | | 
| 9 | PX RECEIVE | | 
| 10 | PX SEND HASH | :TQ10001 | 
| 11 | PX BLOCK ITERATOR | | 
| 12 | TABLE ACCESS STORAGE FULL | DWR_X_PLNGRM_LOC_BAY_CNT | 
| 13 | PX RECEIVE | | 
| 14 | PX SEND HASH | :TQ10002 | 
| 15 | VIEW | | 
| 16 | HASH GROUP BY | | 
| 17 | PX RECEIVE | | 
| 18 | PX SEND HASH | :TQ10000 | 
| 19 | PX BLOCK ITERATOR | | 
| 20 | TABLE ACCESS STORAGE FULL| DWB_X_SEL_ITEM_UNITS_DAY | 
| 21 | PX RECEIVE | | 
| 22 | PX SEND HASH | :TQ10004 | 
| 23 | PX BLOCK ITERATOR | | 
| 24 | TABLE ACCESS STORAGE FULL | DWB_X_PLNGRM_ATTR_DAY | 
-----------------------------------------------------------------------------

Plan 1932271691 joins the tables in the same order as 3064471754 and looks logically the same, so I think this is preferable to the bad plan (which is also in the baseline).
I’m not sure why the optimizer isn’t choosing plan 1932271691.

I created an SQL Tuning Set from a snapshot range containing yesterday’s good run and confirmed that the only plan in there was 3064471754.
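In outline, that process looks something like the sketch below (the STS name and snapshot IDs are hypothetical; the SQL_ID is the one from this post):

DECLARE
  cur   DBMS_SQLTUNE.SQLSET_CURSOR;
  added PLS_INTEGER;
BEGIN
  -- Create a SQL Tuning Set and populate it from the AWR snapshot
  -- range containing the good run.
  DBMS_SQLTUNE.CREATE_SQLSET(sqlset_name => 'STS_MERGE_FIX');
  OPEN cur FOR
    SELECT VALUE(p)
      FROM TABLE(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(
                   begin_snap   => 12345,
                   end_snap     => 12350,
                   basic_filter => q'[sql_id = '6zs5dk6t6pkfs']')) p;
  DBMS_SQLTUNE.LOAD_SQLSET(sqlset_name     => 'STS_MERGE_FIX',
                           populate_cursor => cur);

  -- Load the captured plan(s) from the STS into the SQL plan baseline.
  added := DBMS_SPM.LOAD_PLANS_FROM_SQLSET(sqlset_name => 'STS_MERGE_FIX');
END;
/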

I uploaded the STS to the SQL baseline; however, it does not add a new plan. It still shows:

sql handle: SQL_0e4f45d7832dd5ab
SQL_PLAN_0wmu5uy1kvpdb8c1d711e
SQL_PLAN_0wmu5uy1kvpdb1eaf5716

The only difference is that the timestamp of SQL_PLAN_0wmu5uy1kvpdb8c1d711e is now current, so it looks like it has been updated.

When querying the baseline in detail I expect to see that SQL_PLAN_0wmu5uy1kvpdb8c1d711e now contains 3064471754; however, it doesn't.
It still shows 1932271691. Nothing has changed except the timestamp, so it looks like we can't fix the good plan that has been used every day for months.

I fixed the potentially good plan and flushed the bad plan from all instances; however, upon retrying the MERGE the bad plan was still chosen.
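For reference, fixing an accepted plan is done through DBMS_SPM; a sketch using the handle and plan name from this post (the function returns the number of plans altered):

DECLARE
  altered PLS_INTEGER;
BEGIN
  altered := DBMS_SPM.ALTER_SQL_PLAN_BASELINE(
               sql_handle      => 'SQL_0e4f45d7832dd5ab',
               plan_name       => 'SQL_PLAN_0wmu5uy1kvpdb8c1d711e',
               attribute_name  => 'fixed',
               attribute_value => 'YES');
END;
/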

As an experiment I ran an explain plan of the MERGE in SQL Developer replacing the bind variable with the literal that was being passed.
The explain plan returns the good plan (same as 3064471754).

As a workaround I ask the team to run the MERGE from SQL Developer with the literal (after confirming their explain plan also returns the good plan).

Upon monitoring the SQL Developer session, it is confirmed to be using the good plan with the exact same plan ID (3064471754), and it completes in 1 hour 15 minutes.

As it is Friday afternoon we agree on this workaround for the weekend (support team is 24 hours / shifts).

The next day the batch job MERGE has picked a different bad plan.
The support team kill it and run the workaround.

The workaround then stops working as well, as it has picked the same bad plan as the batch job.
The SQL Developer MERGE runs for 5 hours before being killed.

The new bad plan is 1501257745.

1501257745 looks logically the same as the good plan (3064471754) and the potentially good plan in the baseline (1932271691) as it has the same join order.

However, we now have a breakthrough: the new bad plan gives us a new angle to look at.

1501257745 starts by spawning many parallel processes; however, it soon drops off to one serial process.
This one serial process performs the top-level MERGE operation, which flatlines for hours before being killed.

I decide to retry fixing the original good plan.
I start with a clean slate and drop the SQL baseline and both plans contained within.
I create a new STS from a different snapshot period than before which only contains 3064471754.
I load the STS into the baseline and a new baseline and plan are created.

When querying the contents of the baseline it still just contains the potentially good plan from before (1932271691).

Oracle is now deliberately and consistently changing my plan in transit from 3064471754 to 1932271691.

With the new insight given by the new bad plan (1501257745) I now look at the top-level DML parts of the plans.

Logically the potentially good plan (1932271691) and the new bad plan (1501257745) are the same statement as 3064471754 but without the parallel MERGE.

Plan hash value: 1932271691 and 1501257745
-----------------------------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------------------------
| 0 | MERGE STATEMENT | |
| 1 | MERGE | DWB_X_PLNGRM_ATTR_DAY |

Plan hash value: 3064471754
----------------------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | |
| 1 | PX COORDINATOR | |
| 2 | PX SEND QC (RANDOM) | :TQ10007 |
| 3 | INDEX MAINTENANCE | DWB_X_PLNGRM_ATTR_DAY |
| 4 | PX RECEIVE | |
| 5 | PX SEND RANGE | :TQ10006 |
| 6 | MERGE | DWB_X_PLNGRM_ATTR_DAY |

Now that we are concentrating on the parallelism of the DML, I remember working at a site previously and having to enable PDML at the session level.

I ran an explain plan of the MERGE before and after issuing the following statement:

ALTER SESSION ENABLE PARALLEL DML;

Afterwards the plan changed to include parallel steps for the MERGE.

I told the support team to do this in their SQL Developer session and the MERGE began running with the usual good plan and even had the same plan id (3064471754).

Now that PDML is enabled in my session, I reloaded the same STS as before into the same baseline.

Once again the only high-level change was the timestamp on the plan; however, when querying the detail of the plan, it no longer contained 1932271691 but a new plan, 2197295520.

Plan hash value: 2197295520 
 
---------------------------------------------------------------------------------- 
| Id | Operation | Name | 
---------------------------------------------------------------------------------- 
| 0 | MERGE STATEMENT | | 
| 1 | PX COORDINATOR | | 
| 2 | PX SEND QC (RANDOM) | :TQ10007 | 
| 3 | INDEX MAINTENANCE | DWB_X_PLNGRM_ATTR_DAY | 
| 4 | PX RECEIVE | | 
| 5 | PX SEND RANGE | :TQ10006 | 
| 6 | MERGE | DWB_X_PLNGRM_ATTR_DAY | 
| 7 | PX RECEIVE | | 
| 8 | PX SEND HYBRID (ROWID PKEY) | :TQ10005 | 
| 9 | VIEW | | 
| 10 | HASH JOIN BUFFERED | | 
| 11 | PX RECEIVE | | 
| 12 | PX SEND HASH | :TQ10003 | 
| 13 | HASH JOIN BUFFERED | | 
| 14 | PX RECEIVE | | 
| 15 | PX SEND HASH | :TQ10001 | 
| 16 | PX BLOCK ITERATOR | | 
| 17 | TABLE ACCESS STORAGE FULL | DWR_X_PLNGRM_LOC_BAY_CNT | 
| 18 | PX RECEIVE | | 
| 19 | PX SEND HASH | :TQ10002 | 
| 20 | VIEW | | 
| 21 | HASH GROUP BY | | /****only difference from original good plan (which has SORT GROUP BY instead)****/ 
| 22 | PX RECEIVE | | 
| 23 | PX SEND HASH | :TQ10000 | 
| 24 | PX BLOCK ITERATOR | | 
| 25 | TABLE ACCESS STORAGE FULL| DWB_X_SEL_ITEM_UNITS_DAY | 
| 26 | PX RECEIVE | | 
| 27 | PX SEND HASH | :TQ10004 | 
| 28 | PX BLOCK ITERATOR | | 
| 29 | TABLE ACCESS STORAGE FULL | DWB_X_PLNGRM_ATTR_DAY | 
----------------------------------------------------------------------------------

The tables are joined in the correct order and the MERGE is parallel.
The only difference is line 21, which has HASH GROUP BY instead of SORT GROUP BY.

I fixed this plan, hoping that the batch version of the MERGE would pick it up tomorrow.
It doesn't; however, it now runs with the original good plan (3064471754) every day.

Whilst I'm still looking into this, a key point to take away is that when using SQL baselines, the environment of the session creating the baseline and that of the session running the SQL both count towards the eligibility of the plan.
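One quick way to check whether PDML is enabled in the current session (a suggested check, not part of the original investigation):

-- PDML_STATUS shows ENABLED after ALTER SESSION ENABLE PARALLEL DML.
select pdml_status
  from v$session
 where sid = SYS_CONTEXT('USERENV', 'SID');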

Posted in Oracle | Tagged: , , | 3 Comments »

HP Systems Management vacancy

Posted by John Hallas on August 8, 2017

I know I do not have the right readership on this blog for a Systems Management vacancy, but if any readers have colleagues with experience of any of HP OpenView, OpsBridge, HP Service Management or HP UCMDB, then I have a permanent position at our Head Office in Bradford.

Details below

https://apply.morrisons.jobs/vacancies/491/technology-specialist–system-management.html

Posted in Oracle | Leave a Comment »

Using DataGuard broker to show apply lag and throughput

Posted by John Hallas on June 20, 2017

To determine how much lag there is, I normally run a script similar to this:

select sequence#, applied, to_date(to_char(first_time,'DD-MON-YY:HH24:MI:SS'),
'DD-MON-YY:HH24:MI:SS') "First Time",
to_char(next_time,'DD-MON-YY:HH24:MI:SS') "Next Time"
from v$archived_log
UNION
select NULL, database_role, NULL, db_unique_name from v$database
order by "First Time";

However, there is another way, which I sometimes use, that actually gives a lot more information: the Data Guard broker command line (DGMGRL). Use the show configuration command to determine the database names if you are not sure.

DGMGRL> show configuration

Configuration - DR

Protection Mode: MaxPerformance
 Databases:
 xxxxxx2a - Primary database
 xxxxxx2b - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Then show the details for a specific database:

DGMGRL> show database "xxxxxx2b"

Database - xxxxxx2b

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 4 minutes 24 seconds (computed 6 seconds ago)
 Apply Lag: 5 minutes 16 seconds (computed 0 seconds ago)
 Apply Rate: 191.08 MByte/s
 Real Time Query: ON
 Instance(s):
 xxxxxx2b1 (apply instance)
 xxxxxx2b2
 xxxxxx2b3
 xxxxxx2b4
 xxxxxx2b5
 xxxxxx2b6
 xxxxxx2b7
 xxxxxx2b8

Database Status:
SUCCESS

Lots of good information there, including which node is hosting the MRP process and also the apply rate. In our case that is normally between 150 and 400 MB per second.
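If you want still more detail, older broker versions also expose monitorable properties per database; a sketch (property name as documented for the 11g broker):

DGMGRL> show database 'xxxxxx2b' 'RecvQEntries'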

Posted in Oracle | Tagged: , , , , , , | Leave a Comment »

Downgrading a RAC database from 11.2.0.4 to 11.2.0.3

Posted by John Hallas on May 4, 2017

It is not often that I see a database downgrade performed, so I thought it would be worthwhile noting how it was done.

This was a 2-node RAC 11.2.0.4 database downgraded to 11.2.0.3; only the database was downgraded, not the grid home. The downgrade took place on HP-UX; downgrades on Windows involve several additional steps and won't be covered in this post. This database does not use Database Vault, and the prerequisite compatibility checks were carried out beforehand.

Assume all commands are run on node 1; any commands that need to be run on node 2 will be explicitly stated.

Set ORACLE_HOME to current 11.2.0.4 environment.
export ORACLE_HOME=/app/oracle/product/11.2.0.4/dbhome_SOA1
 
Tail alert log of both nodes in separate windows
tail -f /app/oracle/diag/rdbms/soapre2a/SOAPRE2A1/trace/alert_SOAPRE2A1.log
 
1. Stop database using srvctl on primary node
 
srvctl stop database -d SOAPRE2A
 
Monitor the alert logs to confirm when the database has successfully shut down.
2. Create pfile from spfile
 
sqlplus / as sysdba
SQL>create pfile='/home/oracle/SOAPRE2_downgrade/SOAPRE2_clusterdisable.ora' from spfile='+DATA/SOAPRE2A/spfilesoapre2a.ora';
 
3. Alter pfile CLUSTER_DATABASE parameter to FALSE
*.cluster_database=FALSE
*.compatible='11.2.0.0.0'
 
4. Recreate spfile with new parameter
 
sqlplus / as sysdba
SQL>create spfile='+DATA/SOAPRE2A/spfilesoapre2a.ora' from pfile='/home/oracle/SOAPRE2_downgrade/SOAPRE2_clusterdisable.ora';
 
5. Startup database in downgrade mode using new spfile
 
cd $ORACLE_HOME/rdbms/admin
sqlplus / as sysdba
SQL>startup downgrade
 
6. Execute Oracle downgrade script
From the original OH –  11.2.0.4
SQL>spool /home/oracle/SOAPRE2_downgrade/downgrade.log
SQL>@catdwgrd.sql
SQL>spool off
SQL>shutdown immediate;
SQL>exit
 
This script can be run multiple times; if any errors are encountered, correct them and rerun until it completes.
7. Change environment variables and restore config files
 
Execute these steps on both nodes.
 
Alter the ORACLE_HOME and PATH environment variables to point to the downgraded directories, in our case for example:
export ORACLE_HOME=/app/oracle/product/11.2.0.3/dbhome_SOA1_1
 
Ensure any entries in your oratab file are also altered to reference the downgraded directory.
Copy password files and config files from current ORACLE_HOME to downgraded directory.
8. Reload version specific components
 
Change to the downgraded release home – 11.2.0.3
cd /app/oracle/product/11.2.0.3/dbhome_SOA1_1/rdbms/admin
sqlplus / as sysdba
SQL>startup upgrade
SQL>spool /home/oracle/SOAPRE2_downgrade/reload.log
SQL>@catrelod.sql
SQL>spool off
This step can take quite some time to complete; in our case it took ~2.5 hours.
9. Recompile invalid objects
 
SQL> shutdown immediate
SQL> startup
SQL> @utlprp.sql
SQL> exit
 
10. Downgrade cluster services
 
The final step was to downgrade cluster services to our old ORACLE_HOME and version, using the following srvctl command:
srvctl downgrade database -d db-unique-name -o old_ORACLE_HOME -t to_old_version
 
in our case this was the following:
 
srvctl downgrade database -d SOAPRE2A -o /app/oracle/product/11.2.0.3/dbhome_SOA1_1 -t 11.2.0.3
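Afterwards it is worth sanity-checking the component versions (a suggested check, not part of the original runbook):

SQL> select comp_name, version, status from dba_registry;

All components should now report version 11.2.0.3 with a VALID status.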

Posted in 11g new features, Oracle | Tagged: , , | 3 Comments »

What is the future for an Oracle DBA?

Posted by John Hallas on April 10, 2017

I have worked with Oracle databases for around 25 years now and during that time I have been very fortunate in that there has always been work for DBAs and it has been one of the higher paying disciplines within IT.

I am not prophesying the end of the Oracle database engine, but I do see the writing on the wall for some of the large corporate solutions sitting on physical equipment in a datacentre. I also have to criticise Oracle for their business practices, which I know are seeing customers move away to other solutions.

Without any doubt there is pressure on those who wish to perform a purely Oracle DBA role. The growing use of Cloud does reduce the opportunities, and whilst databases always need to be built, the techniques used in the Cloud undoubtedly speed up that process and effectively de-skill it. The rise of SaaS-style applications, where the on-site DBA no longer performs upgrades, patching and similar work, also reduces the requirement.

In conjunction with that, there is a threat from the more established players in the market. I manage database teams that support a variety of databases, and a few years ago I undoubtedly held the view that Oracle was good for large databases (I might have considered 1Tb to be the dividing line between large and medium) and SQL Server was suitable for smaller ones. I am aware that is a very crude dividing line which does not take into account functionality and advanced database requirements. I do not hold that view in the slightest now, and consider that Oracle is too expensive and does not offer value for money. SQL Server is much higher in my focus, and we now include MySQL (which I know is owned by Oracle) as well as PostgreSQL and DynamoDB.

I referred to business practices as being a reason not to use Oracle. I am specifically referring to the change in licensing for databases running in clouds other than Oracle's. See this article by Tim Hall for more detail. The comments also support the theme of this blog: that there are many more alternatives to Oracle these days.

If I were starting out now, I think I would be trying to go down the Data Architect road, while also grabbing a good overview of the benefits and risks of the various types of database solutions now available. That skill set would also assist in becoming an infrastructure architect.

Having said all of the above, in my view there is nothing more satisfying than taking a query and improving its performance, no matter what the underlying database technology is.

Posted in Oracle | 17 Comments »

GoldenGate – Restarting a replicat with the command filterduptransactions

Posted by John Hallas on April 4, 2017

If a GoldenGate replicat process fails, then occasionally on restart it skips the correct RBA and 'loses its position'. The relative byte address (RBA) is the location within the trail file that indicates the current transaction.

The old-school method was to calculate which RBA was the correct one and then restart the replicat from there. However, there is a new command on the block now (pun intended), and I will demonstrate how the two methods can be used to restart the replicat at the correct position in the trail file.
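In outline, the two approaches look something like this (the replicat name and positions are hypothetical):

-- Old school: calculate the correct position, then reposition the replicat
-- explicitly at a trail file sequence number and RBA.
GGSCI> ALTER REPLICAT REPA, EXTSEQNO 123, EXTRBA 456789
GGSCI> START REPLICAT REPA

-- Newer: let GoldenGate discard transactions it has already applied.
GGSCI> START REPLICAT REPA FILTERDUPTRANSACTIONS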

Today we saw exactly this happen, reported in the GG log file.
PS Sorry if the format is a bit off. I normally spend as much time formatting this blog as I do writing it. However, in this case much of the work was done by Alex Priestley, a fellow DBA.

Posted in Goldengate, Oracle | Tagged: , , , , | 1 Comment »

Problem with V$RECOVERY_AREA_USAGE view and FRA space not being reclaimed

Posted by John Hallas on March 16, 2017

We received the following alert from our alerting system
 
Flash Recovery Area for FOLPRD1A has 9.97 percent of usable space left.
 
This is a standby database:
 
HOST       INSTANCE   STATUS     VERSION      STARTED                   UPTIME
---------- ---------- ---------- ------------ ------------------------- --------------------------------------------------
xxxxxxxx   INSTANCE   MOUNTED    11.2.0.3.0   18-JAN-2017 18:58:18      33 day(s) 13 hour(s) 4 minute(s) 1 second(s)

DB_NAME     UNIQUE_NAME DB_ROLE            OPEN_MODE  PROTECTION_MODE      PROTECTION_LEVEL
----------- ----------- ------------------ ---------- -------------------- --------------------
xxxxxxxx    INSTANCE    PHYSICAL STANDBY   MOUNTED    MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
 
Usually, when under space pressure the standby will delete archivelogs and flashback logs that it no longer needs so this alert isn’t normal for a standby.  However, in this scenario, none of the space is reclaimable.  Therefore, without intervention the FRA would eventually hit 100% and stop logs being transported to the standby.
 
NAME                                     SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES
---------------------------------------- ----------- ---------- ----------------- ---------------
+FRA                                             400     379.48                 0           11268
 
FILE_TYPE               USED_GB RECLAIMABLE_GB PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ---------- -------------- ------------------ ------------------------- ---------------
CONTROL FILE                .12              0                .03                         0               1
REDO LOG                      0              0                  0                         0               0
ARCHIVED LOG             326.12              0              81.53                         0           11068
BACKUP PIECE                .12              0                .03                         0               1
IMAGE COPY                    0              0                  0                         0               0
FLASHBACK LOG             53.08              0              13.27                         0             197
FOREIGN ARCHIVED LOG          0              0                  0                         0               0
 
I checked the RMAN configuration suspecting it hadn’t been changed post switchover:
 
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
 
Rather than being NONE, it did look right, but I did notice it was the 'ALL' variant. I thought we usually had it as follows:
 
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;
 
I assumed this wouldn’t make a difference as we only have one standby and all the logs had been applied.  However, I changed it anyway.  Straight away this had the desired effect and practically all the space became reclaimable.
 
NAME                                     SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES
---------------------------------------- ----------- ---------- ----------------- ---------------
+FRA                                             400     379.48            325.92           11268
 
FILE_TYPE               USED_GB RECLAIMABLE_GB PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ---------- -------------- ------------------ ------------------------- ---------------
CONTROL FILE                .12              0                .03                         0               1
REDO LOG                      0              0                  0                         0               0
ARCHIVED LOG             326.12         325.88              81.53                     81.47           11068
BACKUP PIECE                .12              0                .03                         0               1
IMAGE COPY                    0              0                  0                         0               0
FLASHBACK LOG             53.08              0              13.27                         0             197
FOREIGN ARCHIVED LOG          0              0                  0                         0               0
 
The alert subsequently cleared.
 
Looking at the report "Archivelog Deletion Policy – Core Production Databases", we have many databases configured with the ALL parameter. I checked another at random and it was fine. I suspected it was the action of changing the parameter, rather than the parameter being wrong, and thought dbms_backup_restore.refreshagedfiles would probably have done the job.
 
After speaking to a colleague, I learned this alert had come out weeks ago, and the trick of lowering db_recovery_file_dest_size to force the database to be under space pressure had cleared the old logs and the alert. The fact that this worked suggests that the space was always reclaimable, just not shown in the view which the alert uses. I found a nice blog that shows the same issue and alludes to a bug.
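That trick is roughly the following (the sizes are illustrative, based on the 400G FRA above):

-- Temporarily shrink the FRA to put the standby under space pressure,
-- forcing it to age out logs it no longer needs...
ALTER SYSTEM SET db_recovery_file_dest_size = 300G SCOPE=MEMORY;

-- ...then restore the original limit once the space has been reclaimed.
ALTER SYSTEM SET db_recovery_file_dest_size = 400G SCOPE=MEMORY;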
 
 
"V$RECOVERY_AREA_USAGE is an aggregate view. If we check its definition, we see that the reclaimable size comes from x$kccagf.rectype." It directs you to a bug (for this version) that describes that the standby "does not refresh reclaimable space automatically".
 
Bug 14227959 : STANDBY DID NOT RELEASE SPACE IN FRA
 
The workaround is to run exec dbms_backup_restore.refreshagedfiles; 
 
The blog also claims: "but I've found another workaround: just run the CONFIGURE ARCHIVELOG POLICY from RMAN as it refreshes the reclaimable flag – even when there is no change."
 
This is effectively what I did. Therefore, I've put the original parameter back and switched logs on the primary numerous times, and the reclaimable space is being updated. For now we shall keep an eye on this, as it's not an issue anywhere else.

Posted in Oracle | Tagged: , , , , | 1 Comment »