Oracle DBA – A lifelong learning experience

Identifying corrupt blocks

Posted by John Hallas on April 14, 2010

Recently we had a datafile with a number of corrupt blocks and once it was fixed I resolved to try out all the methods available of identifying block corruption. Searching around I came across an excellent blog by   Asif Momen – Practising block recovery which provided me pretty much what I wanted. However I did think it worth adding a blog entry as I wanted to document all the various methods available to identify corruption.  I also wanted to cover how to test causing a corruption when using ASM datafiles.

Create a datafile, add data and find out which block that data is in

 create smallfile tablespace bad_data datafile '+DATA' size 10M;

Create table test (username varchar2(9), password varchar2(6)) tablespace bad_data;

DECLARE

u  VARCHAR2(9);

p  VARCHAR2(6);

BEGIN

FOR jump  IN 1 ..10000  LOOP

u := 'TEST'||jump;

p := 'P'||jump;

insert into test values (u,p);

END LOOP;

commit;

END;

/

PL/SQL procedure successfully completed.

select

rowid ,

to_number(utl_encode.base64_decode(utl_raw.cast_to_raw(lpad(substr(rowid, 1, 6), 8, 'A'))), 'XXXXXXXXXXXX') as objid,

to_number(utl_encode.base64_decode(utl_raw.cast_to_raw(lpad(substr(rowid, 7, 3), 4, 'A'))), 'XXXXXX') as filenum,

to_number(utl_encode.base64_decode(utl_raw.cast_to_raw(lpad(substr(rowid, 10, 6), 8, 'A'))), 'XXXXXXXXXXXX') as blocknum,

to_number(utl_encode.base64_decode(utl_raw.cast_to_raw(lpad(substr(rowid, 16, 3), 4, 'A'))), 'XXXXXX') as rowslot

from test where password='P7777'

ROWID                   OBJID    FILENUM   BLOCKNUM    ROWSLOT

—————— ———- ———- ———- ———-

AAAXaiAAHAAAACdAAr      95906          7        156         43

Copy datafile from ASM to filesystem as it is easier to manipulate there

alter tablespace bad_data offline;

Tablespace altered

RMAN> copy datafile 7 to '/home/oracle/bad_data_01.dbf';

Starting backup at 2010-04-14:10:04:29

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=306 device

channel ORA_DISK_1: starting datafile copy

input datafile file number=00007 name=+DATA/oemdev1a/datafile/bad_data.266.716292163

output file name=/home/oracle/bad_data_01.dbf tag=TAG20100414T100432 RECID=3 STAMP=716292274

channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03

Finished backup at 2010-04-14:10:04:36

Starting Control File and SPFILE Autobackup at 2010-04-14:10:04:36

piece handle=+FRA/oemdev1a/autobackup/2010_04_14/s_716292277.459.716292277 comment=NONE

Finished Control File and SPFILE Autobackup at 2010-04-14:10:04:38

RMAN> exit

Corrupt the datafile using dd

/home/oracle $cp bad_data_01.dbf bad_data_01.dbf_good

/home/oracle $dd if=/home/oracle/bad_data_01.dbf bs=8k count=156 of=/home/oracle/bad_data_01.dbf_new

156+0 records in

156+0 records out

/home/oracle $dd if=/home/oracle/bad_data_01.dbf bs=8k count=1 >> /home/oracle/bad_data_01.dbf_new

1+0 records in

1+0 records out

/home/oracle $dd if=/home/oracle/bad_data_01.dbf bs=8k skip=157 >> /home/oracle/bad_data_01.dbf_new

1124+0 records in

1124+0 records out

/home/oracle $mv /home/oracle/bad_data_01.dbf_new /home/oracle/bad_data_01.dbf

Put the datafile online and see if any corruption exist

SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 14 10:06:30 2010

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

With the Partitioning option

SQL>select file_name from dba_data_files where tablespace_name = 'BAD_DATA';

FILE_NAME

--------------------------------------------------------------------------------

+DATA/oemdev1a/datafile/bad_data.266.716292163

SQL>alter database rename file '+DATA/oemdev1a/datafile/bad_data.266.716292163' to '/home/oracle/bad_data_01.dbf';

Database altered.

SQL>alter tablespace bad_data online;

Tablespace altered.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

With the Partitioning option

SQL>select * from test where password='P7777;

select * from test where password='P7777'

*

ERROR at line 1:

ORA-01578: ORACLE data block corrupted (file # 7, block # 156)

ORA-01110: data file 7: '/home/oracle/bad_data_01.dbf' 

SUCCESS – we have corruption, now let’s get the datafile back into ASM before we test for block corruption using various methods

Recovery Manager: Release 11.1.0.7.0 - Production on Wed Apr 14 10:14:12 2010

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: OEMDEV1A (DBID=63225982)

using target database control file instead of recovery catalog

RMAN> copy datafile '/home/oracle/bad_data_01.dbf' to '+DATA';

Starting backup at 2010-04-14:10:14:38

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=289 device

channel ORA_DISK_1: starting datafile copy

input datafile file number=00007 name=/home/oracle/bad_data_01.dbf

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/14/2010 10:14:41

ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/bad_data_01.dbf

OOPS, RMAN does an implicit check of the datafile and does not allow it to be moved into ASM. Seems very reasonable

DB verify

/home/oracle $dbv file=/home/oracle/bad_data_01.dbf  blocksize=8192

DBVERIFY: Release 11.1.0.7.0 - Production on Wed Apr 14 10:16:58 2010

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/bad_data_01.dbf

Page 156 is marked corrupt

Corrupt block relative dba: 0x01c0009c (file 7, block 156)

Bad header found during dbv:

Data in bad block:

type: 0 format: 2 rdba: 0xffc00000

last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0x00000000

check value in block header: 0x5dc4

block checksum disabled

DBVERIFY - Verification complete

Total Pages Examined         : 1280

Total Pages Processed (Data) : 27

Total Pages Failing   (Data) : 0

Total Pages Processed (Index): 0

Total Pages Failing   (Index): 0

Total Pages Processed (Other): 131

Total Pages Processed (Seg)  : 0

Total Pages Failing   (Seg)  : 0

Total Pages Empty            : 1121

Total Pages Marked Corrupt   : 1

Total Pages Influx           : 0

Total Pages Encrypted        : 0

Highest block SCN            : 8780478 (0.8780478) 

export the database to /dev/null – forces a database read but produces no output file

$exp

Export: Release 11.1.0.7.0 - Production on Wed Apr 14 10:19:12 2010

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

With the Partitioning option

Enter array fetch buffer size: 4096 >

Export file: expdat.dmp > /dev/null

Volume size (<ret> for no restriction) >

(1)E(ntire database), (2)U(sers), or (3)T(ables): (2)U > U

Export grants (yes/no): yes >

Export table data (yes/no): yes >

Compress extents (yes/no): yes >

Export done in US7ASCII character set and AL16UTF16 NCHAR character set

server uses AL32UTF8 character set (possible charset conversion)

About to export specified users ...

User to be exported: (RETURN to quit) > TEST

. about to export TEST's tables via Conventional Path ...

. . exporting table                           TEST

EXP-00056: ORACLE error 1578 encountered

ORA-01578: ORACLE data block corrupted (file # 7, block # 156)

ORA-01110: data file 7: '/home/oracle/bad_data_01.dbf'

RMAN to validate the database and CHECK LOGICAL.

The validate database command does not perform a backup but checks each block to see if any physical corruption can be detected. Logical corruption can also be checked at the same time by using the CHECK LOGICAL command to RMAN BACKUP DATABASE VALIDATE. Logical corruption is commonly associated with a database recovery when NOLOGGING has been used.

RMAN> <strong>BACKUP VALIDATE DATABASE ;</strong>

Starting backup at 2010-04-14:10:26:10
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=+DATA/oemdev1a/datafile/sysaux.261.710070859
input datafile file number=00001 name=+DATA/oemdev1a/datafile/system.260.710070855
input datafile file number=00005 name=/app/oracle/oradata/OEMDEV1A/mgmt.dbf
input datafile file number=00003 name=+DATA/oemdev1a/datafile/undotbs1.262.710070861
input datafile file number=00006 name=/app/oracle/oradata/OEMDEV1A/mgmt_ecm_depot1.dbf
input datafile file number=00007 name=/home/oracle/bad_data_01.dbf
input datafile file number=00004 name=+DATA/oemdev1a/datafile/users.264.710070871
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25

List of Datafiles
=================

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    FAILED 0              1121         1280            8780478
File Name: /home/oracle/bad_data_01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              27
Index      0              0
Other      1              132

validate found one or more corrupt blocks
See trace file /app/oracle/diag/rdbms/oemdev1a/OEMDEV1A/trace/OEMDEV1A_ora_5559.trc for details
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE

RMAN> <strong>backup validate check logical database;</strong>

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    FAILED 0              1121         1280            8780478
File Name: /home/oracle/bad_data_01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              27
Index      0              0
Other      1              132

validate found one or more corrupt blocks
See trace file /app/oracle/diag/rdbms/oemdev1a/OEMDEV1A/trace/OEMDEV1A_ora_5559.trc for details
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

Using DBMS_REPAIR to check for corruption

Create a table to hold the data

  execute dbms_repair.admin_tables( 'REPAIR_TABLE',dbms_repair.repair_table,dbms_repair.create_action);

Run the dbms_repair package to check for corruption

  set serveroutput on
declare corr_count binary_integer;
begin
corr_count := 0;
dbms_repair.CHECK_OBJECT (
schema_name => 'TEST',
object_name => 'TEST',
partition_name => null,
object_type => dbms_repair.table_object,
repair_table_name => 'REPAIR_TABLE',
flags => null,
relative_fno => null,
block_start => null,
block_end => null,
corrupt_count => corr_count
);
dbms_output.put_line(to_char(corr_count));
end;
/

col object_id form 9999999999
col object_name form 20
col tablespace_id form 999
col block_id form 999999999
SQL>select object_id,object_name, tablespace_id ,block_id from repair_table;

OBJECT_ID OBJE TABLESPACE_ID   BLOCK_ID
----------- ---- ------------- ----------
95906 TEST             7        156

V$DATABASE_BLOCK_CORRUPTION

After all this testing we should be able to find something in the V$DATABASE_BLOCK_CORRUPTION view

SQL>select * from  V$DATABASE_BLOCK_CORRUPTION

FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
7        156          1                  0 CORRUPT

4 Responses to “Identifying corrupt blocks”

  1. Nathan Dhital said

    Great example of the real world scenario; I need to read it carefully again to grasp it. Does the V$DATABSE_BLOCK_CORRUPTION populates after u run the above process? In this case, the clocks corruption_change is one, however, # of corrupt is 0. I am under the impression that you suspected the corruption in the database or it is a daily routine check for pro-active maintenance?

    • John Hallas said

      The manual shows CORRUPTION_CHANGE# NUMBER Change number at which the logical corruption was detected. Set to 0 to indicate media corruption. In the example above it was a media corrupt (because I had done it !!!)

      Thanks for the nice comment Nathan

  2. Hi John. Nice example, thanks for the sharing

    I’ve also posted some articles on Block Recovery on my blog. I’d like to watch my “Block Media Recovery” video tutorial, you can watch it from the following link:

    RMAN VIDEO Tutorial series – Performing Block Media Recovery with RMAN

    Keep posting such interesting blog posts 🙂

  3. […] https://jhdba.wordpress.com/2010/04/14/identifying-corrupt-blocks/ […]

Leave a comment