ASM space marked as internal – use check all repair
Posted by John Hallas on July 30, 2009
The problem was that the FRA diskgroup seemed to be using a lot of space, and yet there were hardly any files on disk as far as I could tell. Platform: HP Itanium – 11.1.0.7
ASM Disk Groups
===============

Group  Group Name  State    Type    Total GB  Free GB
-----  ----------  -------  ------  --------  -------
    1  DATA        MOUNTED  EXTERN      2581      109
    2  FRA         MOUNTED  EXTERN      1602      127

Group  Disk  Header  Mode    State   Redundancy  Total MB  Free MB  Disk Name  Failure Gr
-----  ----  ------  ------  ------  ----------  --------  -------  ---------  ----------
    2     1  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0001   FRA_0001
    2     2  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0002   FRA_0002
    2     4  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0004   FRA_0004
    2     5  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0005   FRA_0005
    2     6  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0006   FRA_0006
    2     7  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0007   FRA_0007
    2     8  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0008   FRA_0008
    2     9  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0009   FRA_0009
    2    10  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0010   FRA_0010
    2    11  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0011   FRA_0011
    2    12  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0012   FRA_0012
    2    13  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0013   FRA_0013
    2    14  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0014   FRA_0014
    2    15  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0015   FRA_0015
    2    16  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0016   FRA_0016
    2    17  MEMBER  ONLINE  NORMAL  UNKNOWN        91138        0  FRA_0017   FRA_0017
    2    18  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    65242  FRA_0018   FRA_0018
    2    19  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    65112  FRA_0019   FRA_0019

I had removed one disk from FRA and forced a rebalance, but that still did not release space. I had checked the FRA disks using asmcmd, but that only showed 2 online logs of 2 GB each (we were not in archive log mode).
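To put numbers on the discrepancy, here is a minimal Python sketch that tallies the listing above. It assumes the Total MB and Free MB columns are exactly as printed and that the two online logs are 2 GB each; the variable names are mine and nothing here comes from ASM itself.

```python
# Figures copied from the FRA disk listing above.
DISK_COUNT = 18             # FRA_0001..FRA_0019; FRA_0003 is not in the listing
TOTAL_MB_PER_DISK = 91138

# Only FRA_0018 and FRA_0019 report free space; the other 16 disks report 0.
free_mb = [0] * 16 + [65242, 65112]

total_gb = DISK_COUNT * TOTAL_MB_PER_DISK / 1024
free_gb = sum(free_mb) / 1024
reported_used_gb = total_gb - free_gb

# What asmcmd actually showed: 2 online logs of 2 GB each.
visible_files_gb = 2 * 2

print(f"total {total_gb:.0f} GB, free {free_gb:.0f} GB")
print(f"reported used {reported_used_gb:.0f} GB "
      f"vs {visible_files_gb} GB of visible files")
```

The totals agree with the diskgroup summary (1602 GB total, 127 GB free), which leaves roughly 1475 GB reported as used against about 4 GB of files that could actually be found.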
I had checked the diskgroup using the EM GUI, which issued the command
ALTER DISKGROUP FRA CHECK ALL;
but still no success. According to the documentation, the check command takes NOREPAIR as the default, so it should have reported any issues to the ASM alert log. Which indeed it did (and I would have seen them if I had bothered to look):
Wed Jul 29 16:37:55 2009
SQL> ALTER DISKGROUP FRA CHECK DISK FRA_0001,FRA_0002,FRA_0004,FRA_0005,FRA_0006,FRA_0007,FRA_0008,FRA_0009,FRA_0010,FRA_0011,FRA_0012,FRA_0013,FRA_0014,FRA_0015,FRA_0016,FRA_0017,FRA_0018,FRA_0019
WARNING: Deprecated privilege SYSDBA for command 'ALTER DISKGROUP CHECK'
kfdp_checkDsk(): 33
NOTE: disk FRA_0001, used AU total mismatch: DD=4294965892, AT=2283
kfdp_checkDsk(): 34
NOTE: disk FRA_0002, used AU total mismatch: DD=4294966012, AT=2283
kfdp_checkDsk(): 35
NOTE: disk FRA_0004, used AU total mismatch: DD=4294966210, AT=2283
kfdp_checkDsk(): 36
NOTE: disk FRA_0005, used AU total mismatch: DD=4294966625, AT=2283
kfdp_checkDsk(): 37
NOTE: disk FRA_0006, used AU total mismatch: DD=4294965615, AT=2280
kfdp_checkDsk(): 38
NOTE: disk FRA_0007, used AU total mismatch: DD=4294964292, AT=2283
kfdp_checkDsk(): 39
NOTE: disk FRA_0008, used AU total mismatch: DD=4294967237, AT=2284
kfdp_checkDsk(): 40
NOTE: disk FRA_0009, used AU total mismatch: DD=4294965595, AT=2280
kfdp_checkDsk(): 41
NOTE: disk FRA_0010, used AU total mismatch: DD=4294966758, AT=2279
kfdp_checkDsk(): 42
NOTE: disk FRA_0011, used AU total mismatch: DD=4294966324, AT=2279
kfdp_checkDsk(): 43
NOTE: disk FRA_0012, used AU total mismatch: DD=4294966743, AT=2279
kfdp_checkDsk(): 44
NOTE: disk FRA_0013, used AU total mismatch: DD=4294964514, AT=2278
kfdp_checkDsk(): 45
NOTE: disk FRA_0014, used AU total mismatch: DD=4294965734, AT=2278
kfdp_checkDsk(): 46
NOTE: disk FRA_0015, used AU total mismatch: DD=4294965204, AT=2279
kfdp_checkDsk(): 47
NOTE: disk FRA_0016, used AU total mismatch: DD=4294965599, AT=2279
kfdp_checkDsk(): 48
NOTE: disk FRA_0017, used AU total mismatch: DD=4294964950, AT=2280
kfdp_checkDsk(): 49
NOTE: disk FRA_0018, used AU total mismatch: DD=25896, AT=2278
kfdp_checkDsk(): 50
NOTE: disk FRA_0019, used AU total mismatch: DD=26026, AT=2280
WARNING: deprecated use of ALTER DISKGROUP CHECK arguments
SUCCESS: ALTER DISKGROUP FRA CHECK DISK FRA_0001,FRA_0002,FRA_0004,FRA_0005,FRA_0006,FRA_0007,FRA_0008,FRA_0009,FRA_0010,FRA_0011,FRA_0012,FRA_0013,FRA_0014,FRA_0015,FRA_0016,FRA_0017,FRA_0018,FRA_0019

From the command line I then ran the command
ALTER DISKGROUP FRA CHECK ALL REPAIR;
and I had immediate success. It took about 1 minute, corrected the issues and released the space that had been grabbed internally.
Group  Group Name  State    Type    Total GB  Free GB
-----  ----------  -------  ------  --------  -------
    1  DATA        MOUNTED  EXTERN      2581      109
    2  FRA         MOUNTED  EXTERN      1602    1,562

Group  Disk  Header  Mode    State   Redundancy  Total MB  Free MB  Disk Name  Failure Gr
-----  ----  ------  ------  ------  ----------  --------  -------  ---------  ----------
    2     1  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88855  FRA_0001   FRA_0001
    2     2  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88855  FRA_0002   FRA_0002
    2     4  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88855  FRA_0004   FRA_0004
    2     5  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88855  FRA_0005   FRA_0005
    2     6  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88858  FRA_0006   FRA_0006
    2     7  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88855  FRA_0007   FRA_0007
    2     8  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88854  FRA_0008   FRA_0008
    2     9  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88858  FRA_0009   FRA_0009
    2    10  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88859  FRA_0010   FRA_0010
    2    11  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88859  FRA_0011   FRA_0011
    2    12  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88859  FRA_0012   FRA_0012
    2    13  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88860  FRA_0013   FRA_0013
    2    14  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88860  FRA_0014   FRA_0014
    2    15  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88859  FRA_0015   FRA_0015
    2    16  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88859  FRA_0016   FRA_0016
    2    17  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88858  FRA_0017   FRA_0017
    2    18  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88860  FRA_0018   FRA_0018
    2    19  MEMBER  ONLINE  NORMAL  UNKNOWN        91138    88858  FRA_0019   FRA_0019

SQL> alter diskgroup fra check all repair
NOTE: starting check of diskgroup FRA
kfdp_checkDsk(): 51
WARNING: disk FRA_0001, changing DD used AUs from 4294965892 to 2283
kfdp_checkDsk(): 52
WARNING: disk FRA_0002, changing DD used AUs from 4294966012 to 2283
kfdp_checkDsk(): 53
WARNING: disk FRA_0004, changing DD used AUs from 4294966210 to 2283
kfdp_checkDsk(): 54
WARNING: disk FRA_0005, changing DD used AUs from 4294966625 to 2283
kfdp_checkDsk(): 55
WARNING: disk FRA_0006, changing DD used AUs from 4294965615 to 2280
kfdp_checkDsk(): 56
WARNING: disk FRA_0007, changing DD used AUs from 4294964292 to 2283
kfdp_checkDsk(): 57
WARNING: disk FRA_0008, changing DD used AUs from 4294967237 to 2284
kfdp_checkDsk(): 58
WARNING: disk FRA_0009, changing DD used AUs from 4294965595 to 2280
kfdp_checkDsk(): 59
WARNING: disk FRA_0010, changing DD used AUs from 4294966758 to 2279
kfdp_checkDsk(): 60
WARNING: disk FRA_0011, changing DD used AUs from 4294966324 to 2279
kfdp_checkDsk(): 61
WARNING: disk FRA_0012, changing DD used AUs from 4294966743 to 2279
kfdp_checkDsk(): 62
WARNING: disk FRA_0013, changing DD used AUs from 4294964514 to 2278
kfdp_checkDsk(): 63
WARNING: disk FRA_0014, changing DD used AUs from 4294965734 to 2278
kfdp_checkDsk(): 64
WARNING: disk FRA_0015, changing DD used AUs from 4294965204 to 2279
kfdp_checkDsk(): 65
WARNING: disk FRA_0016, changing DD used AUs from 4294965599 to 2279
kfdp_checkDsk(): 66
WARNING: disk FRA_0017, changing DD used AUs from 4294964950 to 2280
kfdp_checkDsk(): 67
WARNING: disk FRA_0018, changing DD used AUs from 25896 to 2278
kfdp_checkDsk(): 68
WARNING: disk FRA_0019, changing DD used AUs from 26026 to 2280
SUCCESS: check of diskgroup FRA found no errors
WARNING: deprecated use of ALTER DISKGROUP CHECK arguments
SUCCESS: alter diskgroup fra check all repair
Thu Jul 30 10:31:33 2009
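The DD figures in both log extracts are worth a second look. Most of them sit just below 2^32 (4294967296), which is what a 32-bit unsigned counter looks like after it has been decremented below zero. That reading is my own assumption; neither the alert log nor the documentation says so. A minimal Python sketch of the arithmetic, using three of the disks and a hypothetical helper name:

```python
# (DD, AT) used-AU counts for three disks, copied from the alert log above.
mismatches = {
    "FRA_0001": (4294965892, 2283),
    "FRA_0008": (4294967237, 2284),
    "FRA_0018": (25896, 2278),
}

def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed (the width is assumed)."""
    return value - 2**32 if value >= 2**31 else value

for disk, (dd, at) in mismatches.items():
    print(disk, "DD as signed:", as_signed_32(dd), "AT:", at)

# Cross-check against the Free MB column for FRA_0018 (total 91138 MB).
TOTAL_MB = 91138
assert TOTAL_MB - 65242 == 25896   # before the repair: total - free equals DD
assert TOTAL_MB - 88860 == 2278    # after the repair: total - free equals AT
```

Read this way, FRA_0001 had a DD count of -1404 and FRA_0008 one of -59, while FRA_0018 and FRA_0019 were simply too high. The cross-check suggests that Free MB was being derived from the DD count (at 1 MB per AU, which is also an inference), which would explain why the space came back as soon as the repair rewrote it.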
I would have expected the GUI to have reported errors, but it didn't. Equally, I can see no good reason for not fixing the errors identified when a check is run. That was the default in 10g, so there must be a good reason why it was changed. Perhaps it had a performance impact which I would not have noticed on a pretty empty diskgroup. Another possible reason is an assumption that the DBA would want a good backup before repairing.
As we had nothing in there and we back up ASM metadata every day, I was happy to go ahead. http://jhdba.wordpress.com/2009/06/11/script-to-backup-asm-metadata/
So the lessons learned are
- Use the check all feature
- Check the alert log afterwards
- Then use the check all repair statement from the command line
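The second lesson can be partly automated. As a minimal sketch, assuming the NOTE lines keep the format shown in the alert log extract above, the following Python picks out the disks whose counts disagree. The function name and the sample text are illustrative only, and reading the real alert log file is left out because its location varies by installation.

```python
import re

# Pattern modelled on the NOTE lines in the alert log extract above.
MISMATCH = re.compile(
    r"NOTE: disk (?P<disk>\w+), used AU total mismatch: "
    r"DD=(?P<dd>\d+), AT=(?P<at>\d+)"
)

def find_mismatches(log_text: str) -> list[tuple[str, int, int]]:
    """Return (disk, DD, AT) for every mismatch NOTE found in log_text."""
    return [
        (m["disk"], int(m["dd"]), int(m["at"]))
        for m in MISMATCH.finditer(log_text)
    ]

# Two entries copied from the alert log, used here as sample input.
sample = """\
kfdp_checkDsk(): 49
NOTE: disk FRA_0018, used AU total mismatch: DD=25896, AT=2278
kfdp_checkDsk(): 50
NOTE: disk FRA_0019, used AU total mismatch: DD=26026, AT=2280
"""

for disk, dd, at in find_mismatches(sample):
    print(f"{disk}: DD={dd} AT={at}")
```

An empty result after a check would suggest there is nothing for REPAIR to fix, at least for this particular class of mismatch.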
rahul said
Hi,
Alas. Wonderfull links sharing.
Andy said
Life Saving article, thanks for sharing it.
thanks! said
thanks!