Oracle DBA – A lifelong learning experience


ASM disks – lsdg compared with the v$asm_diskgroup view

Posted by John Hallas on July 8, 2016

What is the difference between the summaries of disk space on these two systems? Look at the Usable_file_MB column and the free space figures.

System 1

ASMCMD [+] > lsdg
State    Type    Rebal  Sector  Block       AU   Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304   72769536   2031032           433152          798940              0             N  DATA_PEX0/
MOUNTED  NORMAL  N         512   4096  4194304  128286720  20004028          2672640         8665694              0             N  DATA_PEX1/

                                                     Mirror                      Percent Minimum Maximum
           Group                 Diskgroup     Total  Req'd     Free   Percent Disk Size Percent Percent  Disk
     Group Name       State      Redundancy       GB     GB       GB Imbalance  Variance    Free    Free Count
---------- ---------- ---------- ---------- -------- ------ -------- --------- --------- ------- ------- -----
         1 DATA_PEX0  MOUNTED    NORMAL       71,064    423    1,560       1.4        .0     2.2     3.5   168
         2 DATA_PEX1  MOUNTED    NORMAL      125,280  2,610   16,925        .1        .0    15.6    15.6    48

System 2

ASMCMD [+] > lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   2621412   595744                0          595744              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1048568   700029                0          700029              0             N  FRA/

                                                     Mirror                      Percent Minimum Maximum
           Group                 Diskgroup     Total  Req'd     Free   Percent Disk Size Percent Percent  Disk
     Group Name       State      Redundancy       GB     GB       GB Imbalance  Variance    Free    Free Count
---------- ---------- ---------- ---------- -------- ------ -------- --------- --------- ------- ------- -----
         1 DATA       MOUNTED    EXTERN        2,560      0      582        .2        .0    22.7    22.8    20
         2 FRA        MOUNTED    EXTERN        1,024      0      684        .0        .0    66.8    66.8     8

System 1 is an Exadata stack with NORMAL redundancy whereas System 2 uses EXTERNAL redundancy. All of our systems bar Exadata are configured with dynamic multi-pathing and external redundancy to ensure high availability – we allow the SAN to manage redundancy.
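
For anyone wanting to replicate the non-Exadata layout, here is a minimal sketch of creating an external redundancy diskgroup – the device paths are placeholders for whatever pseudo-devices your multi-pathing layer presents, not our real names:

-- Illustrative only: the paths below are placeholder DMP pseudo-devices
CREATE DISKGROUP data EXTERNAL REDUNDANCY
  DISK '/dev/vx/rdsk/asmdg/disk01',
       '/dev/vx/rdsk/asmdg/disk02'
  ATTRIBUTE 'au_size' = '1M';   -- matches the 1048576 byte AU size shown by lsdg above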

In System 1, when using the second query to interrogate the diskgroups, it would appear that we have 1.5Tb of free space in DATA_PEX0, and yet the lsdg command indicates we only have around 800Gb usable. That is quite a significant difference when we have a weekly growth rate of ~250Gb in that diskgroup.

SELECT g.group_number               "Group",
       g.name                       "Group Name",
       g.state                      "State",
       g.type                       "Type",
       g.total_mb/1024              "Total GB",
       g.free_mb/1024               "Free GB",
       100*(max((d.total_mb-d.free_mb)/d.total_mb)-min((d.total_mb-d.free_mb)/d.total_mb))/max((d.total_mb-d.free_mb)/d.total_mb) "Imbalance",
       100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
       100*(min(d.free_mb/d.total_mb)) "MinFree",
       100*(max(d.free_mb/d.total_mb)) "MaxFree",
       count(*)                     "DiskCnt"
FROM   v$asm_disk d, v$asm_diskgroup g
WHERE  d.group_number = g.group_number
AND    d.group_number <> 0
AND    d.state = 'NORMAL'
AND    d.mount_status = 'CACHED'
GROUP  BY g.group_number, g.name, g.state, g.type, g.total_mb, g.free_mb
ORDER  BY 1;

PS: the code comes from a very good ASM script on this very blog.

This blog post by Harald van Breederode explains, in a much better way than I could, why the mirroring in normal redundancy uses additional space.
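
In short, for a normal redundancy diskgroup the usable figure is the mirror-adjusted free space, i.e. (FREE_MB - REQUIRED_MIRROR_FREE_MB) / 2. Plugging in the DATA_PEX0 numbers from the lsdg output above gives (2,031,032 - 433,152) / 2 = 798,940 MB – exactly the ~800Gb Usable_file_MB reported, rather than the 1.5Tb of raw free space the query shows. The same columns are available in v$asm_diskgroup if you want the arithmetic done for you (a sketch only – drop the divide by 2 for external redundancy groups):

SELECT name,
       total_mb,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb,
       (free_mb - required_mirror_free_mb)/2 "Usable calc (normal redundancy)"
FROM   v$asm_diskgroup
ORDER  BY name;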

As we are quite space-challenged on the Exadata storage at the moment, I have been asked several times to explain the different values of free space that are being reported. Now I can just point people to this blog entry, which will redirect them to Harald's very good explanation.

Job done
Posted in ASM, Blogroll, Oracle

Much faster performance with ASM – a real world example

Posted by John Hallas on October 9, 2009

We have a PeopleSoft database that manages HR for ~80,000 staff: Oracle 10.2.0.3, PeopleSoft Tools 8.49, PeopleSoft HR 9, HP-UX 11.31 on Itanium.

There had been long-running problems caused by I/O bottlenecks writing down to the Hitachi XP SAN disks. We were using filesystem storage and disk_asynch_io was set to FALSE.
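
For anyone wanting to check their own setup, the setting is easy to verify – note that disk_asynch_io is a static parameter, so the ALTER below (just a sketch) only takes effect after a restart, and filesystemio_options is worth reviewing at the same time:

-- Check the current value (ours was FALSE on the filesystem-based storage)
SHOW PARAMETER disk_asynch_io

-- Static parameter, so the change needs an instance restart to take effect
ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;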

The root cause of our problem was that the Time and Labour batches contend when they need to acquire exclusive access to write dirty blocks to data files during truncate operations. 

David Kurtz has written about this specific PeopleSoft behaviour around truncate: http://www.go-faster.co.uk/docs.htm#local_write_wait
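
A quick way of seeing whether you are suffering from the same thing is to look for the truncate-related waits while the batches run – a rough check only, and the event names are worth confirming against your own version:

-- Sessions currently waiting on the events associated with truncate processing
SELECT event, COUNT(*) AS sessions
FROM   v$session_wait
WHERE  event IN ('local write wait', 'enq: RO - fast object reuse')
GROUP  BY event;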

We migrated to ASM disks and achieved significant benefits.

The testing stages were:

1) Prepare a 2-hour Time and Labour batch run which would generate significant throughput

2) Capture a baseline at 10.2.0.3

3) Upgrade to 10.2.0.4 and retest

4) Upgrade to 10.2.0.4 and migrate to ASM (11.1.0.7) and retest

5) Upgrade to 11.1.0.7 and migrate to ASM (11.1.0.7) and retest

Test 3 did not show any difference at all. We were still blocked on disk, which was expected. We would probably have remained at 10.2.0.3 but moved to ASM; however, there are problems with seeing ASM disks when using 10.2.0.3 (Metalink Note 434500, ORA-15059), so we had the option of either applying a fix to get ASM working with 10.2.0.3 or upgrading to 10.2.0.4. We decided that the benefits of getting onto a terminal release, together with suggestions from Oracle that small changes to the truncate process had been made between 10.2.0.3 and 10.2.0.4, outweighed the overhead of any application testing that we needed to do.
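
If you hit the same ORA-15059 symptoms, a first sanity check from the ASM instance is simply whether the disks are being presented with a usable header status – nothing clever, just the standard view:

-- Run from the ASM instance; CANDIDATE or MEMBER is healthy, FOREIGN/UNKNOWN needs investigating
SELECT path, header_status, mount_status, state
FROM   v$asm_disk
ORDER  BY path;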

Test 4 gave us big performance benefits. Test 5 was not as clear-cut, but we knew some code had to be optimised and we did not have the luxury of time to address the issues. Remaining on 10g minimised the application testing, certainly compared to the testing required to upgrade to 11g.

The benefits can be seen quite clearly in the graph below

[Graph: asynch_io]

This shows a 2-hour period where a constant throughput of T&L jobs was being processed. In parallel, a constant online load was being simulated.

The black line shows the baseline (10.2.0.3 on filesystem storage) throughput and elapsed time. The run consists of a number of jobs which take 2 hours to complete, and batch elapsed time is around the 450-600 second mark.

The same processing is then run with the 10.2.0.4 database and ASM, with disk_asynch_io set to FALSE. The jobs complete 40 minutes faster and elapsed time is around the 300-400 second mark, but showing spikes and inconsistent performance.

Finally, the orange line shows the 10.2.0.4 database and ASM with disk_asynch_io=TRUE. The processing time has now been reduced to just over an hour and the elapsed time is significantly better, at an average of 250 seconds and with a very consistent profile.

As with all things in life, we have now moved on to the next bottleneck, which is the flashback buf free wait. This is caused because we have enabled flashback logging, and as the flashback buffer fills up we wait for the flashback data to be written out. However, this is a minuscule issue compared to the original problems. We have taken flashback off some tablespaces primarily used for PeopleSoft temporary tables (aka working storage tables), although that does give us an issue when rebuilding the primary database after a Data Guard failover using flashback logs.
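
For completeness, turning flashback logging off for a tablespace is a one-liner – PS_WORK below is a placeholder name rather than one of our real tablespaces, and the Data Guard rebuild caveat above applies before you copy it:

-- PS_WORK is a placeholder for a working storage tablespace
ALTER TABLESPACE ps_work FLASHBACK OFF;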

I hope this real-world example is of use.

Posted in ASM, Oracle