New ASM power levels in 11.2.0.2 and beyond

Posted by John Hallas on February 13, 2015

I recently saw the following command in a script that was about to be run, and assumed an error had been made and that the power level should have been 5, not 500.

ALTER DISKGROUP DATA REBALANCE POWER 500;

After some research I found it was not a typo but a new method of disk rebalancing which came in with 11.2.0.2.

Previously, setting the power limit to a value from 0 to 11 basically caused a matching number of ARBx processes to be created, and these were removed once the rebalance had finished.

That was a nice simple situation which I had no problem with. The range was adequate and I normally used between 4 and 7, depending on the usage of the system and any performance impact that might be caused. The impact was easy to monitor using a variety of tools such as top or glance on a *nix platform.

Now, from 11gR2 (11.2.0.2) onwards, when a disk group has its ASM compatibility (the COMPATIBLE.ASM attribute) set to 11.2.0.2 or greater, the operational range of values for the rebalance power is 0 to 1024. Note that if the POWER clause specifies a value larger than 11 for a disk group with ASM compatibility set to less than 11.2.0.2, a warning is displayed and a POWER value of 11 is used for the rebalance. A second point to note is that once a disk group's compatibility attributes have been advanced to a higher value, the change cannot be reversed.
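The capping rule described above can be sketched in a few lines of Python. This is only a model of the behaviour as stated in this post, not anything ASM exposes: the function names effective_power and parse_version are hypothetical, and rejecting values outside 0 to 1024 is my own assumption about how to handle input the post does not cover.

```python
def parse_version(text):
    """Turn a dotted version such as '11.2.0.2' into a comparable tuple."""
    parts = [int(p) for p in text.split(".")]
    # Pad short versions so that '11.2' compares like '11.2.0.0'.
    return tuple(parts + [0] * (4 - len(parts)))


def effective_power(requested, compatible_asm):
    """Model the power a rebalance would run at (hypothetical helper).

    requested      -- value given in the POWER clause
    compatible_asm -- the disk group's ASM compatibility, e.g. '11.2.0.2'
    """
    if not 0 <= requested <= 1024:
        # Assumption: the post only describes the 0..1024 range, so
        # anything outside it is simply rejected in this sketch.
        raise ValueError("POWER is expected to be between 0 and 1024")
    new_range = parse_version(compatible_asm) >= parse_version("11.2.0.2")
    limit = 1024 if new_range else 11
    return min(requested, limit)


print(effective_power(500, "11.2.0.2"))  # 500
print(effective_power(500, "11.1"))      # 11 (ASM would also show a warning)
```

So the POWER 500 in the script I saw only means 500 if the disk group compatibility is high enough; otherwise it quietly behaves like 11.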

So what does that mean in practice? On the face of it this is a basic change, but it now seems very hard, well-nigh impossible, to see the impact that the rebalance is having on the server. Consequently I do not see the advantages of it, other than possibly on massively high-end systems.

Firstly, how did you know what the power level was before 11.2.0.2?

Look in the V$ASM_OPERATION view for the power value
Do a ps -ef | grep -i arb | wc -l and see how many processes were running
Look in the ASM trace log
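The process count in the second check can also be scripted. The sketch below is a minimal Python version that counts ARB processes in ps -ef output; it assumes the processes show up with names of the form asm_arb<N>_<instance>, and the sample text is made up purely for illustration (the function name count_arb_processes is hypothetical too). Matching on the process name has the side benefit of not counting the grep command itself, which a plain grep piped to wc -l does.

```python
import re

# Assumption: ASM rebalance slaves appear in `ps -ef` as asm_arb<N>_<instance>.
ARB_PATTERN = re.compile(r"\basm_arb\d+_", re.IGNORECASE)


def count_arb_processes(ps_output):
    """Count the lines of `ps -ef` output that look like ARB processes."""
    return sum(1 for line in ps_output.splitlines() if ARB_PATTERN.search(line))


# Made-up sample output, for illustration only.
sample = """\
grid  22462     1  0 14:17 ?     00:00:02 asm_arb0_+ASM
grid  22470     1  0 14:17 ?     00:00:02 asm_arb1_+ASM
grid  22501 22400  0 14:18 pts/0 00:00:00 grep -i arb
"""

print(count_arb_processes(sample))  # 2
```

On a live server the text could come from subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout instead of the sample string.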

However, finding the power level in use was not that important; monitoring what the rebalance was doing was key. So I would use Glance or top and see if the ARB processes were key consumers of resource. I would look at disk and HBA throughput and take a view on how busy the system was, and if it looked like it might be affecting performance I would drop the power level.

From 11gR2 onwards, when a rebalance operation is started the RBAL process will spawn only one ARB process. This ARB process uses OS-level parallel threads, and the number of threads corresponds to the value of the power limit setting. The number allocated can be observed in the ASM alert log, as in the example below (using a power level of 5):

Tue Dec 23 14:17:48 2014

ARB0 started with pid=30, OS id=22462

NOTE: assigning ARB0 to group 3/0xa1488c8 (TSTDATA) with 5 parallel I/Os

This is a section from the ASM alert log after using a rebalance power of 40:

NOTE: starting rebalance of group 1/0xc04488bb (DATA) at power 40

Starting background process ARB0

Mon Feb 02 11:59:24 2015

WARNING: failed to retrieve ASM password file location (unable to communicate with CRSD/OHASD)

Mon Feb 02 11:59:24 2015

ARB0 started with pid=27, OS id=22535

Mon Feb 02 11:59:25 2015

NOTE: Attempting voting file refresh on diskgroup DATA

NOTE: assigning ARB0 to group 1/0xc04488bb (DATA) with 40 parallel I/Os

Mon Feb 02 12:02:41 2015

Mon Feb 02 12:04:23 2015

NOTE: ASMB process exiting due to lack of ASM file activity for 301 seconds

NOTE: ASMB clearing idle groups before exit

Mon Feb 02 12:04:24 2015

Mon Feb 02 12:04:24 2015

NOTE: client +ASM:+ASM deregistered

Mon Feb 02 12:06:05 2015

NOTE: requesting all-instance membership refresh for group=1

Mon Feb 02 12:06:05 2015

GMON updating for reconfiguration, group 1 at 895 for pid 21, osid 23281

Mon Feb 02 12:06:05 2015

NOTE: group 1 PST updated.

SUCCESS: grp 1 disk DATA_0012 emptied

NOTE: erasing header (replicated) on grp 1 disk DATA_0012

NOTE: erasing header on grp 1 disk DATA_0012

NOTE: process _x000_+asm (23281) initiating offline of disk 12.1503951013 (DATA_0012) with mask 0x7e in group 1 (DATA) without client assisting

NOTE: initiating PST update: grp 1 (DATA), dsk = 12/0x59a478a5, mask = 0x6a, op = clear

Mon Feb 02 12:06:06 2015

GMON updating disk modes for group 1 at 896 for pid 21, osid 23281

Mon Feb 02 12:06:06 2015

NOTE: PST update grp = 1 completed successfully

NOTE: initiating PST update: grp 1 (DATA), dsk = 12/0x59a478a5, mask = 0x7e, op = clear

Mon Feb 02 12:06:06 2015

GMON updating disk modes for group 1 at 897 for pid 21, osid 23281

Mon Feb 02 12:06:06 2015

NOTE: cache closing disk 12 of grp 1: DATA_0012

Mon Feb 02 12:06:06 2015

NOTE: PST update grp = 1 completed successfully

Mon Feb 02 12:06:08 2015

GMON updating for reconfiguration, group 1 at 898 for pid 21, osid 23281

Mon Feb 02 12:06:08 2015

NOTE: cache closing disk 12 of grp 1: (not open) DATA_0012

Mon Feb 02 12:06:08 2015

NOTE: group 1 PST updated.

Mon Feb 02 12:06:09 2015

NOTE: membership refresh pending for group 1/0xc04488bb (DATA)

Mon Feb 02 12:06:09 2015

GMON querying group 1 at 899 for pid 14, osid 7263

GMON querying group 1 at 900 for pid 14, osid 7263

Mon Feb 02 12:06:09 2015

NOTE: Disk DATA_0012 in mode 0x0 marked for de-assignment

SUCCESS: refreshed membership for 1/0xc04488bb (DATA)

NOTE: Attempting voting file refresh on diskgroup DATA

Mon Feb 02 12:06:43 2015

NOTE: stopping process ARB0

Mon Feb 02 12:06:45 2015
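Since the alert log is the only place I found the parallel I/O figure, pulling it out with a script is one way to keep track of what each rebalance was given. The following Python sketch does that with a regular expression based on the two "assigning ARB0" lines shown above; the function name parallel_io_assignments is hypothetical, and the pattern is only known to match the message format in these excerpts.

```python
import re

# Matches lines such as:
# NOTE: assigning ARB0 to group 1/0xc04488bb (DATA) with 40 parallel I/Os
ASSIGN = re.compile(
    r"NOTE: assigning (ARB\d+) to group \d+/0x[0-9a-f]+ \((\w+)\) "
    r"with (\d+) parallel I/Os"
)


def parallel_io_assignments(log_text):
    """Return (process, disk group, parallel I/O count) per assignment line."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in ASSIGN.finditer(log_text)
    ]


# Lines copied from the alert log excerpts in this post.
log = """\
NOTE: assigning ARB0 to group 3/0xa1488c8 (TSTDATA) with 5 parallel I/Os
NOTE: starting rebalance of group 1/0xc04488bb (DATA) at power 40
NOTE: assigning ARB0 to group 1/0xc04488bb (DATA) with 40 parallel I/Os
"""

for process, group, count in parallel_io_assignments(log):
    print(f"{process} on {group}: {count} parallel I/Os")
```

This only reports what ASM says it assigned; it says nothing about how many I/Os are actually in flight at any moment.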

What interested me was how to decide which power level to use, and how to determine the impact of each and the difference between them. Some sample timings are below:

Power level	Minutes
1	175
5	27
11	15
20	14
40	13
400	12
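To make the diminishing returns in those timings easier to see, here is a small Python sketch that works out the speed-up of each power level relative to power 1 and the minutes saved over the previous level. The figures are the ones from the table above; the function name speedups is just for illustration.

```python
# Power level -> elapsed minutes, taken from the table above.
timings = {1: 175, 5: 27, 11: 15, 20: 14, 40: 13, 400: 12}


def speedups(results):
    """Return each power level's speed-up relative to power level 1."""
    baseline = results[1]
    return {power: baseline / minutes for power, minutes in results.items()}


previous = None
for power, factor in speedups(timings).items():
    minutes = timings[power]
    line = f"power {power:>3}: {minutes:>3} min, {factor:.1f}x faster than power 1"
    if previous is not None:
        line += f", {previous - minutes} min saved over the previous level"
    print(line)
    previous = minutes
```

Going from 1 to 11 saves 160 minutes; going from 11 all the way to 400 saves another 3.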

If I can find a suitable system I might do some more testing in this area, but what I was really concerned with was identifying whether the overall system was affected and by how much. I could not find any metrics to show me that, other than that the HBA/disk activity was very busy, and there did not seem to be an awful lot of variance whichever power level I plugged in. I asked our Unix team to investigate, using both HP-UX and OEL servers, and again they could not find anything that showed them what was happening under the covers or what the line "with xx parallel I/Os" actually meant.

I finally raised a call with Oracle and kept on asking the question, only to be told in the end that “there is no way of identifying how many parallel I/O threads are in operation or what the impact is”.

I must admit I find this totally unsatisfactory. I cannot see how I could consider using a very high power level (anything more than ~20) on a production or other important system where a performance impact might be noticed, and I think I will be tempted to stick to around 10 in future. Like many other DBAs, I do not have detailed access to storage metrics, and I generally do not know how many disks have been mapped to a LUN or what type of striping is on them; that is the domain of the SAN team. Given the lack of information available about the new power levels, I cannot really see what benefits the new, much higher limits provide.

I am sure this is a change for the positive, but I do think it could have been documented much better than it has been.

This entry was posted on February 13, 2015 at 9:30 am and is filed under 11g new features, ASM, Oracle. Tagged: ASM, parallel I/Os, rebance power.

4 Responses to “New ASM power levels in 11.2.0.2 and beyond”

Emre Baransel said

February 16, 2015 at 1:30 pm
I totally agree with you. The new power limits are a mystery; I’m still using the old values.

- John Hallas said
  
  February 17, 2015 at 12:13 pm
  Thanks Emre, I appreciate the comment, especially coming from you who knows a lot about ASM
  John
  
jkstill said

March 19, 2018 at 9:56 pm
Hi John,

Did you use ps -m or -H to see threads at the OS level?

- John Hallas said
  
  April 19, 2018 at 8:00 am
  Probably not Jared. I will go back and see if I can still see the issue and try those commands
  

Oracle DBA – A lifelong learning experience
