We started using PowerPath 5.7 and ran into something new that we were neither aware of nor prepared for.
First, some background: we had asked the server teams to upgrade the whole PowerPath environment to 5.7, because we needed to move the servers to VPLEX and onto the latest PowerPath version. It is also generally good to be on the latest versions to get proper vendor support. So the upgrades were done, mission accomplished. But then came the problem: on many AIX servers, some of the device paths started showing asb:iopf instead of active or dead. This was new to us. The first thing we could do was open a case with EMC, as we could not find much help from our best helper, Google.
How it looks:
Logical device ID=6006016014D0350060447AA8AEB6E411 [LUN 364]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A Array failover mode: 4
======================================================================
--------- Host ----------   - Stor -  --- I/O Path ---   --- Stats ---
###  HW Path    I/O Paths   Interf.   Mode       State   Q-IOs  Errors
======================================================================
  3  fscsi3     hdisk29     SP B7     asb:iopf   alive       0       3
  3  fscsi3     hdisk28     SP A7     active     alive       0       1
  2  fscsi2     hdisk27     SP B7     active     alive       0       2
  2  fscsi2     hdisk26     SP A7     asb:iopf   alive       0       4
  1  fscsi1     hdisk25     SP B6     asb:iopf   alive       0       5
  1  fscsi1     hdisk24     SP A6     asb:iopf   alive       0       7
  0  fscsi0     hdisk23     SP B6     active     alive       0       2
  0  fscsi0     hdisk22     SP A6     asb:iopf   alive       0       6
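With many servers and LUNs, picking out the asb:iopf paths by eye gets tedious. Here is a minimal Python sketch that parses output like the sample above and lists the paths in a given mode. The column layout in the regular expression is an assumption based only on this AIX/CLARiiON sample, and the names parse_paths and paths_in_mode are our own, not part of PowerPath.

```python
import re
from collections import defaultdict

# Sample copied from the output shown in this post.
SAMPLE = """\
Logical device ID=6006016014D0350060447AA8AEB6E411 [LUN 364]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP A, current=SP A Array failover mode: 4
3 fscsi3 hdisk29 SP B7 asb:iopf alive 0 3
3 fscsi3 hdisk28 SP A7 active alive 0 1
2 fscsi2 hdisk27 SP B7 active alive 0 2
2 fscsi2 hdisk26 SP A7 asb:iopf alive 0 4
1 fscsi1 hdisk25 SP B6 asb:iopf alive 0 5
1 fscsi1 hdisk24 SP A6 asb:iopf alive 0 7
0 fscsi0 hdisk23 SP B6 active alive 0 2
0 fscsi0 hdisk22 SP A6 asb:iopf alive 0 6
"""

# Assumed row layout (taken from the sample only):
# HBA number, HW path, I/O path, array interface, mode, state, queued I/Os, errors.
PATH_ROW = re.compile(
    r"(?P<hba>\d+)\s+(?P<hw_path>fscsi\d+)\s+(?P<io_path>hdisk\d+)\s+"
    r"(?P<interface>SP\s+[AB]\d+)\s+(?P<mode>\S+)\s+(?P<state>alive|dead)\s+"
    r"(?P<queued>\d+)\s+(?P<errors>\d+)"
)
LUN_HEADER = re.compile(r"Logical device ID=(?P<dev_id>\S+)")


def parse_paths(text):
    """Return {logical device ID: [one dict per path row]}."""
    devices = defaultdict(list)
    current = "unknown"  # used only if a path row appears before any header
    for line in text.splitlines():
        header = LUN_HEADER.search(line)
        if header:
            current = header.group("dev_id")
            continue
        row = PATH_ROW.search(line)
        if row:
            devices[current].append(row.groupdict())
    return dict(devices)


def paths_in_mode(devices, mode="asb:iopf"):
    """Return {logical device ID: [hdisk names whose Mode column equals mode]}."""
    return {
        dev: [p["io_path"] for p in paths if p["mode"] == mode]
        for dev, paths in devices.items()
    }


if __name__ == "__main__":
    for dev, hdisks in paths_in_mode(parse_paths(SAMPLE)).items():
        print(dev, len(hdisks), "path(s) in asb:iopf:", ", ".join(hdisks))
```

In practice the text would come from saved `powermt display dev=all` output rather than the embedded sample; other platforms or array types may print different columns, so the pattern would need adjusting there.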
Later, after several days of discussion with EMC support, we learned that this is a newly introduced feature. We received multiple explanations; according to EMC, it is a beneficial PowerPath feature meant to avoid performance or path-loss issues.
Below are the summarized benefits.
1: The purpose of the feature is to act as an “early warning system” and highlight the possibility of a connectivity issue between the host and array. If there is 1 I/O failure for every 6000 successful I/Os delivered, the individual path goes into asb:iopf mode.
2: This feature gives users time to troubleshoot the connectivity problem before it gets to a point where underlying database/application processes are impacted performance-wise.
3: Without this feature, when a connectivity issue occurred, in-flight I/O being sent down the “bad” path at the time of the I/O failure would need to be backed out and redirected down the alternative paths. During this backing-out phase, applications may appear to freeze, depending on the volume of I/O that needs to be backed out.
4: The bad path will stay in asb:iopf mode for a default period of 7 days. This can be adjusted to suit your own preference.
5: With this feature, a path issue does not take the complete set of device paths offline; only the few paths with the actual issue go into standby mode. While such a path is not considered dead, it is put into standby and I/O is not sent down it unless there are no other valid paths left.
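To make points 1, 4 and 5 concrete, here is a toy model in Python. It is only our reading of the vendor's explanation, not PowerPath's actual algorithm. The names ToyPath, IOPF_LIMIT, AGING_PERIOD_DAYS and usable_paths are made up for illustration, and the exact trip rule (no more than 6000 successes per failure) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

IOPF_LIMIT = 6000        # point 1: one failure per 6000 successful I/Os
AGING_PERIOD_DAYS = 7    # point 4: default time a path stays in asb:iopf


@dataclass
class ToyPath:
    name: str
    ok: int = 0
    failed: int = 0
    standby_since: Optional[float] = None  # day the path entered asb:iopf

    @property
    def mode(self) -> str:
        return "active" if self.standby_since is None else "asb:iopf"

    def record(self, success: bool, today: float) -> None:
        """Count one I/O result and trip to standby if failures are too frequent."""
        if success:
            self.ok += 1
            return
        self.failed += 1
        # Assumed rule: trip when the path has averaged no more than
        # IOPF_LIMIT successful I/Os per failure.
        if self.standby_since is None and self.ok <= IOPF_LIMIT * self.failed:
            self.standby_since = today

    def refresh(self, today: float) -> None:
        """Return the path to active once the aging period has elapsed."""
        if (self.standby_since is not None
                and today - self.standby_since >= AGING_PERIOD_DAYS):
            self.standby_since = None
            self.ok = self.failed = 0


def usable_paths(paths: List[ToyPath], today: float) -> List[ToyPath]:
    """Point 5: prefer active paths; fall back to standby paths only if none are active."""
    for p in paths:
        p.refresh(today)
    active = [p for p in paths if p.mode == "active"]
    return active or list(paths)


if __name__ == "__main__":
    good, bad = ToyPath("hdisk28"), ToyPath("hdisk29")
    for _ in range(5000):
        bad.record(True, today=0)
    bad.record(False, today=0)  # one failure after only 5000 successes
    print([p.name for p in usable_paths([good, bad], today=1)])
    print([p.name for p in usable_paths([good, bad], today=7)])
```

In this model the standby path is skipped on day 1 and is back among the usable paths on day 7, without anything having been fixed in between.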
However, this feature does not actually work the way it was explained. Here is what we found while troubleshooting.
1: It was mentioned that this happens when there is an issue in the network path, but in our case we did a complete analysis of the path from server to switch to array and found no issue at all. Even the EMC techs confirmed that the path was clean. Still, we saw some paths in ASB mode.
2: We found that some of the devices that were just lying idle on the server were showing ASB paths. When we started using the idle device, the ASB paths became active. This contradicts the EMC explanation.
3: Paths went ASB even when there was no I/O. Possibly this was due to a TUR (Test Unit Ready) check failing when PowerPath initiates it to test a path, but there is no documentation to confirm this.
4: Talking about the paths: if one device has 8 paths, this feature puts only 1 or 2 of the LUN's 8 paths in ASB mode, which does not add up technically. If there were a path connectivity issue, or even fluctuation, we should see one complete set of paths either offline or in ASB, which is not happening.
5: EMC says this is a common feature across all platforms, but we have seen it only on AIX servers.
6: The 7-day refresh of ASB does not seem to be a resolution. We are talking about path errors, and if there is a path error, refreshing after 7 days just hides the problem, because the ASB mode will disappear.
Every new feature has pros and cons, and the same goes for PowerPath ASB mode. The only problem is that we could not find much documentation for it, and at the same time the vendor was not able to explain all the points in enough detail to make SAN and server admins, as well as the client, comfortable. So in the end we decided to disable the feature.
Command to disable this feature:
powermt set autostandby=off trigger=iopf
This setting is persistent across reboots, and the change is non-disruptive. It does not make any other changes on the host; however, I/O will continue to be sent down a path that sees intermittent failures, instead of that path being marked auto-standby.
We are still looking for more explanations to understand this new feature better.