Multipathd hangs with long iscsi target names in Ubuntu 18.04
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
multipath-tools (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Undecided | Unassigned |
Eoan | Won't Fix | Undecided | Unassigned |
Focal | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
* Users of some storage devices (for instance NetApp E-Series RDAC) can hit
a multipathd hang: very long names increase the size of the VPD page, and
multipathd hangs while reading it.
* The bug was introduced as a side effect of other changes in 0.7.4 and fixed
later upstream. This backports the later fixes for reading an enlarged VPD
page, so that such cases work fine again.
[Test Case]
* Case #1: Test on affected hardware. The NetApp E-Series RDAC is an
example; the reporter has such hardware and is able to test it.
* Case #2: Regression tests. Since this touches multipath discovery, any
test that exercises it would be interesting, from installers using multipath
on mainframes to multipath-iSCSI disks on a raspi. The range is huge and it
is unclear how much exactly should be done. I can do said mainframe checks
myself and use the pkgtests to run through iscsid.
In addition, we should keep this one a bit longer in -proposed IMHO.
[Regression Potential]
* All cases asking for VPD 0x83 should be unchanged, and only those asking
for 0xc9 should get the new improved behavior. But I'm not 100% sure
about potential side effects yet. Changes, and thereby regressions,
would be limited to the device discovery of multipathd, so that is what
one needs to keep an extra eye on, e.g. by comparing the pre/post list of
multipath devices (and attributes, as those are populated via VPD).
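The pre/post comparison suggested above could be sketched roughly as follows. This is only an illustration, assuming the device listings (for example the output of `multipath -ll`) were saved as text before and after the upgrade; the function name `diff_device_lists` is hypothetical and not part of any existing tool.

```python
from __future__ import annotations

import difflib


def diff_device_lists(before: str, after: str) -> list[str]:
    """Return only the lines that were removed ("- ") or added ("+ ").

    `before` and `after` are assumed to be saved text listings of the
    multipath devices, taken before and after the package upgrade.
    """
    delta = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

An empty result means the two listings are identical; any returned line points at a device or attribute that changed and deserves a closer look.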
[Other Info]
* n/a
---
I've noticed recently that multipathd started hanging when trying to handle long iSCSI target names. In my case these targets are generated by the OpenStack Cinder driver for SolidFire.
My guess is that the problem is occurring due to a bug introduced in multipath-tools 0.7.4, by commit "libmultipath: prefer RDAC checker with detect_
This change adds a routine to check RDAC compatibility during the detect_checker procedure, which runs into an infinite loop in the piece of code that tries to gather SCSI VPD pages larger than 254 bytes (this is my case due to long iSCSI Target names).
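To illustrate why pages larger than 254 bytes need special care, here is a minimal sketch of a bounded two-step read. It is not the libmultipath code. It assumes the page uses a 4-byte header with a big-endian length field in bytes 2-3 (as VPD page 0x83 does), and `read_page` is a hypothetical callback standing in for the real SG_IO INQUIRY call.

```python
from typing import Callable

HEADER_LEN = 4         # bytes 0-3 of the page form its header
FIRST_ALLOC_LEN = 254  # first request size, matching the 254-byte limit above


def vpd_total_length(header: bytes) -> int:
    """Total page size: length field in bytes 2-3 plus the 4-byte header."""
    if len(header) < HEADER_LEN:
        raise ValueError("VPD header needs at least 4 bytes")
    return int.from_bytes(header[2:4], "big") + HEADER_LEN


def fetch_vpd(read_page: Callable[[int, int], bytes], page: int) -> bytes:
    """Read a VPD page with at most two requests, so it cannot loop forever.

    read_page(page, alloc_len) is a hypothetical stand-in that returns up
    to alloc_len bytes of the requested page.
    """
    data = read_page(page, FIRST_ALLOC_LEN)
    total = vpd_total_length(data)
    if total > len(data):
        # The page was truncated: ask once more with the reported size.
        data = read_page(page, total)
    return data[:total]
```

The point of the sketch is the fixed number of requests: the second read uses the length the device reported, and there is no retry loop that could spin if the sizes never match.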
This issue was later addressed by several commits in 0.7.5 and 0.7.6: "libmultipath: sgio_get_vpd: add page argument" [2], "libmultipath: Fix sgio_get_vpd()" [3], "libmultipath: fix return code of sgio_get_vpd()" [4], "libmultipath: get_vpd_sgio: support VPD 0xc9" [5].
I've found a temporary workaround for this: setting detect_checker = "no" on Ubuntu 18.04. This appears to avoid the problem by skipping the RDAC compatibility check code.
I've also tested 0.8.3 in a RH environment and found no issues running with the default detect_checker = "yes" (Ubuntu 20.04 also ships 0.8.3).
So, in short:
- It looks like multipath-tools in Ubuntu 18.04 is broken.
- There is a fix in newer versions that aren't in the Ubuntu 18.04 repo yet.
- The OpenStack Cinder driver for SolidFire just happens to hit this bug because of the long names.
- Setting detect_checker = "no" is a possible workaround until Ubuntu publishes 0.7.6.
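As a rough aid for deciding whether the workaround is still needed, the version range described above (introduced in 0.7.4, addressed by 0.7.6) can be checked like this. The function names are hypothetical, and the check only looks at the upstream part of the version string: a distro package that carries backported patches will still be reported as affected, so treat the result as a hint only.

```python
from __future__ import annotations


def upstream_version(package_version: str) -> tuple[int, ...]:
    """Strip a packaging suffix (everything after the first '-') and parse.

    Assumes a plain dotted numeric upstream version without an epoch.
    """
    upstream = package_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def in_affected_upstream_range(package_version: str) -> bool:
    """True if the upstream version is >= 0.7.4 and < 0.7.6."""
    return (0, 7, 4) <= upstream_version(package_version) < (0, 7, 6)
```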
Below is the "multipath -r -v3" output from my env:
:/$ cat /etc/multipath.conf
devices {
    device {
        vendor "SolidFir"
        product "SSD SAN"
        detect_checker "no"
    }
}
:/$ sudo multipath -r -v3
Jul 09 19:06:24 | loading //lib/multipath
Related branches
- Bryce Harrington (community): Needs Information
- Canonical Server: Pending requested
- Canonical Server packageset reviewers: Pending requested
Diff: 223 lines (+183/-0), 6 files modified
debian/changelog (+7/-0)
debian/patches/lp-1891202-libmultipath-Fix-sgio_get_vpd.patch (+55/-0)
debian/patches/lp-1891202-libmultipath-fix-return-code-of-sgio_get_vpd.patch (+32/-0)
debian/patches/lp-1891202-libmultipath-get_vpd_sgio-support-VPD-0xc9.patch (+35/-0)
debian/patches/lp-1891202-libmultipath-sgio_get_vpd-add-page-argument.patch (+50/-0)
debian/patches/series (+4/-0)
Changed in multipath-tools (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
description: updated
tags: removed: server-next
tags: added: verification-done
Thanks a lot for reporting this, Fernando. Let me take a deeper look at what you provided, and I will get back to you soon.