Nova falling back to single path when multipath device already exists
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
os-brick | New | Undecided | Unassigned |
Bug Description
Our environment is Ubuntu 16.04 with OpenStack 16.0.5.
Nova is configured to use multipath, and we are trying to start a previously stopped instance that has volumes attached.
Before starting the instance, things on the compute node look as follows:
* iSCSI session is still up from the previous run of the instance
* Device Mapper dm-X multipath device remains in /dev/
* multipath -ll shows that all paths for dm-X are active and running
* There are links in /dev/disk/by-id (scsi-* and wwn-*) that point to /dev/dm-X
* However, no links by WWN to the underlying block devices (sd*) exist in /dev/disk/by-id (because of udev rules, see below)
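To confirm the state listed above on a compute node, a small diagnostic can group the scsi-*/wwn-* links in /dev/disk/by-id by the device they resolve to. This is a minimal sketch: `by_id_targets` and `paths_without_links` are hypothetical helpers, not part of os-brick, and the two link prefixes are taken from the list above.

```python
import os
from collections import defaultdict


def by_id_targets(by_id_dir="/dev/disk/by-id", prefixes=("scsi-", "wwn-")):
    """Hypothetical helper: map each resolved device name (e.g. 'dm-3',
    'sda') to the by-id links that point at it, considering only links
    whose names start with one of `prefixes`."""
    targets = defaultdict(list)
    for name in sorted(os.listdir(by_id_dir)):
        if not name.startswith(prefixes):
            continue
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath() follows the udev symlink (typically '../../dm-N'
            # or '../../sdX'); only the final device name is kept.
            targets[os.path.basename(os.path.realpath(link))].append(name)
    return dict(targets)


def paths_without_links(path_devices, by_id_dir="/dev/disk/by-id"):
    """Hypothetical helper: return the path devices (e.g. ['sda', 'sdb'])
    that no scsi-/wwn- link in by_id_dir resolves to."""
    linked = by_id_targets(by_id_dir)
    return [dev for dev in path_devices if dev not in linked]
```

On a node in the state described, `paths_without_links(["sda", "sdb", "sdc", "sdd"])` would be expected to return all four names, while `by_id_targets()` lists the WWN links only under the dm-X entry.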
While starting the instance, nova performs iSCSI discovery and obtains a list of 4 (ip, iqn, lun) tuples that it should use for constructing a multipath device.
4 iSCSI sessions are started, and the LUNs appear as block devices in Linux - let's call them /dev/sda, /dev/sdb, /dev/sdc and /dev/sdd.
Then, apparently, os-brick tries to find the WWN of any of our devices by globbing for "/dev/disk/
This fails (yields no result) because no links in /dev/disk/by-id point to our devices anymore: udev rules replace them with links to the /dev/dm-* devices.
The lack of a WWN causes os-brick's multipath logic to be skipped completely (the multipath branch is only entered if at least one device's WWN can be found).
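The failing lookup and the resulting branch can be modelled in a few lines. This is a simplified model of the behaviour described above, not os-brick's actual code: `wwn_from_by_id` and `choose_attach_mode` are hypothetical names, and the assumption that the globbed links are named `scsi-<wwn>` is ours.

```python
import glob
import os


def wwn_from_by_id(device_names, by_id_dir="/dev/disk/by-id"):
    """Hypothetical model of the lookup: return the WWN suffix of the first
    'scsi-*' link that resolves to one of device_names, or '' if none does.
    ASSUMPTION: WWN links are named 'scsi-<wwn>'."""
    prefix = os.path.join(by_id_dir, "scsi-")
    for link in sorted(glob.glob(prefix + "*")):
        if os.path.islink(link):
            # udev links are relative ('../../sda' or '../../dm-3');
            # only the final device name is compared.
            target = os.path.basename(os.path.realpath(link))
            if target in device_names:
                return link[len(prefix):]
    return ""


def choose_attach_mode(device_names, by_id_dir="/dev/disk/by-id"):
    """Hypothetical branch: take the multipath route only if a WWN was
    found, otherwise fall back to the first single-path device."""
    wwn = wwn_from_by_id(device_names, by_id_dir)
    if wwn:
        return ("multipath", wwn)
    return ("single-path", device_names[0])
```

If the only scsi-* link points at dm-3, `choose_attach_mode(["sda", "sdb", "sdc", "sdd"])` returns `("single-path", "sda")`, which is the degraded outcome seen in virsh.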
In nova-compute.log, this manifests itself as this particular log entry: "No dm was created, connection to volume is probably bad and will perform poorly."
(In fact, the dm device is still there; only its component paths are not represented by links in /dev/disk/by-id/.)
All of the above causes os-brick to attach our volume in a "degraded" mode: in virsh, one can see that only one of the underlying single-path devices, such as /dev/sda, is used as the volume's backing store instead of the multipath device.
Note that:
* The WWN discovery succeeds if this is the first iSCSI connection to these LUNs, because there is no multipath device yet and so there are still links from /dev/disk/
* Some other links (in the form of /dev/disk/
In short, this bug silently degrades multipath attachments to single-path ones because of os-brick's unfounded reliance on udev links.
I created a patch. It is a temporary solution, but it works.
--- /openstack/venvs/nova-16.0.8/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py 2018-03-12 18:02:48.973477364 +0100
+++ /openstack/venvs/nova-16.0.8/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py.tomcsi 2018-03-12 18:04:44.133474261 +0100
@@ -705,7 +705,10 @@
                 data['failed_logins'])):
             # We have devices but we don't know the wwn yet
             if not wwn and found:
-                wwn = self._linuxscsi.get_sysfs_wwn(found)
+                if not mpath:
+                    wwn = self._linuxscsi.get_sysfs_wwn(self._linuxscsi.find_sysfs_multipath_dm(found))
+                else:
+                    wwn = self._linuxscsi.get_sysfs_wwn(found)
             # We have the wwn but not a multipath
             if wwn and not mpath:
                 mpath = self._linuxscsi.find_sysfs_multipath_dm(found)
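The idea behind the patch, deriving the WWN from the multipath dm device when the path devices have no by-id links, can be sketched as a standalone function. This is a hypothetical variant, not the patch itself: it tries the path devices first, falls back to the dm-* holder listed under /sys/block/<dev>/holders/ only if that fails, and returns '' when no dm device exists yet (the first-attach case). `holder_dm`, `_wwn_for` and `wwn_with_dm_fallback` are illustrative names, and the `scsi-<wwn>` link naming is an assumption.

```python
import glob
import os


def _wwn_for(device_names, by_id_dir):
    # Hypothetical helper. ASSUMPTION: WWN links are named 'scsi-<wwn>'.
    # Returns the suffix of the first link resolving to one of
    # device_names, or '' if none does.
    prefix = os.path.join(by_id_dir, "scsi-")
    for link in sorted(glob.glob(prefix + "*")):
        if os.path.basename(os.path.realpath(link)) in device_names:
            return link[len(prefix):]
    return ""


def holder_dm(device_names, sys_block="/sys/block"):
    """Return the first dm-* holder of any device in device_names, or None.
    A multipath map appears as /sys/block/<sdX>/holders/dm-N."""
    for dev in device_names:
        holders = sorted(glob.glob(os.path.join(sys_block, dev, "holders", "dm-*")))
        if holders:
            return os.path.basename(holders[0])
    return None


def wwn_with_dm_fallback(device_names, by_id_dir="/dev/disk/by-id",
                         sys_block="/sys/block"):
    """Look up the WWN via the path devices first; if that fails and the
    paths already belong to a dm device, retry with the dm device name."""
    wwn = _wwn_for(device_names, by_id_dir)
    if not wwn:
        dm = holder_dm(device_names, sys_block)
        if dm is not None:
            wwn = _wwn_for([dm], by_id_dir)
    return wwn
```

Keeping the path-device lookup first preserves the existing behaviour for first-time connections, where the by-id links still point at the sd* devices; the dm lookup only comes into play in the reattach scenario described in this report.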