Nova iSCSI volume attach fails in cDOT failover

Bug #1437419 reported by Tom Barron
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
Cinder    Fix Released   Undecided    Tom Barron
Juno      Fix Released   Undecided    Unassigned

Bug Description

Even with multipathing properly configured on the Nova node and libvirt.iscsi_use_multipath set to True in nova.conf,
'nova volume-attach <instance-id> <volume-id>' fails when controller A is taken over by controller B in a NetApp cDOT
cluster. The nova-compute log shows that iscsiadm discovery is being run against a portal whose IP belongs to the LIF
on the downed controller:

2015-03-23 16:41:40.591 ERROR nova.virt.block_device [req-7eb90c08-93a8-4713-88e6-5931798597e1 admin demo] [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] Driver failed to attach volume f792cc36-5f49-488d-ae77-79fe25012ac2 at /dev/vdc
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] Traceback (most recent call last):
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/virt/block_device.py", line 251, in attach
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] device_type=self['device_type'], encryption=encryption)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1036, in attach_volume
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] self._connect_volume(connection_info, disk_info)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 987, in _connect_volume
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] driver.connect_volume(connection_info, disk_info)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 445, in inner
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] return f(*args, **kwargs)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 403, in connect_volume
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] out = self._run_iscsiadm_discover(iscsi_properties)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 539, in _run_iscsiadm_discover
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] check_exit_code=[0, 255])[0] or ""
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 809, in _run_iscsiadm_bare
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] check_exit_code=check_exit_code)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/opt/stack/nova/nova/utils.py", line 206, in execute
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] return processutils.execute(*cmd, **kwargs)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 233, in execute
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] cmd=sanitized_cmd)
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] ProcessExecutionError: Unexpected error while running command.
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m discovery -t sendtargets -p 172.20.124.43:3260
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] Exit code: 4
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] Stdout: u''
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024] Stderr: u'iscsiadm: cannot make connection to 172.20.124.43: No route to host\niscsiadm: cannot make connection to 172.20.124.43: No route to host\niscsiadm: cannot make connection to 172.20.124.43: No route to host\niscsiadm: cannot make connection to 172.20.124.43: No route to host\niscsiadm: cannot make connection to 172.20.124.43: No route to host\niscsiadm: cannot make connection to 172.20.124.43: No route to host\niscsiadm: connection login retries (reopen_max) 5 exceeded\niscsiadm: Could not perform SendTargets discovery: encountered connection failure\n'
2015-03-23 16:41:40.591 TRACE nova.virt.block_device [instance: 1546a98d-6da6-4b04-b0b6-9ead3a511024]
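
To confirm from the compute node that the portal named in the trace is unreachable, a plain TCP connect to the iSCSI port is usually enough. The following is a minimal sketch, assuming Python 3 is available on the node; `portal_reachable` is a name made up for this example and is not part of Nova or iscsiadm.

```python
import socket


def portal_reachable(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; it checks TCP reachability,
    not whether iSCSI SendTargets discovery would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, and timeouts.
        return False
```

Calling `portal_reachable("172.20.124.43")` while the owning controller is down would be expected to return False, matching the "No route to host" errors in the Stderr above.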

Tom Barron (tpb)
Changed in cinder:
assignee: nobody → Tom Barron (tpb)
tags: added: drivers
tags: added: netapp
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/169812

Changed in cinder:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/169812
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=9d77c0e1435d377790eaf812691b29e8b7a6b2c7
Submitter: Jenkins
Branch: master

commit 9d77c0e1435d377790eaf812691b29e8b7a6b2c7
Author: Tom Barron <email address hidden>
Date: Wed Mar 25 11:28:08 2015 -0400

    Only use operational LIFs for iscsi target details

    When Nova invokes our cDOT driver's initialize_connection() method
    in order to attach an iSCSI volume to an instance, we have been
    returning the first target from the list of targets returned from
    the filer API call to get iscsi target details. In failover mode,
    this may be a target with a LIF that is down, causing the attach
    attempt to fail in Nova.

    This commit fixes that issue so that attaches still work in
    failover by returning the first target with an *operational* LIF.

    Closes-bug: 1437419

    Change-Id: I3f53c3644f7d94f5188151c16d70e565b745ad4a
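
The selection rule described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the driver's actual code: the function name, the dict keys (`address`, `port`, `interface-enabled`), and the second portal address are hypothetical stand-ins for whatever the filer API returns.

```python
def first_operational_target(targets):
    """Return the first target whose LIF is reported as up, or None.

    Assumes each target is a dict with an 'interface-enabled' key holding
    the string 'true' or 'false' (an assumption made for this sketch).
    """
    for target in targets:
        if target.get("interface-enabled") == "true":
            return target
    return None


# Example: the first target sits on the downed controller's LIF.
targets = [
    {"address": "172.20.124.43", "port": "3260", "interface-enabled": "false"},
    {"address": "172.20.124.44", "port": "3260", "interface-enabled": "true"},  # hypothetical
]
chosen = first_operational_target(targets)
```

Before the fix, the behavior described above corresponds to always taking `targets[0]`, which in this example is the unreachable portal.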

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → kilo-rc1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/172172

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/juno)

Reviewed: https://review.openstack.org/172172
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=78dc8fa9b2e60be1f901a50249d8e9b6274e3fa4
Submitter: Jenkins
Branch: stable/juno

commit 78dc8fa9b2e60be1f901a50249d8e9b6274e3fa4
Author: Tom Barron <email address hidden>
Date: Wed Mar 25 11:28:08 2015 -0400

    Only use operational LIFs for iscsi target details

    When Nova invokes our cDOT driver's initialize_connection() method
    in order to attach an iSCSI volume to an instance, we have been
    returning the first target from the list of targets returned from
    the filer API call to get iscsi target details. In failover mode,
    this may be a target with a LIF that is down, causing the attach
    attempt to fail in Nova.

    This commit fixes that issue so that attaches still work in
    failover by returning the first target with an *operational* LIF.

    Code here is a literal copy of code merged into Kilo but because
    our Cinder drivers were reorganized and refactored in Kilo the
    code appears in different files.

    Closes-bug: 1437419

    Change-Id: I3f53c3644f7d94f5188151c16d70e565b745ad4a
    (cherry picked from commit 9d77c0e1435d377790eaf812691b29e8b7a6b2c7)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in cinder:
milestone: kilo-rc1 → 2015.1.0