Live migration of instances fails with Cinder volumes on the LVM backend with TgtAdm

Bug #1383509 reported by Tomoki Sekiyama
This bug affects 2 people
Affects          Status          Importance    Assigned to        Milestone
Cinder           Fix Released    High          Tomoki Sekiyama
Cinder (Juno)    Fix Released    Undecided     Unassigned

Bug Description

With the LVM driver backend configured with 'iscsi_helper = tgtadm', live migration of Nova instances fails if Cinder volumes are attached.

On the compute node to which the instance migrates, nova-compute fails to log in to the attached iSCSI volume because of a CHAP authentication failure, and the live-migration process is aborted.

At that time, nova-compute logs the following:

2014-10-20 18:20:39.134 DEBUG nova.openstack.common.lockutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Created new semaphore "connect_volume" from (pid=20313) internal_lock /opt/stack/nova/nova/openstack/common/lockutils.py:206
2014-10-20 18:20:39.134 DEBUG nova.openstack.common.lockutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Acquired semaphore "connect_volume" from (pid=20313) lock /opt/stack/nova/nova/openstack/common/lockutils.py:229
2014-10-20 18:20:39.134 DEBUG nova.openstack.common.lockutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Got semaphore / lock "connect_volume" from (pid=20313) inner /opt/stack/nova/nova/openstack/common/lockutils.py:271
2014-10-20 18:20:39.134 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:39.173 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 21 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:39.174 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --op new from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:39.214 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 0 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:39.214 DEBUG nova.virt.libvirt.volume [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] iscsiadm ('--op', 'new'): stdout=New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] control,3260,-1 iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806] added stderr= from (pid=20313) _run_iscsiadm /opt/stack/nova/nova/virt/libvirt/volume.py:242
2014-10-20 18:20:39.214 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --op update -n node.session.auth.authmethod -v CHAP from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:39.256 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 0 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:39.256 DEBUG nova.virt.libvirt.volume [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] iscsiadm ('--op', 'update', '-n', 'node.session.auth.authmethod', '-v', u'CHAP'): stdout= stderr= from (pid=20313) _run_iscsiadm /opt/stack/nova/nova/virt/libvirt/volume.py:242
2014-10-20 18:20:39.256 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --op update -n node.session.auth.username -v SrV6Fksds497Rteg97kc from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:39.297 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 0 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:39.297 DEBUG nova.virt.libvirt.volume [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] iscsiadm ('--op', 'update', '-n', 'node.session.auth.username', '-v', u'SrV6Fksds497Rteg97kc'): stdout= stderr= from (pid=20313) _run_iscsiadm /opt/stack/nova/nova/virt/libvirt/volume.py:242
2014-10-20 18:20:39.298 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --op update -n node.session.auth.password -v *** from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:39.339 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 0 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:39.340 DEBUG nova.virt.libvirt.volume [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] iscsiadm ('--op', 'update', '-n', 'node.session.auth.password', '-v', u'***'):stdout= stderr= from (pid=20313) _run_iscsiadm /opt/stack/nova/nova/virt/libvirt/volume.py:242
2014-10-20 18:20:39.340 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m session from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:39.380 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 21 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:39.380 DEBUG nova.virt.libvirt.volume [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] iscsiadm ['-m', 'session']: stdout= stderr=iscsiadm: No active sessions. from (pid=20313) _run_iscsiadm_bare /opt/stack/nova/nova/virt/libvirt/volume.py:585
2014-10-20 18:20:39.381 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --login from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:40.570 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 24 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192

(According to 'man iscsiadm', exit code 24 means "ISCSI_ERR_LOGIN_AUTH_FAILED - login failed due to authorization failure". For some reason, nova seems to ignore this error code...)

2014-10-20 18:20:40.571 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --op update -n node.startup -v automatic from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:40.610 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 0 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:40.610 DEBUG nova.virt.libvirt.volume [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'automatic'): stdout= stderr= from (pid=20313) _run_iscsiadm /opt/stack/nova/nova/virt/libvirt/volume.py:242
2014-10-20 18:20:40.611 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --rescan from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:158
2014-10-20 18:20:40.650 DEBUG nova.openstack.common.processutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Result was 21 from (pid=20313) execute /opt/stack/nova/nova/openstack/common/processutils.py:192
2014-10-20 18:20:40.651 DEBUG nova.openstack.common.lockutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Releasing semaphore "connect_volume" from (pid=20313) lock /opt/stack/nova/nova/openstack/common/lockutils.py:238
2014-10-20 18:20:40.651 DEBUG nova.openstack.common.lockutils [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Semaphore / lock released "connect_volume" from (pid=20313) inner /opt/stack/nova/nova/openstack/common/lockutils.py:275
2014-10-20 18:20:40.664 ERROR oslo.messaging.rpc.dispatcher [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Exception during message handling: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --rescan
Exit code: 21
Stdout: u''
Stderr: u'iscsiadm: No session found.\n'
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 137, in _dispatch_and_reply
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 180, in _dispatch
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 126, in _do_dispatch
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 416, in decorated_function
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 88, in wrapped
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher payload)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/utils/excutils.py", line 82, in __exit__
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 71, in wrapped
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return f(self, context, *args, **kw)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 328, in decorated_function
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/utils/excutils.py", line 82, in __exit__
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 316, in decorated_function
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 4899, in pre_live_migration
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher migrate_data)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5499, in pre_live_migration
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher self._connect_volume(connection_info, disk_info)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1309, in _connect_volume
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return driver.connect_volume(connection_info, disk_info)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/openstack/common/lockutils.py", line 272, in inner
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return f(*args, **kwargs)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 300, in connect_volume
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher self._run_iscsiadm(iscsi_properties, ("--rescan",))
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 237, in _run_iscsiadm
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher check_exit_code=check_exit_code)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/utils.py", line 163, in execute
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher return processutils.execute(*cmd, **kwargs)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/openstack/common/processutils.py", line 200, in execute
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher cmd=sanitized_cmd)
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command.
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --rescan
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher Exit code: 21
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher Stdout: u''
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher Stderr: u'iscsiadm: No session found.\n'
2014-10-20 18:20:40.664 TRACE oslo.messaging.rpc.dispatcher
2014-10-20 18:20:40.665 ERROR oslo.messaging._drivers.common [req-308f9698-d9a6-472c-a06d-126759bcca5a admin demo] Returning exception Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-4b9e99d2-fb24-4c59-b48e-f2e60a1a0806 -p control:3260 --rescan
Exit code: 21
Stdout: u''
Stderr: u'iscsiadm: No session found.\n' to caller
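
The failure pattern above is what you get when the destination host is handed CHAP credentials that tgtd is not actually enforcing for the target. A minimal sketch of the pre-fix behaviour, using illustrative names rather than the actual Cinder code, looks roughly like this:

import uuid

def generate_chap_credentials():
    """Random CHAP credentials, regenerated on every call (the bug)."""
    return uuid.uuid4().hex[:20], uuid.uuid4().hex[:16]

def initialize_connection_pre_fix(target):
    # Pre-fix behaviour: always mint new credentials and push them to tgtd.
    user, password = generate_chap_credentials()
    # tgtd keeps the old credentials for a target that still has a logged-in
    # initiator (the source compute node), so this update is not applied...
    target['chap_auth'] = (user, password)
    # ...while Nova on the destination node is handed the *new* pair, and its
    # 'iscsiadm --login' fails with exit code 24 (authorization failure).
    return {'auth_method': 'CHAP',
            'auth_username': user,
            'auth_password': password}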

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/129756

Changed in cinder:
assignee: nobody → Tomoki Sekiyama (tsekiyama)
status: New → In Progress
Jay Bryant (jsbryant)
tags: added: drivers lvm
Eric Harney (eharney) wrote :

Is this broken for LIO and IET as well?

Tomoki Sekiyama (tsekiyama) wrote :

Eric,

No, this is tgtd-specific.
But LIO has another bug that makes a second live migration fail:
   https://bugs.launchpad.net/cinder/+bug/1369541

Mike Perez (thingee)
Changed in cinder:
importance: Undecided → High
milestone: none → kilo-1
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/129756
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c22038b9005070e51224f5057aac9f73cf4d0340
Submitter: Jenkins
Branch: master

commit c22038b9005070e51224f5057aac9f73cf4d0340
Author: Tomoki Sekiyama <email address hidden>
Date: Mon Oct 20 14:32:55 2014 -0400

    TgtAdm: Don't change CHAP username/password on live migration

    As tgtd doesn't update the CHAP username/password while the initiator
    is connected, the CHAP username/password must not be changed while a
    Nova instance is performing live migration; otherwise the compute node
    to which the instance migrates cannot log in to the volume and the
    migration process is aborted.

    This changes the TgtAdm implementation so that it does not regenerate
    a random username/password every time initialize_connection is called.
    It also enables CHAP auth in the unit tests of the TargetAdmin helpers.

    Change-Id: I48729a33fada62a7c8e4fe500b1e39533d898701
    Closes-Bug: #1383509
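
A minimal sketch of the idea behind this fix (illustrative names only, not the exact patch): before generating new credentials, parse the CHAP user/password already recorded for the target and hand those back, so repeated initialize_connection calls during a live migration stay consistent with what tgtd enforces.

import uuid

def get_existing_chap_auth(target_config_text):
    """Parse 'incominguser <user> <password>' from a tgt target definition."""
    for line in target_config_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == 'incominguser':
            return fields[1], fields[2]
    return None

def ensure_chap_auth(target_config_text):
    existing = get_existing_chap_auth(target_config_text)
    if existing:
        # Reuse what tgtd is already enforcing, so the migration destination
        # receives credentials it can actually log in with.
        return existing
    return uuid.uuid4().hex[:20], uuid.uuid4().hex[:16]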

Changed in cinder:
status: In Progress → Fix Committed
tags: added: juno-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/135405

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/juno)

Reviewed: https://review.openstack.org/135405
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=6fac1268dd5b144eb4032048fb83b8c28247473a
Submitter: Jenkins
Branch: stable/juno

commit 6fac1268dd5b144eb4032048fb83b8c28247473a
Author: Tomoki Sekiyama <email address hidden>
Date: Mon Oct 20 14:32:55 2014 -0400

    TgtAdm: Don't change CHAP username/password on live migration

    As tgtd doesn't update the CHAP username/password while the initiator
    is connected, the CHAP username/password must not be changed while a
    Nova instance is performing live migration; otherwise the compute node
    to which the instance migrates cannot log in to the volume and the
    migration process is aborted.

    This changes the TgtAdm implementation so that it does not regenerate
    a random username/password every time initialize_connection is called.
    It also enables CHAP auth in the unit tests of the TargetAdmin helpers.

    Change-Id: I48729a33fada62a7c8e4fe500b1e39533d898701
    Closes-Bug: #1383509
    (cherry picked from commit c22038b9005070e51224f5057aac9f73cf4d0340)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/145391

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/145391
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=9df3bcead5e9ba89fc8a4130729679fb17a27133
Submitter: Jenkins
Branch: master

commit 9df3bcead5e9ba89fc8a4130729679fb17a27133
Author: Tomoki Sekiyama <email address hidden>
Date: Tue Jan 6 19:58:13 2015 -0500

    TgtAdm: Don't change CHAP username/password on live migration

    This changes the TgtAdm Target Objects so that they do not regenerate
    a random username/password every time initialize_connection is called,
    as TgtAdm doesn't accept the change while the initiator is connected.
    This fixes live migration of Nova instances which have volumes
    attached.

    Change-Id: I1b48a66da5a0d8726490f50c2ee58352c42d9587
    Closes-Bug: #1408162
    Related-Bug: #1383509
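
For anyone verifying either fix locally, one rough check (a sketch only; 'driver', 'volume' and 'connector' stand in for a configured LVM/tgt Cinder driver, an exported volume, and an initiator connector dict in a test environment) is that back-to-back initialize_connection calls now return the same CHAP credentials:

def chap_is_stable(driver, volume, connector):
    # Before the fix, each call handed out a freshly generated CHAP pair;
    # after it, both calls should report identical credentials.
    first = driver.initialize_connection(volume, connector)['data']
    second = driver.initialize_connection(volume, connector)['data']
    return (first.get('auth_username') == second.get('auth_username') and
            first.get('auth_password') == second.get('auth_password'))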

Thierry Carrez (ttx)
Changed in cinder:
milestone: kilo-1 → 2015.1.0