libvirt: concurrent detach_volume and terminate fails

Bug #1057719 reported by Vish Ishaya
This bug affects 1 person
Affects                     Status        Importance   Assigned to    Milestone
OpenStack Compute (nova)    Fix Released  High         Vish Ishaya
  Folsom                    Fix Released  High         Chuck Short
nova (Ubuntu)               Fix Released  Undecided    Unassigned
  Quantal                   Fix Released  Undecided    Unassigned

Bug Description

If you detach a volume from an instance while concurrently terminating that same instance, the two execution paths can stomp on each other:

Traceback from terminate greenthread:

2012-09-27 18:32:40 ERROR nova.openstack.common.rpc.amqp [-] Exception during message handling
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 275, in _process_data
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 117, in wrapped
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 92, in wrapped
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp return f(*args, **kw)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 181, in decorated_function
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp pass
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 167, in decorated_function
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 202, in decorated_function
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp kwargs['instance']['uuid'], e, sys.exc_info())
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 196, in decorated_function
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 917, in terminate_instance
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp do_terminate_instance(instance)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 752, in inner
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp retval = f(*args, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 909, in do_terminate_instance
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp self._delete_instance(context, instance)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 881, in _delete_instance
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp self._shutdown_instance(context, instance)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 845, in _shutdown_instance
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp block_device_info)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 845, in _shutdown_instance
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp block_device_info)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 492, in destroy
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp self._cleanup(instance, network_info, block_device_info)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 535, in _cleanup
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp mount_device)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 610, in volume_driver_method
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp return method(connection_info, *args, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 752, in inner
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp retval = f(*args, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py", line 202, in disconnect_volume
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp check_exit_code=[0, 255])
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py", line 115, in _iscsiadm_update
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp return self._run_iscsiadm(iscsi_properties, iscsi_command, **kwargs)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py", line 106, in _run_iscsiadm
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp check_exit_code=check_exit_code)
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 198, in execute
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp cmd=' '.join(cmd))
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp ProcessExecutionError: Unexpected error while running command.
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-f8731ea4-afb5-48f2-91b8-b64570f893e0 -p 10.129.28.4:3260 --op update -n node.startup -v manual
2012-09-27 18:32:40 TRACE nova.openstack.common.rpc.amqp Exit code: 21

(The other traceback is reported as a separate bug: https://bugs.launchpad.net/nova/+bug/1057730)

The error on the detach path isn't particularly worrisome, since the detach is failing only because the volume is already gone. It might be nice to catch that error and log a warning instead of emitting a traceback. The other issue seems to be a lack of idempotency in the iSCSI disconnect code.
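
The suggestion to catch the detach error and log a warning could look roughly like the following. This is a minimal sketch, not nova code: `ProcessExecutionError` is a hypothetical stand-in for nova's exception of the same name, and `detach_with_warning` and the injected `detach` callable are illustrative names only.

```python
import logging

LOG = logging.getLogger(__name__)


class ProcessExecutionError(Exception):
    """Hypothetical stand-in for nova's ProcessExecutionError."""

    def __init__(self, cmd, exit_code):
        super().__init__(
            f"Unexpected error while running command: {' '.join(cmd)} "
            f"(exit code {exit_code})")
        self.cmd = cmd
        self.exit_code = exit_code


def detach_with_warning(detach, volume_id):
    """Call detach(volume_id), downgrading a command failure to a warning.

    `detach` is a hypothetical callable that raises ProcessExecutionError
    when the underlying command fails, e.g. because a concurrent terminate
    already removed the volume. Returns True on success and False when
    the error was logged and swallowed. Other exception types propagate.
    """
    try:
        detach(volume_id)
    except ProcessExecutionError as exc:
        LOG.warning("Detach of volume %s failed with exit code %s; "
                    "it may already have been removed: %s",
                    volume_id, exc.exit_code, exc)
        return False
    return True


# Usage: a fake detach that fails the way the concurrent case does.
def detach_already_gone(volume_id):
    raise ProcessExecutionError(["iscsiadm", "-m", "node"], 21)


detach_with_warning(detach_already_gone, "vol-1")  # logs a warning, returns False
```

Only the command-failure exception is downgraded; anything else still raises, so genuinely unexpected errors are not hidden.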

description: updated
Changed in nova:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Vish Ishaya (vishvananda)
tags: added: folsom-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/13788

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/13788
Committed: http://github.com/openstack/nova/commit/628a993ce29cfdee00463c16b932f23e30bc52bf
Submitter: Jenkins
Branch: master

commit 628a993ce29cfdee00463c16b932f23e30bc52bf
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Sep 27 12:32:32 2012 -0700

    libvirt: Improve the idempotency of iscsi detach

    When detaching an iscsi volume it is possible for the iscsi commands
    to run concurrently, causing a target to be deleted by one greenthread
    while the other is continuing. When removing the iscsi connection,
    we should always ignore exit code 21 because that means that the
    target has already been removed.

    Fixes bug 1057719

    Change-Id: I0c9f2623f85a817e2be506f9a6d523d45c76848a
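
The commit's approach, treating exit code 21 as "already removed" during iSCSI teardown, can be sketched with a small command runner that takes a list of tolerated exit codes, mirroring the `check_exit_code=[0, 255]` argument visible in the traceback. Everything here is hypothetical: `execute` and `ProcessExecutionError` are simplified stand-ins rather than nova's implementation, and a Python one-liner that exits with status 21 stands in for an `iscsiadm` call against a target that is already gone.

```python
import subprocess
import sys


class ProcessExecutionError(Exception):
    """Hypothetical stand-in for nova's ProcessExecutionError."""

    def __init__(self, cmd, exit_code, stderr=""):
        super().__init__(
            "Unexpected error while running command.\n"
            f"Command: {' '.join(cmd)}\n"
            f"Exit code: {exit_code}\n"
            f"Stderr: {stderr!r}")
        self.cmd = cmd
        self.exit_code = exit_code


def execute(cmd, check_exit_code=(0,)):
    """Run cmd and raise unless its exit status is in check_exit_code.

    A simplified sketch, not nova.utils.execute: it only shows how a
    list of tolerated exit codes changes which failures are fatal.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode not in check_exit_code:
        raise ProcessExecutionError(cmd, proc.returncode, proc.stderr)
    return proc.stdout, proc.stderr


# Stand-in for an iscsiadm call against a target that a concurrent
# greenthread already removed: it just exits with status 21.
fake_iscsiadm = [sys.executable, "-c", "import sys; sys.exit(21)"]

# Allowed codes as in the traceback: 21 is fatal, so this raises.
try:
    execute(fake_iscsiadm, check_exit_code=[0, 255])
except ProcessExecutionError as exc:
    print("raised, exit code", exc.exit_code)

# Allowing 21 as well makes the repeated teardown a harmless no-op.
execute(fake_iscsiadm, check_exit_code=[0, 21, 255])
```

Widening the set of allowed exit codes, rather than catching and discarding every failure, keeps unexpected exit codes fatal while making a second teardown of the same target idempotent.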

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/14061

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/14061
Committed: http://github.com/openstack/nova/commit/0af4dd0120270592e02d3afa1be278a834bf470a
Submitter: Jenkins
Branch: stable/folsom

commit 0af4dd0120270592e02d3afa1be278a834bf470a
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Sep 27 12:32:32 2012 -0700

    libvirt: Improve the idempotency of iscsi detach

    When detaching an iscsi volume it is possible for the iscsi commands
    to run concurrently, causing a target to be deleted by one greenthread
    while the other is continuing. When removing the iscsi connection,
    we should always ignore exit code 21 because that means that the
    target has already been removed.

    Fixes bug 1057719

    Change-Id: I0c9f2623f85a817e2be506f9a6d523d45c76848a

Chuck Short (zulcss)
tags: removed: folsom-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Quantal):
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Vish, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.2.1+stable-20121212-a99a802e-0ubuntu1

---------------
nova (2012.2.1+stable-20121212-a99a802e-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - debian/control: Ensure novaclient is upgraded with nova,
      require python-keystoneclient >= 1:2.9.0. (LP: #1073289)
    - d/p/avoid_setuptools_git_dependency.patch: Refresh.
  * Dropped patches, applied upstream:
    - debian/patches/CVE-2012-5625.patch: [a99a802]
  * Resynchronize with stable/folsom (b55014ca) (LP: #1085255):
    - [a99a802] create_lvm_image allocates dirty blocks (LP: #1070539)
    - [670b388] RPC exchange name defaults to 'openstack' (LP: #1083944)
    - [3ede373] disassociate_floating_ip with multi_host=True fails
      (LP: #1074437)
    - [22d7c3b] libvirt imagecache should handle shared image storage
      (LP: #1075018)
    - [e787786] Detached and deleted RBD volumes remain associated with instance
      (LP: #1083818)
    - [9265eb0] live_migration missing migrate_data parameter in Hyper-V driver
      (LP: #1066513)
    - [3d99848] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [65a2d0a] resize does not migrate DHCP host information (LP: #1065440)
    - [102c76b] Nova backup image fails (LP: #1065053)
    - [48a3521] Fix config-file overrides for nova-dhcpbridge
    - [69663ee] Cloudpipe in Folsom: no such option: cnt_vpn_clients
      (LP: #1069573)
    - [6e47cc8] DisassociateAddress can cause Internal Server Error
      (LP: #1080406)
    - [22c3d7b] API calls to dis-associate an auto-assigned floating IP should
      return proper warning (LP: #1061499)
    - [bd11d15] libvirt: if exception raised during volume_detach, volume state
      is inconsistent (LP: #1057756)
    - [dcb59c3] admin can't describe all images in ec2 api (LP: #1070138)
    - [78de622] Incorrect Exception raised during Create server when metadata
      over 255 characters (LP: #1004007)
    - [c313de4] Fixed IP isn't released before updating DHCP host file
      (LP: #1078718)
    - [f4ab42d] Enabling Return Reservation ID with XML create server request
      returns no body (LP: #1061124)
    - [3db2a38] 'BackupCreate' should accept rotation parameter greater than or
      equal to zero (LP: #1071168)
    - [f7e5dde] libvirt reboot sometimes fails to reattach volumes
      (LP: #1073720)
    - [ff776d4] libvirt: detaching volume may fail while terminating other
      instances on the same host concurrently (LP: #1060836)
    - [85a8bc2] Used instance uuid rather than id in remove-fixed-ip
    - [42a85c0] Fix error on invalid delete_on_termination value
    - [6a17579] xenapi migrations fail w/ swap (LP: #1064083)
    - [97649b8] attach-time field for volumes is not updated for detach volume
      (LP: #1056122)
    - [8f6a718] libvirt: rebuild is not using kernel and ramdisk associated with
      the new image (LP: #1060925)
    - [fbe835f] live-migration and volume host assignment (LP: #1066887)
    - [c2a9150] typo prevents volume_tmp_dir flag from working (LP: #1071536)
    - [93efa21] Instances deleted during spawn leak network allocations
      (LP: #1068716)
    - [ebabd02] After restartin...


Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-1 → 2013.1