xenapi: VBD detach failure

Bug #1217972 reported by Mate Lakat
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: John Garbutt
Milestone: 2014.1

Bug Description

This text appears in the compute log:

DEBUG nova.virt.xenapi.vm_utils [req-909291ef-dbf7-4ed7-a1f4-f4b613117ac9 demo demo] Plugging VBD OpaqueRef:8d37c8d2-5a00-1442-8f19-3c97c3d9a751 ... from (pid=26384) vdi_attached_here /opt/stack/nova/nova/virt/xenapi/vm_utils.py:1911
[req-e64ceb14-3cd9-4734-b1d1-9cb416034503 demo demo] ['INTERNAL_ERROR', 'File "xapi_xenops.ml", line 2088, characters 3-9: Assertion failed']
Traceback (most recent call last):
  File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 352, in unplug_vbd
    session.call_xenapi('VBD.unplug', vbd_ref)
  File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 719, in call_xenapi
    return session.xenapi_request(method, args)
  File "/usr/local/lib/python2.7/dist-packages/XenAPI.py", line 133, in xenapi_request
    result = _parse_result(getattr(self, methodname)(*full_params))
  File "/usr/local/lib/python2.7/dist-packages/XenAPI.py", line 203, in _parse_result
    raise Failure(result['ErrorDescription'])
Failure: ['INTERNAL_ERROR', 'File "xapi_xenops.ml", line 2088, characters 3-9: Assertion failed']
...
[req-e64ceb14-3cd9-4734-b1d1-9cb416034503 demo demo] ['OPERATION_NOT_ALLOWED', "VBD '0d03a77b-6ae3-2e16-1a63-d771e374f513' still attached to '98bbd5ba-dc99-6c96-32be-b170cf8c9dd6'"]
Traceback (most recent call last):
  File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 379, in destroy_vbd
    session.call_xenapi('VBD.destroy', vbd_ref)
  File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 719, in call_xenapi
    return session.xenapi_request(method, args)
  File "/usr/local/lib/python2.7/dist-packages/XenAPI.py", line 133, in xenapi_request
    result = _parse_result(getattr(self, methodname)(*full_params))
  File "/usr/local/lib/python2.7/dist-packages/XenAPI.py", line 203, in _parse_result
    raise Failure(result['ErrorDescription'])
Failure: ['OPERATION_NOT_ALLOWED', "VBD '0d03a77b-6ae3-2e16-1a63-d771e374f513' still attached to '98bbd5ba-dc99-6c96-32be-b170cf8c9dd6'"]

[req-e64ceb14-3cd9-4734-b1d1-9cb416034503 demo demo] Destroying VBD for VDI OpaqueRef:d93e1482-aba5-e919-7733-7db2f3d2ccd6 done. from (pid=26384) vdi_attached_here /opt/stack/nova/nova/virt/xenapi/vm_utils.py:1934
[req-e64ceb14-3cd9-4734-b1d1-9cb416034503 demo demo] [instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] Failed to spawn, rolling back
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] Traceback (most recent call last):
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 498, in spawn
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] kernel_file, ramdisk_file = create_kernel_ramdisk_step(undo_mgr)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 153, in inner
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] rv = f(*args, **kwargs)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 390, in create_kernel_ramdisk_step
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] instance, context, name_label)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 191, in _create_kernel_and_ramdisk
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] vm_utils.ImageType.KERNEL)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 1019, in create_kernel_image
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] image_id, image_type)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 1331, in _fetch_disk_image
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] session, image.stream_to, image_type, virtual_size, dev)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] self.gen.next()
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 1927, in vdi_attached_here
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] unplug_vbd(session, vbd_ref)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 367, in unplug_vbd
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] _('Unable to unplug VBD %s') % vbd_ref)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] StorageError: Unable to unplug VBD OpaqueRef:f0fafa75-763d-db2d-d022-6e16b9808e44
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8]
[req-e64ceb14-3cd9-4734-b1d1-9cb416034503 demo demo] [instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] Instance failed to spawn
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] Traceback (most recent call last):
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/compute/manager.py", line 1286, in _spawn
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] block_device_info)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 180, in spawn
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] admin_password, network_info, block_device_info)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 514, in spawn
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] undo_mgr.rollback_and_reraise(msg=msg, instance=instance)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/utils.py", line 981, in rollback_and_reraise
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] self._rollback()
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 498, in spawn
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] kernel_file, ramdisk_file = create_kernel_ramdisk_step(undo_mgr)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 153, in inner
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] rv = f(*args, **kwargs)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 390, in create_kernel_ramdisk_step
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] instance, context, name_label)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 191, in _create_kernel_and_ramdisk
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] vm_utils.ImageType.KERNEL)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 1019, in create_kernel_image
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] image_id, image_type)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 1331, in _fetch_disk_image
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] session, image.stream_to, image_type, virtual_size, dev)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] self.gen.next()
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 1927, in vdi_attached_here
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] unplug_vbd(session, vbd_ref)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 367, in unplug_vbd
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] _('Unable to unplug VBD %s') % vbd_ref)
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] StorageError: Unable to unplug VBD OpaqueRef:f0fafa75-763d-db2d-d022-6e16b9808e44
[instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8]
[req-e64ceb14-3cd9-4734-b1d1-9cb416034503 demo demo] [instance: f2b36c4f-0c67-4a26-954c-aef3d2b9f7d8] Aborting claim: [Claim: 512 MB memory, 1 GB disk, 1 VCPUS] from (pid=26384) abort /opt/stack/nova/nova/compute/claims.py:97

Looking at vm_utils.py:

def unplug_vbd(session, vbd_ref):
    """Unplug VBD from VM."""
    # Call VBD.unplug on the given VBD, with a retry if we get
    # DEVICE_DETACH_REJECTED. For reasons which we don't understand,
    # we're seeing the device still in use, even when all processes
    # using the device should be dead.
    max_attempts = CONF.xenapi_num_vbd_unplug_retries + 1
    for num_attempt in xrange(1, max_attempts + 1):
        try:
            session.call_xenapi('VBD.unplug', vbd_ref)
            return
        except session.XenAPI.Failure as exc:
            err = len(exc.details) > 0 and exc.details[0]
            if err == 'DEVICE_ALREADY_DETACHED':
                LOG.info(_('VBD %s already detached'), vbd_ref)
                return
            elif err == 'DEVICE_DETACH_REJECTED':
                LOG.info(_('VBD %(vbd_ref)s detach rejected, attempt'
                           ' %(num_attempt)d/%(max_attempts)d'),
                         {'vbd_ref': vbd_ref, 'num_attempt': num_attempt,
                          'max_attempts': max_attempts})
            else:
                LOG.exception(exc)
                raise volume_utils.StorageError(
                        _('Unable to unplug VBD %s') % vbd_ref)

        greenthread.sleep(1)

    raise volume_utils.StorageError(
            _('Reached maximum number of retries trying to unplug VBD %s')
            % vbd_ref)

The code is prepared to retry in situations like this, but only when xapi reports DEVICE_DETACH_REJECTED; the failure in the log above is INTERNAL_ERROR, which falls through to the final else branch and raises StorageError on the first attempt. Maybe xapi returns a different error code here than the ones this code expects?
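
One way to double-check what xapi hands back for this VBD is to drive the unplug directly with the stock XenAPI Python bindings. This is a rough sketch only: the host URL and credentials below are placeholders, and the VBD uuid is the one from the log above.

import XenAPI

# Placeholders: point these at the XenServer host under test.
session = XenAPI.Session('https://xenserver.example.com')
session.xenapi.login_with_password('root', 'password')
try:
    vbd_ref = session.xenapi.VBD.get_by_uuid(
        '0d03a77b-6ae3-2e16-1a63-d771e374f513')
    session.xenapi.VBD.unplug(vbd_ref)
except XenAPI.Failure as exc:
    # On the failing host this prints something like:
    # ['INTERNAL_ERROR', 'File "xapi_xenops.ml", line 2088, ...']
    print(exc.details)
finally:
    session.xenapi.session.logout()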

My XenServer is 6.2:

[root@megadodo ~]# cat /etc/xensource-inventory
BUILD_NUMBER='70446c'
PLATFORM_VERSION='1.8.0'
DOM0_VCPUS='4'
INSTALLATION_UUID='4d944fd3-8b48-47a1-9b13-155ae6923a37'
MANAGEMENT_ADDRESS_TYPE='IPv4'
PRODUCT_VERSION_TEXT_SHORT='6.2'
BRAND_CONSOLE='XenCenter'
PRIMARY_DISK='/dev/disk/by-id/scsi-SATA_SAMSUNG_HE253GJ_S2B5J90ZC12322'
PRODUCT_BRAND='XenServer'
INSTALLATION_DATE='2013-08-21 08:05:18.227510'
PLATFORM_NAME='XCP'
COMPANY_NAME_SHORT='Citrix'
PRODUCT_VERSION_TEXT='6.2'
BACKUP_PARTITION='/dev/disk/by-id/scsi-SATA_SAMSUNG_HE253GJ_S2B5J90ZC12322-part2'
PRODUCT_VERSION='6.2.0'
XEN_VERSION='4.1.5'
KERNEL_VERSION='2.6.32.43-0.4.1.xs1.8.0.835.170778xen'
MANAGEMENT_INTERFACE='xenbr0'
DOM0_MEM='752'
COMPANY_NAME='Citrix Systems, Inc.'
PRODUCT_NAME='xenenterprise'
CONTROL_DOMAIN_UUID='4e04643c-0506-408c-9a84-e45fe055ce90'

Tags: xenserver
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/44099

Changed in nova:
assignee: nobody → Mate Lakat (mate-lakat)
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
Revision history for this message
John Garbutt (johngarbutt) wrote :

Progress has stalled and the review has been abandoned.
De-assigning for now.

Changed in nova:
status: In Progress → Triaged
assignee: Mate Lakat (mate-lakat) → nobody
Revision history for this message
Mate Lakat (mate-lakat) wrote :

Let me handle this. It is important for keeping the builds stable.

Changed in nova:
assignee: nobody → Mate Lakat (mate-lakat)
Revision history for this message
John Garbutt (johngarbutt) wrote :
Changed in nova:
assignee: Mate Lakat (mate-lakat) → John Garbutt (johngarbutt)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/51218
Committed: http://github.com/openstack/nova/commit/f9d2b06c6c6fd9cac91993bb38f0d96edc385366
Submitter: Jenkins
Branch: master

commit f9d2b06c6c6fd9cac91993bb38f0d96edc385366
Author: John Garbutt <email address hidden>
Date: Fri Oct 11 14:46:40 2013 +0100

    xenapi: workaround for failing vbd detach

    Sometimes xapi returns with an internal error during unplug_vbd. This
    patch adds a workaround for XenServer 6.2 by inspecting the error
    message and retrying the unplug.

    In addition, this also ensures that sleep is no longer called after the
    last retry before raising an exception.

    Fixes bug 1217972
    Change-Id: I7b86325714e38b00a2f51bbf083b5532a6c51fa9
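
In outline, the reworked retry loop looks something like the sketch below. This is a simplified illustration of the approach the commit message describes, not the merged change verbatim, and it assumes the same module context (CONF, LOG, greenthread, volume_utils, _) as the unplug_vbd() snippet quoted in the bug description.

# Simplified sketch of the approach described in the commit message above;
# not the merged patch verbatim.
def unplug_vbd(session, vbd_ref):
    """Unplug VBD from VM, retrying on known-transient xapi failures."""
    max_attempts = CONF.xenapi_num_vbd_unplug_retries + 1
    for num_attempt in xrange(1, max_attempts + 1):
        try:
            if num_attempt > 1:
                # Sleep only before a retry, never after the final attempt.
                greenthread.sleep(1)
            session.call_xenapi('VBD.unplug', vbd_ref)
            return
        except session.XenAPI.Failure as exc:
            err = exc.details[0] if exc.details else None
            if err == 'DEVICE_ALREADY_DETACHED':
                LOG.info(_('VBD %s already detached'), vbd_ref)
                return
            elif (err == 'DEVICE_DETACH_REJECTED'
                  or (err == 'INTERNAL_ERROR' and len(exc.details) > 1
                      and 'Assertion failed' in exc.details[1])):
                # Both the documented rejection and the XenServer 6.2
                # assertion failure are treated as transient; retry.
                LOG.info(_('VBD %(vbd_ref)s detach rejected, attempt'
                           ' %(num_attempt)d/%(max_attempts)d'),
                         {'vbd_ref': vbd_ref, 'num_attempt': num_attempt,
                          'max_attempts': max_attempts})
            else:
                LOG.exception(exc)
                raise volume_utils.StorageError(
                    _('Unable to unplug VBD %s') % vbd_ref)

    raise volume_utils.StorageError(
        _('Reached maximum number of retries trying to unplug VBD %s')
        % vbd_ref)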

Changed in nova:
status: In Progress → Fix Committed
Bob Ball (bob-ball)
Changed in nova:
status: Fix Committed → Confirmed
milestone: none → icehouse-2
Revision history for this message
Bob Ball (bob-ball) wrote :

The fix uploaded above does not fully address this issue; the root cause is that multiple plug/unplug operations occur simultaneously.

Complete patch at https://review.openstack.org/59856

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/59856
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=766dff6b70c0dd5cfe4bd42ba654ec7964bc6128
Submitter: Jenkins
Branch: master

commit 766dff6b70c0dd5cfe4bd42ba654ec7964bc6128
Author: Bob Ball <email address hidden>
Date: Tue Dec 3 22:27:38 2013 +0000

    XenAPI: Synchronize on all VBD plug/unplug per VM

    Dynamically adding or removing a device needs to be synchronized
    to work around a bug in XenServer 6.1 and 6.2 where simultaneous
    events can be partially ignored, leading to a failure to plug or
    unplug. The bug is between two internal components in XenServer
    which pass messages - the first component will ignore all messages
    from the second while a plug/unplug is in progress, expecting
    only one message to be received. If two plug/unplugs occur in
    this time, an internal assertion is raised as XAPI's internal state
    is inconsistent.

    Change I7b86325714e38b00a2f51bbf083b5532a6c51fa9 attempted to
    work around the issue by retrying - however that does not catch
    all failures. This change should eliminate some of the known
    INTERNAL_ERROR failures, but not all, so I7b8 need not be
    reverted.

    Change Ie7a64c481a9e3874df9d5924fbb050ac7ab117db does the same
    as this, but only for VBD.plug.

    Change-Id: If78d4e5f1dd84f1d4a588f25b8f887a7b125bbea
    Closes-bug: 1217972
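
The intent is to serialise every VBD.plug/VBD.unplug that targets the same VM, so xapi never has two of these events in flight at once. Below is a minimal, generic sketch of such a per-VM lock; it is illustrative only, the helper names are made up here, and the merged change uses nova's own synchronisation utilities keyed on the VM.

import contextlib
import threading

# Illustrative per-VM serialisation; not the code from the merged change.
_vm_locks = {}
_vm_locks_guard = threading.Lock()

def _get_vm_lock(vm_ref):
    # Create (at most once) and return the lock that serialises VBD
    # operations for this particular VM.
    with _vm_locks_guard:
        return _vm_locks.setdefault(vm_ref, threading.Lock())

@contextlib.contextmanager
def vbd_operations_serialized(vm_ref):
    """Hold the per-VM lock around any VBD.plug / VBD.unplug call."""
    lock = _get_vm_lock(vm_ref)
    with lock:
        yield

# Usage sketch:
# with vbd_operations_serialized(vm_ref):
#     session.call_xenapi('VBD.plug', vbd_ref)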

Changed in nova:
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1
Revision history for this message
Richard Lee (3630034-y) wrote :

I see the same log when I boot a VM for the first time on version 2014.1.2. Is this bug fixed, or is there something else wrong in my config file?

Revision history for this message
Richard Lee (3630034-y) wrote :

Sorry, it is not quite the same as this bug. I will post a new question.
