missing special-case libvirt exception during device detach

Bug #1815949 reported by Chris Friesen
This bug affects 2 people
Affects                   Status         Importance  Assigned to    Milestone
OpenStack Compute (nova)  Fix Released   Medium      Chris Friesen
Pike                      Fix Committed  Medium      Chris Friesen
Queens                    Fix Committed  Medium      Chris Friesen
Rocky                     Fix Committed  Medium      Chris Friesen

Bug Description

In Pike a customer has run into the following issue:

2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall._func' failed: libvirtError: internal error: unable to execute QEMU command 'device_del': Device 'virtio-disk15' not found
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall Traceback (most recent call last):
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 143, in _run_loop
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 363, in _func
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall result = f(*args, **kwargs)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 505, in _do_wait_and_retry_detach
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall _try_detach_device(config, persistent=False, host=host)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 467, in _try_detach_device
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall device=alternative_device_name)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall self.force_reraise()
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall six.reraise(self.type_, self.value, self.tb)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 451, in _try_detach_device
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall self.detach_device(conf, persistent=persistent, live=live)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 530, in detach_device
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall self._domain.detachDeviceFlags(device_xml, flags=flags)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall rv = execute(f, *args, **kwargs)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall six.reraise(c, e, tb)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall rv = meth(*args, **kwargs)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1217, in detachDeviceFlags
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self)
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall libvirtError: internal error: unable to execute QEMU command 'device_del': Device 'virtio-disk15' not found
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall

Based on discussion with Melanie Witt, it seems likely that nova is missing a special case in Guest.detach_device_with_retry(). We probably need to change the conditional at line 409 of virt/libvirt/guest.py to read 'if errcode in (libvirt.VIR_ERR_OPERATION_FAILED, libvirt.VIR_ERR_INTERNAL_ERROR):'
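
To make the proposed conditional concrete, here is a minimal sketch of the idea, not the actual nova code. It assumes the "not found" case is recognised by checking the error code and then the error message. FakeLibvirtError, DeviceNotFound, NOT_FOUND_CODES and try_detach are hypothetical stand-ins so the snippet runs without libvirt or nova installed; only get_error_code(), get_error_message() and the two VIR_ERR_* constant names come from the libvirt Python bindings.

```python
# Stand-in constants so the sketch runs without libvirt installed; the
# values mirror libvirt's virErrorNumber enum.
VIR_ERR_INTERNAL_ERROR = 1
VIR_ERR_OPERATION_FAILED = 9


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, code, message):
        super().__init__(message)
        self._code = code
        self._message = message

    def get_error_code(self):
        return self._code

    def get_error_message(self):
        return self._message


class DeviceNotFound(Exception):
    """Hypothetical stand-in for nova.exception.DeviceNotFound."""


# Error codes under which a "not found" message means the device is gone.
# Before the proposed change only VIR_ERR_OPERATION_FAILED was considered.
NOT_FOUND_CODES = (VIR_ERR_OPERATION_FAILED, VIR_ERR_INTERNAL_ERROR)


def try_detach(detach, device_name):
    """Call detach() and translate a 'not found' error into DeviceNotFound."""
    try:
        detach()
    except FakeLibvirtError as ex:
        errcode = ex.get_error_code()
        if errcode in NOT_FOUND_CODES and 'not found' in ex.get_error_message():
            raise DeviceNotFound(device_name) from ex
        # Any other libvirt failure is propagated unchanged.
        raise
```

With this shape, the 'device_del' failure from the traceback above (an internal error whose message ends in "not found") would be reported as a missing device, presumably letting the retry loop treat the detach as already complete instead of failing the looping call.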

melanie witt (melwitt) wrote :

Marking this as Confirmed based on the traceback -- we can see that libvirt is raising an 'internal error' for a 'not found' condition, which our current code does not expect. To be thorough, we should locate where in the libvirt source this is raised, to see if we can learn the origin of this new error combo (if it's a change from a newer version of libvirt, etc).

tags: added: libvirt
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Chris Friesen (cbf123) wrote :

The "unable to execute QEMU command 'device_del'..." error message is coming from the libvirt code that calls into qemu, so I think it's actually something in qemu that's triggering it.

Chris Friesen (cbf123) wrote :

Which I think means it'll be this code in libvirt that raises the actual exception:
https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_monitor_json.c#L392

which in turn is called from here:

https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_monitor_json.c#L4194

Changed in nova:
assignee: nobody → Eric Fried (efried)
status: Confirmed → In Progress
Eric Fried (efried) wrote :

Okay, looks like lp still doesn't auto-comment for a patch that was originally proposed without the (correct) Closes-Bug tag. The fix is here: https://review.openstack.org/#/c/641480/

Changed in nova:
assignee: Eric Fried (efried) → Chris Friesen (cbf123)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/641480
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2a8ee40fccc65b177275d6fe80c10fdb83b86e1f
Submitter: Zuul
Branch: master

commit 2a8ee40fccc65b177275d6fe80c10fdb83b86e1f
Author: Chris Friesen <email address hidden>
Date: Wed Mar 6 14:19:01 2019 -0600

    Add missing libvirt exception during device detach

    It turns out that when detaching a device libvirt can raise a
    libvirt.VIR_ERR_INTERNAL_ERROR exception with an error log of
    "unable to execute QEMU command 'device_del': Device <foo> not found".

    Add this exception to the existing "not found" case which currently
    handles only libvirt.VIR_ERR_OPERATION_FAILED.

    Change-Id: I3055cd7641de92ab188de73733ca9288a9ca730a
    Closes-Bug: #1815949
    Signed-off-by: Chris Friesen <email address hidden>

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/651637

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/651639

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/651642

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/651637
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6a7e45c750f2759c6b813042413afb9aaa8d1ca4
Submitter: Zuul
Branch: stable/rocky

commit 6a7e45c750f2759c6b813042413afb9aaa8d1ca4
Author: Chris Friesen <email address hidden>
Date: Wed Mar 6 14:19:01 2019 -0600

    Add missing libvirt exception during device detach

    It turns out that when detaching a device libvirt can raise a
    libvirt.VIR_ERR_INTERNAL_ERROR exception with an error log of
    "unable to execute QEMU command 'device_del': Device <foo> not found".

    Add this exception to the existing "not found" case which currently
    handles only libvirt.VIR_ERR_OPERATION_FAILED.

    Change-Id: I3055cd7641de92ab188de73733ca9288a9ca730a
    Closes-Bug: #1815949
    Signed-off-by: Chris Friesen <email address hidden>
    (cherry picked from commit 2a8ee40fccc65b177275d6fe80c10fdb83b86e1f)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/651639
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5433dc6a7525a5c356b4b5cd400bbea4e8edd0b0
Submitter: Zuul
Branch: stable/queens

commit 5433dc6a7525a5c356b4b5cd400bbea4e8edd0b0
Author: Chris Friesen <email address hidden>
Date: Wed Mar 6 14:19:01 2019 -0600

    Add missing libvirt exception during device detach

    It turns out that when detaching a device libvirt can raise a
    libvirt.VIR_ERR_INTERNAL_ERROR exception with an error log of
    "unable to execute QEMU command 'device_del': Device <foo> not found".

    Add this exception to the existing "not found" case which currently
    handles only libvirt.VIR_ERR_OPERATION_FAILED.

    Change-Id: I3055cd7641de92ab188de73733ca9288a9ca730a
    Closes-Bug: #1815949
    Signed-off-by: Chris Friesen <email address hidden>
    (cherry picked from commit 2a8ee40fccc65b177275d6fe80c10fdb83b86e1f)
    (cherry picked from commit 6a7e45c750f2759c6b813042413afb9aaa8d1ca4)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/651642
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cc697d676c14a4abbf82b063910736897bddbdac
Submitter: Zuul
Branch: stable/pike

commit cc697d676c14a4abbf82b063910736897bddbdac
Author: Chris Friesen <email address hidden>
Date: Wed Mar 6 14:19:01 2019 -0600

    Add missing libvirt exception during device detach

    It turns out that when detaching a device libvirt can raise a
    libvirt.VIR_ERR_INTERNAL_ERROR exception with an error log of
    "unable to execute QEMU command 'device_del': Device <foo> not found".

    Add this exception to the existing "not found" case which currently
    handles only libvirt.VIR_ERR_OPERATION_FAILED.

    Change-Id: I3055cd7641de92ab188de73733ca9288a9ca730a
    Closes-Bug: #1815949
    Signed-off-by: Chris Friesen <email address hidden>
    (cherry picked from commit 2a8ee40fccc65b177275d6fe80c10fdb83b86e1f)
    (cherry picked from commit 6a7e45c750f2759c6b813042413afb9aaa8d1ca4)
    (cherry picked from commit 5433dc6a7525a5c356b4b5cd400bbea4e8edd0b0)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.1

This issue was fixed in the openstack/nova 18.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.
