NotImplementedError(_('direct_snapshot() is not implemented')) stacktraces in n-cpu logs

Bug #1722571 reported by Matt Riedemann
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Hongbin Lu

Bug Description

After we enabled live snapshot by default for the libvirt driver, we get these stacktraces all over the n-cpu logs anytime we create a snapshot image:

http://logs.openstack.org/65/509465/18/check/gate-tempest-dsvm-py35-ubuntu-xenial/0e88c0a/logs/screen-n-cpu.txt#_Oct_10_13_48_24_125578

Oct 10 13:48:24.125578 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] Traceback (most recent call last):
Oct 10 13:48:24.125728 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1697, in snapshot
Oct 10 13:48:24.125890 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] instance.image_ref)
Oct 10 13:48:24.126025 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 412, in direct_snapshot
Oct 10 13:48:24.126158 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] raise NotImplementedError(_('direct_snapshot() is not implemented'))
Oct 10 13:48:24.126326 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] NotImplementedError: direct_snapshot() is not implemented
Oct 10 13:48:24.126485 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: ERROR nova.compute.manager [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf]

We shouldn't be stacktracing on that NotImplementedError since it's an image backend-specific method implementation for handling snapshots, and only the Rbd image backend implements direct_snapshot().

hongbin (hongbin034)
Changed in nova:
assignee: nobody → hongbin (hongbin034)
Revision history for this message
hongbin (hongbin034) wrote :

Note: this error occurred in py35 dsvm job only.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Good call, my guess is it has something to do with this type check on py35:

if type(e) != NotImplementedError:

Revision history for this message
Matt Riedemann (mriedem) wrote :

Oh nevermind, it's something with error handling in python3 when raising an exception in a block that already has an exception in scope:

https://blog.ionelmc.ro/2014/08/03/the-most-underrated-feature-in-python-3/

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm having a hard time sorting out which test is making this happen. It looks like it's creating a snapshot of the instance and deleting it at the same time.

http://logs.openstack.org/65/509465/18/check/gate-tempest-dsvm-py35-ubuntu-xenial/0e88c0a/logs/screen-n-cpu.txt#_Oct_10_13_48_23_458518

Oct 10 13:48:23.458518 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: INFO nova.virt.libvirt.driver [None req-7865289e-3c92-403f-b16f-bbd01190b07b tempest-ImagesTestJSON-1081626789 tempest-ImagesTestJSON-1081626789] [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] Beginning live snapshot process

And another test is deleting the same instance during the snapshot:

http://logs.openstack.org/65/509465/18/check/gate-tempest-dsvm-py35-ubuntu-xenial/0e88c0a/logs/screen-n-cpu.txt#_Oct_10_13_48_23_837715

Oct 10 13:48:23.837715 ubuntu-xenial-infracloud-chocolate-11309928 nova-compute[26979]: INFO nova.compute.manager [None req-9696b6e6-1f95-4ecd-8bdf-265988f0f261 tempest-ImagesTestJSON-1081626789 tempest-ImagesTestJSON-1081626789] [instance: 8cd13eb3-54cc-4ca3-9bfc-689efd768baf] Terminating instance

Revision history for this message
Matt Riedemann (mriedem) wrote :

I think it's this test that does it:

    @decorators.idempotent_id('aa06b52b-2db5-4807-b218-9441f75d74e3')
    def test_delete_saving_image(self):

It creates a server and starts creating a snapshot image, then deletes the image from glance and deletes the server while it's still creating the snapshot.

Revision history for this message
Matt Riedemann (mriedem) wrote :

So I guess we just need to handle the libvirtError down in nova.virt.libvirt.driver.LibvirtDriver.snapshot where the domain is gone and we can raise that up as an InstanceNotFound which will be handled properly in the ComputeManager.snapshot_instance method.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/511074

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: hongbin (hongbin034) → Hongbin Lu (hongbin.lu)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/511074
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9ef56fa86639f97f63d853dbb2213415dcd5691b
Submitter: Zuul
Branch: master

commit 9ef56fa86639f97f63d853dbb2213415dcd5691b
Author: Hongbin Lu <email address hidden>
Date: Tue Oct 10 23:05:14 2017 +0000

    Handle not found error on taking snapshot

    If there is a request to create a snapshot of an instance and
    another request to delete the instance at the same time, the
    snapshot task might fail with libvirt error and this error
    is not handled correctly by compute manager. As a result,
    tracestack was printed in the compute log.

    This patch fixes it by handling libvirt exception during live
    snapshot and raise instance not found exception if the libvirt
    exception is raised due to domain not found.

    Change-Id: I585b7b03753ed1d28a313ce443e6918687d76a8b
    Closes-Bug: #1722571

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/550660

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/550660
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=71375a941d14108b14583a963a22febe32c5a2f0
Submitter: Zuul
Branch: stable/queens

commit 71375a941d14108b14583a963a22febe32c5a2f0
Author: Hongbin Lu <email address hidden>
Date: Tue Oct 10 23:05:14 2017 +0000

    Handle not found error on taking snapshot

    If there is a request to create a snapshot of an instance and
    another request to delete the instance at the same time, the
    snapshot task might fail with libvirt error and this error
    is not handled correctly by compute manager. As a result,
    tracestack was printed in the compute log.

    This patch fixes it by handling libvirt exception during live
    snapshot and raise instance not found exception if the libvirt
    exception is raised due to domain not found.

    Change-Id: I585b7b03753ed1d28a313ce443e6918687d76a8b
    Closes-Bug: #1722571
    (cherry picked from commit 9ef56fa86639f97f63d853dbb2213415dcd5691b)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.2

This issue was fixed in the openstack/nova 17.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.