CI is failing randomly when executing snapshot test

Bug #1809114 reported by Sahid Orentino
This bug affects 1 person
Affects Status Importance Assigned to Milestone
nova-lxd
Fix Committed
Undecided
Sahid Orentino

Bug Description

The CI is failing randomly when attempting to execute snapshot test.

Basically, the test spawns a new instance and then immediately executes a snapshot. Based on my investigation, the snapshot call is accepted while the instance is not yet completely set up in LXD. I suspect that we update the instance state too early.

To reproduce the issue, just spawn an instance and repeatedly issue snapshot calls until one is accepted.
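As a sketch of that reproduction loop (noting the later comment that it turned out not to reproduce this particular bug): `create_server` and `snapshot` below are hypothetical stand-ins for the corresponding Nova API calls, not real client methods.

```python
import time

def reproduce(create_server, snapshot, attempts=30, delay=0.5):
    """Spawn an instance, then hammer the snapshot call until the
    service accepts it; if the race window is open, an early
    acceptance hits a container LXD has not finished setting up."""
    server_id = create_server()
    for _ in range(attempts):
        try:
            snapshot(server_id)   # may be rejected while still building
            return server_id      # accepted: the window was open
        except Exception:
            time.sleep(delay)     # not ready yet, retry
    raise RuntimeError("snapshot never accepted")
```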

Dec 18 14:22:41.943843 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] Traceback (most recent call last):
Dec 18 14:22:41.944039 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/opt/stack/new/nova/nova/compute/manager.py", line 229, in decorated_function
Dec 18 14:22:41.944217 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] *args, **kwargs)
Dec 18 14:22:41.944391 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/opt/stack/new/nova/nova/compute/manager.py", line 3468, in snapshot_instance
Dec 18 14:22:41.944566 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] task_states.IMAGE_SNAPSHOT)
Dec 18 14:22:41.944749 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/opt/stack/new/nova/nova/compute/manager.py", line 3501, in _snapshot_instance
Dec 18 14:22:41.944922 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] update_task_state)
Dec 18 14:22:41.945128 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/opt/stack/new/nova-lxd/nova/virt/lxd/driver.py", line 859, in snapshot
Dec 18 14:22:41.945301 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] container.stop(wait=True)
Dec 18 14:22:41.945485 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/usr/local/lib/python2.7/dist-packages/pylxd/models/container.py", line 315, in stop
Dec 18 14:22:41.945658 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] wait=wait)
Dec 18 14:22:41.945842 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/usr/local/lib/python2.7/dist-packages/pylxd/models/container.py", line 296, in _set_state
Dec 18 14:22:41.946023 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] self.sync()
Dec 18 14:22:41.946194 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/usr/local/lib/python2.7/dist-packages/pylxd/models/_model.py", line 154, in sync
Dec 18 14:22:41.946379 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] response = self.api.get()
Dec 18 14:22:41.946573 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/usr/local/lib/python2.7/dist-packages/pylxd/client.py", line 148, in get
Dec 18 14:22:41.946748 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] is_api=is_api)
Dec 18 14:22:41.946922 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] File "/usr/local/lib/python2.7/dist-packages/pylxd/client.py", line 103, in _assert_response
Dec 18 14:22:41.947110 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] raise exceptions.NotFound(response)
Dec 18 14:22:41.947282 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b] NotFound: not found
Dec 18 14:22:41.947463 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR nova.compute.manager [instance: 6133e14c-e1b2-4b44-92db-da851138564b]
Dec 18 14:22:42.346283 ubuntu-xenial-inap-mtl01-0001261442 nova-compute[27817]: ERROR oslo_messaging.rpc.server [None req-00b07dc6-6f69-488b-9272-d140da1448ba tempest-ImagesTestJSON-1762846439 tempest-ImagesTestJSON-1762846439] Exception during message handling: NotFound: not found

summary: - CI is failing randomly
+ CI is failing randomly when executing snapshot test
Changed in nova-lxd:
assignee: nobody → sahid (sahid-ferdjaoui)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova-lxd (master)

Fix proposed to branch: master
Review: https://review.openstack.org/626187

Changed in nova-lxd:
status: New → In Progress
Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :

The reproduction test that I indicated in the description is not right, at least not for this issue. The bug looks to be a race condition caused by a missing mutex protecting updates to container references. Basically, executing tests in parallel triggers the issue.
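As an illustration only, the kind of guard described here can be sketched as a module-level lock around a container-reference cache; the names (`_containers`, `get_container`, the `client` parameter) are hypothetical and not taken from the actual nova-lxd patch:

```python
import threading

# Hypothetical cache of container references, keyed by instance name.
_containers = {}
_containers_lock = threading.Lock()

def get_container(client, name):
    """Fetch-or-create a cached container reference under a mutex, so
    parallel operations (spawn, snapshot, stop) never race on the same
    entry and observe a half-updated reference."""
    with _containers_lock:
        container = _containers.get(name)
        if container is None:
            container = client.containers.get(name)
            _containers[name] = container
        return container
```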

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova-lxd (master)

Reviewed: https://review.openstack.org/626187
Committed: https://git.openstack.org/cgit/openstack/nova-lxd/commit/?id=7e39428691eab57bfb23fda2185695da4a666fc4
Submitter: Zuul
Branch: master

commit 7e39428691eab57bfb23fda2185695da4a666fc4
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Dec 19 10:00:30 2018 -0500

    add mutex to protect container instances

    (SO) Without protecting it, several methods can access in same time to
    the container instance and updating the state.

    (AJK) Also fix py27 change where nova.network.linux_utils has moved
    to/as nova.privsep.linux_net

    Closes-Bug: #1809114
    Change-Id: I28e68e150f5d6e3efdb243aae9e3cf15fda01a65
    Co-authored-by: Sahid Orentino Ferdjaoui <email address hidden>
    Co-authored-by: Alex Kavanagh <email address hidden>

Changed in nova-lxd:
status: In Progress → Fix Committed