transient failures during lxc test during shutdown

Bug #1783198 reported by Scott Moser on 2018-07-23
Affects: cloud-init
Importance: Medium
Assigned to: Scott Moser

Bug Description

We have been seeing a lot of transient failures in the lxd integration test runs, for example:

https://jenkins.ubuntu.com/server/job/cloud-init-integration-lxd-c/72/consoleFull
with a stack trace that looks like the one below.

I think that we might be attempting to delete an instance twice or shutting it down twice. Not sure.

2018-07-20 12:20:30,781 - tests.cloud_tests - DEBUG - executing "collect: instance-id"
2018-07-20 12:20:46,612 - tests.cloud_tests - ERROR - stage: collect test data for cosmic encountered error: not found
2018-07-20 12:20:46,614 - tests.cloud_tests - ERROR - traceback:
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/tests/cloud_tests/stage.py", line 97, in run_stage
    (call_res, call_failed) = call()
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/tests/cloud_tests/collect.py", line 111, in collect_test_data
    instance.shutdown()
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/tests/cloud_tests/platforms/lxd/instance.py", line 171, in shutdown
    self.pylxd_container.stop(wait=wait)
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/.tox/citest/lib/python3.5/site-packages/pylxd/models/container.py", line 316, in stop
    wait=wait)
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/.tox/citest/lib/python3.5/site-packages/pylxd/models/container.py", line 291, in _set_state
    response.json()['operation'])
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/.tox/citest/lib/python3.5/site-packages/pylxd/models/operation.py", line 33, in wait_for_operation
    return cls.get(client, operation.id)
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/.tox/citest/lib/python3.5/site-packages/pylxd/models/operation.py", line 40, in get
    response = client.api.operations[operation_id].get()
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/.tox/citest/lib/python3.5/site-packages/pylxd/client.py", line 148, in get
    is_api=is_api)
  File "/var/lib/jenkins/slaves/torkoal/workspace/cloud-init-integration-lxd-c/cloud-init/.tox/citest/lib/python3.5/site-packages/pylxd/client.py", line 103, in _assert_response
    raise exceptions.NotFound(response)
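
The traceback above ends with NotFound raised while waiting on the stop operation. One possible way to make shutdown tolerant of that is to catch the exception and then poll the instance state. This is only a sketch, not the fix that was applied: `shutdown_tolerant`, `is_stopped`, `retries` and `delay` are hypothetical names, and `not_found_exc` is assumed to be `pylxd.exceptions.NotFound` in the real test suite.

```python
import time


def shutdown_tolerant(stop, is_stopped, not_found_exc, retries=5, delay=1.0):
    """Stop an instance, treating 'not found' from the wait as non-fatal.

    stop:          callable that requests the stop and waits for it
    is_stopped:    callable returning True once the instance is down
    not_found_exc: exception class to tolerate (assumed to be
                   pylxd.exceptions.NotFound in the real test suite)
    """
    try:
        stop()
        return True
    except not_found_exc:
        # The operation disappeared before we could wait on it; fall
        # back to polling the instance state instead of failing.
        pass
    for _ in range(retries):
        if is_stopped():
            return True
        time.sleep(delay)
    return False
```

In the lxd platform code, `stop` would wrap something like `self.pylxd_container.stop(wait=wait)`; how `is_stopped` checks the state is left open here.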


Scott Moser (smoser) on 2018-07-23
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Scott Moser (smoser)
Scott Moser (smoser) wrote :

I'm attaching all the console logs that were available from lxd runs of the
integration tests. The interesting thing is that *most* of the time when we
see pylxd-related tracebacks, they are in shutdown (there was one in start).
Also interesting: most of the time the traceback occurs after collection
of the last file, about 25-30 seconds later.

So for shutdown specifically, it could be a result of the system failing
to shut down.

Example:
2018-07-15 12:20:11,234 - tests.cloud_tests - DEBUG - executing "collect: result.json"
2018-07-15 12:20:28,612 - tests.cloud_tests - ERROR - stage: collect test data for bionic encountered error: not found
2018-07-15 12:20:28,615 - tests.cloud_tests - ERROR - traceback:
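
To check the delay across many console logs, the gap between the last "executing" line and the first ERROR line after it can be measured with a small script. A minimal sketch, assuming the log line format shown in this bug; `gap_before_error` and `LINE_RE` are names made up for illustration.

```python
import re
from datetime import datetime

# Matches "<timestamp> - <logger> - <LEVEL> - <message>" as seen in the
# integration test console logs quoted in this bug.
LINE_RE = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - \S+ - (\w+) - (.*)$')


def gap_before_error(lines):
    """Return the seconds between the last 'executing' DEBUG line and
    the first ERROR line that follows it, or None if there is no pair."""
    last_exec = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S,%f')
        level, msg = match.group(2), match.group(3)
        if level == 'DEBUG' and msg.startswith('executing'):
            last_exec = stamp
        elif level == 'ERROR' and last_exec is not None:
            return (stamp - last_exec).total_seconds()
    return None
```

Feeding it the lines of a console log (for example `open(path).read().splitlines()`) gives one number per failed run, which makes it easy to see whether the delay clusters around a timeout value.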
