Attach volume tempest tests are failing with "failed to detach from server within the required time"

Bug #1957031 reported by chandan kumar
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The following tempest test is failing on the CS9 full-tempest-api job [1]:

```
{3} tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_attach_volume_shelved_or_offload_server [706.429983s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/usr/lib/python3.9/site-packages/tempest/common/waiters.py", line 380, in wait_for_volume_attachment_remove_from_server
    raise lib_exc.TimeoutException(message)

    tempest.lib.exceptions.TimeoutException: Request timed out
Details: Volume c3a22d9b-f4ca-476a-a0d4-7c0c771c1f29 failed to detach from server e4fb37d5-9395-40da-b576-3dd132244fea within the required time (300 s) from the compute API perspective

Captured traceback-1:
~~~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/usr/lib/python3.9/site-packages/tempest/common/waiters.py", line 312, in wait_for_volume_resource_status
    raise lib_exc.TimeoutException(message)

    tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume c3a22d9b-f4ca-476a-a0d4-7c0c771c1f29 failed to reach available status (current in-use) within the required time (300 s).

```
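Both tracebacks come from tempest's polling waiters in tempest/common/waiters.py, which repeatedly read a resource's status until it matches the expected value or the 300 s timeout expires. The sketch below is a simplified, hypothetical model of that pattern, not tempest's actual code; `wait_for_status`, `get_status` and `interval` are illustrative names, and `TimeoutException` is a local stand-in for the tempest exception.

```python
import time
from typing import Callable


class TimeoutException(Exception):
    """Local stand-in for tempest.lib.exceptions.TimeoutException (hypothetical)."""


def wait_for_status(get_status: Callable[[], str], expected: str,
                    timeout: float = 300.0, interval: float = 1.0) -> None:
    """Poll get_status() until it returns `expected` or `timeout` seconds pass."""
    start = time.monotonic()
    status = get_status()
    while status != expected:
        if time.monotonic() - start >= timeout:
            # Same shape as the message in the captured traceback.
            raise TimeoutException(
                f"failed to reach {expected} status (current {status}) "
                f"within the required time ({timeout:g} s)")
        time.sleep(interval)
        status = get_status()
```

A volume whose guest never releases the device keeps reporting `in-use`, so a call such as `wait_for_status(lambda: "in-use", "available")` can only end in a timeout, which is what both tracebacks show.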

The following tests are failing with the same error:
* tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
* tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server_with_volume_attached
* tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_attach_detach_volume

The nova-compute logs [2] for instance e4fb37d5-9395-40da-b576-3dd132244fea show:
```
2022-01-10 14:34:44.109 ERROR /var/log/containers/nova/nova-compute.log.1: 2 ERROR nova.virt.libvirt.driver [req-e9c9d538-c1d3-435c-8cde-ef2c7181d542 87afcc7329474565ba1a94e10afb94f9 41e8c8794a2d4deaac0b3c374fe282e8 - default default] Waiting for libvirt event about the detach of device vdc with device alias virtio-disk2 from instance e4fb37d5-9395-40da-b576-3dd132244fea is timed out.
2022-01-10 14:34:44.113 ERROR /var/log/containers/nova/nova-compute.log.1: 2 ERROR nova.virt.libvirt.driver [req-e9c9d538-c1d3-435c-8cde-ef2c7181d542 87afcc7329474565ba1a94e10afb94f9 41e8c8794a2d4deaac0b3c374fe282e8 - default default] Run out of retry while detaching device vdc with device alias virtio-disk2 from instance e4fb37d5-9395-40da-b576-3dd132244fea from the live domain config. Device is still attached to the guest.
```
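The two log lines describe a retry loop: nova asks libvirt to detach the device, waits a bounded time for the libvirt event confirming the removal, and after the last attempt gives up with "Run out of retry". The sketch below is a loose, hypothetical model of that flow, not nova's implementation; `detach_with_retries`, `request_detach`, `removed_event`, `attempts` and `timeout_per_attempt` are illustrative names, and `DeviceDetachFailed` is a local stand-in for the nova exception.

```python
import threading
from typing import Callable


class DeviceDetachFailed(Exception):
    """Local stand-in for nova.exception.DeviceDetachFailed (hypothetical)."""


def detach_with_retries(request_detach: Callable[[], None],
                        removed_event: threading.Event,
                        attempts: int,
                        timeout_per_attempt: float,
                        dev: str = "vdc") -> int:
    """Request a detach and wait for the removal event, retrying on timeout.

    Returns the attempt number that succeeded. Raises DeviceDetachFailed when
    every wait times out, i.e. the device is still attached to the guest.
    """
    for attempt in range(1, attempts + 1):
        request_detach()  # in a real driver, a libvirt device-detach request
        if removed_event.wait(timeout_per_attempt):
            return attempt
        # This is the "Waiting for libvirt event ... is timed out." case.
    raise DeviceDetachFailed(
        f"Device detach failed for {dev}: Run out of retry while detaching "
        f"device {dev}. Device is still attached to the guest.")
```

If the hypervisor never delivers the event that the driver waits for, `removed_event` is never set and every attempt times out, regardless of how many retries are configured.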

These tests are being moved to the skiplist until we investigate.
Note that these tests pass on CS8: https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-full-tempest-api-master

Logs:
[1]. https://logserver.rdoproject.org/openstack-periodic-integration-main-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-master/1dc7580/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

[2]. https://logserver.rdoproject.org/openstack-periodic-integration-main-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-master/1dc7580/logs/undercloud/var/log/extra/errors.txt.gz

summary:
- volume tempest tests are failing with failed to detach from server with in required time
+ Attach volume tempest tests are failing with failed to detach from server with in required time
description: updated
chandan kumar (chkumar246) wrote:

Not sure whether this helps, but the qemu package versions differ between the passing and failing jobs:

passing job
++++++++++
qemu-guest-agent-6.1.0-8.el9.x86_64
qemu-img-6.1.0-8.el9.x86_64

failed job
++++++++++
qemu-guest-agent-6.2.0-1.el9.x86_64
qemu-img-6.2.0-1.el9.x86_64

In the passing CentOS 8 job it is qemu-img-6.0.0-33.el8s.x86_64.
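To compare these builds programmatically, the upstream version can be pulled out of each package string, which follows the RPM name-version-release.arch layout. A minimal sketch, assuming the version field consists only of dot-separated integers (true for the packages listed above); `rpm_version` is a hypothetical helper, not part of any RPM tooling.

```python
import re
from typing import Tuple

# name-version-release.arch, e.g. "qemu-img-6.2.0-1.el9.x86_64"
NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$")


def rpm_version(nvra: str) -> Tuple[int, ...]:
    """Return the upstream version of an NVRA string as a tuple of ints."""
    m = NVRA.match(nvra)
    if m is None:
        raise ValueError(f"not an NVRA string: {nvra!r}")
    # Assumes a purely numeric version such as "6.2.0".
    return tuple(int(part) for part in m.group("version").split("."))


passing = rpm_version("qemu-img-6.1.0-8.el9.x86_64")
failing = rpm_version("qemu-img-6.2.0-1.el9.x86_64")
```

Tuples compare element by element, so `failing > passing` holds here, which makes it easy to spot that the failing job picked up the newer qemu 6.2.0 build.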

chandan kumar (chkumar246) wrote:

For the record, the tempest tests in the CS9 featureset035 (fs035) job are failing with a similar error:
https://logserver.rdoproject.org/40/37740/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/565e24c/logs/undercloud/var/log/tempest/failing_tests.log.txt.gz

```
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server_with_volume_attached[id-b68bd8d6-855d-4212-b59b-2e704044dace,slow,volume]
tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume[id-f56e465b-fe10-48bf-b75d-646cda3a8bc9,negative,volume]
tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_attach_detach_volume[id-52e9045a-e90d-4c0d-9087-79d657faffff,slow]
tearDownClass (tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON)
tearDownClass (tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON)
tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_attach_volume_shelved_or_offload_server[id-13a940b6-3474-4c3c-b03f-29b89112bfee,slow]

```
and the reason is the same:
https://logserver.rdoproject.org/40/37740/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/565e24c/logs/overcloud-novacompute-0/var/log/extra/errors.txt.gz

```
2022-01-13 03:08:18.987 ERROR /var/log/containers/nova/nova-compute.log: 2 ERROR oslo_messaging.rpc.server nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance 57bfbbad-a9f5-44b0-8e0d-c0654b09cf6a from the live domain config. Device is still attached to the guest.
2022-01-13 03:08:18.987 ERROR /var/log/containers/nova/nova-compute.log: 2 ERROR oslo_messaging.rpc.server
2022-01-13 03:08:30.924 ERROR /var/log/containers/nova/nova-compute.log: 2 ERROR nova.virt.libvirt.driver [req-15edbb6f-f30f-4ba9-bb39-fc46116cbeae 6b315983f378482694c92f0a035e34f9 c845819458e847c0a49d5e9e0d71c029 - default default] Waiting for libvirt event about the detach of device vdb with device alias virtio-disk1 from instance 888d9eb4-7737-421f-867a-d14b1e430929 is timed out.
2022-01-13 03:08:30.928 ERROR /var/log/containers/nova/nova-compute.log: 2 ERROR nova.virt.libvirt.driver [req-15edbb6f-f30f-4ba9-bb39-fc46116cbeae 6b315983f378482694c92f0a035e34f9 c845819458e847c0a49d5e9e0d71c029 - default default] Run out of retry while detaching device vdb with device alias virtio-disk1 from instance 888d9eb4-7737-421f-867a-d14b1e430929 from the live domain config. Device is still attached to the guest.
2022-01-13 03:08:31.058 ERROR /var/log/containers/nova/nova-compute.log: 2 ERROR oslo_messaging.rpc.server [req-15edbb6f-f30f-4ba9-bb39-fc46116cbeae 6b315983f378482694c92f0a035e34f9 c845819458e847c0a49d5e9e0d71c029 - default default] Exception during message handling: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance 888d9eb4-7737-421f-867a-d14b1e430929 from the live domain config. Device is still attached to the guest.
```

Marios Andreou (marios-b) wrote:
David Vallee Delisle (valleedelisle) wrote:

One of them is caused by QEMU not sending the second DEVICE_DELETED event:
https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00542.html

<danpb> dvd: with virtio devices hot-unplug results in TWO DEVICE_DELETED events
<dvd> danpb, this is the bz I was preparing: http://pastebin.test.redhat.com/1021567
<dvd> oh
<danpb> libvirt ignores one of them (the one you show above without any 'device' field)
<sean-k-mooney> danpb: we are seeing this upstream too i think so we likely will have to work with canonical to get that backported and do it for centos/fedora too
<danpb> QEMU fails to send the second event that we actually need
<danpb> libvirt 7.9.0 / 7.10.0, when used in combination with qemu 6.2.0

<danpb> there is a libvirt 8.0.0 build being done today that will fix this by reverting to non-json usage
<danpb> until my qemu fix works its way in

More details are under the "Bug fix" section of the libvirt 8.0.0 release notes: https://libvirt.org/news.html#v8-0-0-2022-01-14
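The discussion above says that a virtio hot-unplug produces two DEVICE_DELETED events, that libvirt ignores the one without a 'device' field, and that QEMU 6.2.0 fails to send the second one, which is the event libvirt needs. The sketch below models only that reasoning: it treats a detach as complete once an event naming the device alias arrives, and flags the libvirt/QEMU combination named in the chat. It is a simplified illustration, not libvirt code; `removed_device_alias`, `detach_completed` and `is_affected` are hypothetical helpers, the event dictionaries are illustrative QMP-shaped data, and `is_affected` encodes only the versions mentioned here, not an exhaustive list.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Version = Tuple[int, ...]


def removed_device_alias(event: Dict[str, Any]) -> Optional[str]:
    """Return the device alias a DEVICE_DELETED event refers to, if any.

    Events without a "device" field are ignored, mirroring the libvirt
    behaviour described in this bug.
    """
    if event.get("event") != "DEVICE_DELETED":
        return None
    return event.get("data", {}).get("device")


def detach_completed(events: Iterable[Dict[str, Any]], alias: str) -> bool:
    """True once an event naming `alias` (e.g. "virtio-disk2") has been seen."""
    return any(removed_device_alias(ev) == alias for ev in events)


def is_affected(libvirt: Version, qemu: Version) -> bool:
    """Combination reported as broken: libvirt 7.9.0 or 7.10.0 with QEMU 6.2.0."""
    return libvirt in {(7, 9, 0), (7, 10, 0)} and qemu == (6, 2, 0)
```

When only the backend event (with a 'path' but no 'device' field) arrives, `detach_completed(..., "virtio-disk2")` stays False, which matches the "Device is still attached to the guest" errors in the nova-compute logs.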

Marios Andreou (marios-b) wrote (last edit):

https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/824527 adds the periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master job to the existing skiplist entries for these compute tests.

Marios Andreou (marios-b) wrote:
Marios Andreou (marios-b) wrote:
Ronelle Landy (rlandy)
Changed in tripleo:
status: Triaged → Fix Released