AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume intermittently fails with "Unable to detach from (guest transient domain|the live config)."

Bug #1770211 reported by Matt Riedemann
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/37/522537/27/check/nova-multiattach/7af78b6/job-output.txt.gz#_2018-05-08_17_39_54_410553

2018-05-08 17:39:54.410553 | primary | {2} tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume [672.593719s] ... FAILED
2018-05-08 17:39:54.412202 | primary |
2018-05-08 17:39:54.412269 | primary | Captured traceback-2:
2018-05-08 17:39:54.412327 | primary | ~~~~~~~~~~~~~~~~~~~~~
2018-05-08 17:39:54.412393 | primary | Traceback (most recent call last):
2018-05-08 17:39:54.412483 | primary | File "tempest/common/waiters.py", line 211, in wait_for_volume_resource_status
2018-05-08 17:39:54.412559 | primary | raise lib_exc.TimeoutException(message)
2018-05-08 17:39:54.412629 | primary | tempest.lib.exceptions.TimeoutException: Request timed out
2018-05-08 17:39:54.412755 | primary | Details: volume b4c0ac0e-5814-4092-9fef-658691f2b702 failed to reach available status (current detaching) within the required time (196 s).
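For context, the tempest waiter that times out here is essentially a poll loop on the volume status; a minimal sketch of that behavior (names and signature simplified, not the actual tempest code):

```python
import time


class TimeoutException(Exception):
    pass


def wait_for_volume_status(show_volume, volume_id, status,
                           timeout=196, interval=1):
    """Poll a volume until it reaches the wanted status or time out.

    show_volume is any callable returning a dict like {'status': ...};
    in tempest this role is played by the volumes client. This is a
    simplified stand-in for tempest.common.waiters.
    """
    start = time.time()
    while time.time() - start < timeout:
        current = show_volume(volume_id)['status']
        if current == status:
            return
        time.sleep(interval)
    # Mirrors the failure message seen in the traceback above.
    raise TimeoutException(
        'volume %s failed to reach %s status (current %s) within the '
        'required time (%s s).' % (volume_id, status, current, timeout))
```

In this failure the volume stays in "detaching" for the whole window because the guest never completes the detach, so the loop exhausts its 196 seconds and raises.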

http://logs.openstack.org/37/522537/27/check/nova-multiattach/7af78b6/logs/screen-n-cpu.txt.gz?level=TRACE#_May_08_17_31_46_919238

May 08 17:31:46.919238 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: WARNING nova.virt.block_device [None req-45144cb8-5878-4f55-9f5b-adac59c02685 tempest-ServerDiskConfigTestJSON-1657103191 tempest-ServerDiskConfigTestJSON-1657103191] [instance: 675ac2f4-9483-4766-b31c-714cb314c53d] Guest refused to detach volume b4c0ac0e-5814-4092-9fef-658691f2b702: DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.

I'm not sure why the log says "ServerDiskConfigTestJSON" in it; that could be because of our known issue with cached oslo.context request ID information in the service code. But this was definitely a multiattach volume, as seen here:

RESP BODY: {"volume": {"status": "in-use", "user_id": "97d56cb74d094f3bb5412594d0d69105", "attachments": [{"server_id": "675ac2f4-9483-4766-b31c-714cb314c53d", "attachment_id": "cc3cc4ac-cbf7-4213-a278-3b6c4822faad", "attached_at": "2018-05-08T17:29:36.000000", "host_name": "ubuntu-xenial-ovh-gra1-0003921747", "volume_id": "b4c0ac0e-5814-4092-9fef-658691f2b702", "device": "/dev/vdb", "id": "b4c0ac0e-5814-4092-9fef-658691f2b702"}], "links": [{"href": "http://149.202.186.74/volume/v3/2c7c9754edbc493b9f7b35fa1860ce2e/volumes/b4c0ac0e-5814-4092-9fef-658691f2b702", "rel": "self"}, {"href": "http://149.202.186.74/volume/2c7c9754edbc493b9f7b35fa1860ce2e/volumes/b4c0ac0e-5814-4092-9fef-658691f2b702", "rel": "bookmark"}], "availability_zone": "nova", "bootable": "false", "encrypted": false, "created_at": "2018-05-08T17:29:05.000000", "description": null, "os-vol-tenant-attr:tenant_id": "2c7c9754edbc493b9f7b35fa1860ce2e", "updated_at": "2018-05-08T17:43:46.000000", "volume_type": "lvmdriver-1", "name": "tempest-AttachVolumeMultiAttachTest-volume-317264940", "replication_status": null, "consistencygroup_id": null, "source_volid": null, "snapshot_id": null, "multiattach": true, "metadata": {}, "id": "b4c0ac0e-5814-4092-9fef-658691f2b702", "size": 1}}

This is where we start detaching the volume:

http://logs.openstack.org/37/522537/27/check/nova-multiattach/7af78b6/logs/screen-n-cpu.txt.gz#_May_08_17_30_05_620402

May 08 17:30:05.620402 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: INFO nova.virt.block_device [None req-5ef02975-4f91-4b24-963b-1cf67fcaac1a tempest-AttachVolumeMultiAttachTest-169541097 tempest-AttachVolumeMultiAttachTest-169541097] [instance: 675ac2f4-9483-4766-b31c-714cb314c53d] Attempting to driver detach volume b4c0ac0e-5814-4092-9fef-658691f2b702 from mountpoint /dev/vdb
May 08 17:30:05.630483 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: DEBUG nova.virt.libvirt.guest [None req-5ef02975-4f91-4b24-963b-1cf67fcaac1a tempest-AttachVolumeMultiAttachTest-169541097 tempest-AttachVolumeMultiAttachTest-169541097] Attempting initial detach for device vdb {{(pid=27633) detach_device_with_retry /opt/stack/new/nova/nova/virt/libvirt/guest.py:426}}
May 08 17:30:05.632159 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: DEBUG nova.virt.libvirt.guest [None req-5ef02975-4f91-4b24-963b-1cf67fcaac1a tempest-AttachVolumeMultiAttachTest-169541097 tempest-AttachVolumeMultiAttachTest-169541097] detach device xml: <disk type="block" device="disk">
May 08 17:30:05.632312 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: <driver name="qemu" type="raw" cache="none" io="native"/>
May 08 17:30:05.632489 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: <source dev="/dev/sde"/>
May 08 17:30:05.632647 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: <target bus="virtio" dev="vdb"/>
May 08 17:30:05.632793 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: <serial>b4c0ac0e-5814-4092-9fef-658691f2b702</serial>
May 08 17:30:05.632932 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: <shareable/>
May 08 17:30:05.633070 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
May 08 17:30:05.633196 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: </disk>
May 08 17:30:05.633365 ubuntu-xenial-ovh-gra1-0003921747 nova-compute[27633]: {{(pid=27633) detach_device /opt/stack/new/nova/nova/virt/libvirt/guest.py:477}}
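The detach path above issues the detach request and then polls the domain for the device to actually disappear, re-issuing the request as it goes, because device detach is asynchronous in libvirt and depends on the guest cooperating. A rough sketch of that retry loop (simplified, hypothetical names standing in for nova's libvirt calls in guest.py, not the actual implementation):

```python
import time


class DeviceDetachFailed(Exception):
    pass


def detach_device_with_retry(get_device_names, detach, dev,
                             attempts=8, interval=0.01):
    """Ask the hypervisor to detach dev, then poll until it is gone.

    get_device_names returns the device names currently present in the
    domain; detach(dev) issues the detach request. Both are stand-ins
    for the libvirt calls nova makes against the live and persistent
    domain configs.
    """
    detach(dev)  # initial detach attempt, as in the log above
    for _ in range(attempts):
        if dev not in get_device_names():
            return  # the guest completed the detach
        time.sleep(interval)
        detach(dev)  # re-issue the request on each retry
    # If the device never leaves the domain config, we end up with the
    # warning seen in this bug.
    raise DeviceDetachFailed(
        'Device detach failed for %s: Unable to detach from guest '
        'transient domain.' % dev)
```

The failure mode in this bug is the final branch: the detach request is accepted, but the device never disappears from the live (transient) domain config, so the retries run out and DeviceDetachFailed propagates up to nova.virt.block_device.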

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Guest%20refused%20to%20detach%20volume%5C%22%20AND%20message%3A%5C%22Unable%20to%20detach%20from%20guest%20transient%20domain.%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

8 hits in 7 days, multiple changes, all failures, mostly in the multiattach job.

melanie witt (melwitt)
summary: AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
- intermittently fails with "Unable to detach from guest transient
- domain."
+ intermittently fails with "Unable to detach from (guest transient
+ domain|the live config)."