cinder store: Volume is not disconnected timely after downloading image data

Bug #2030855 reported by Takashi Kajinami
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
glance_store
New
Undecided
Unassigned

Bug Description

In Puppet OpenStack project, we have attempted to switch glance-api deployment from standalone to httpd+mod_wsgi.

However this has been blocked by the following failure in tempest execution.

example:
https://zuul.opendev.org/t/openstack/build/b0712432dc80455db6234b4ae201dda8

```
tearDownClass (tempest.api.image.v2.test_images.BasicOperationsImagesTest)
--------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.9/site-packages/tempest/test.py", line 267, in tearDownClass
    raise value.with_traceback(trace)
      File "/usr/lib/python3.9/site-packages/tempest/test.py", line 231, in tearDownClass
    teardown()
      File "/usr/lib/python3.9/site-packages/tempest/test.py", line 649, in resource_cleanup
    raise testtools.MultipleExceptions(*cleanup_errors)
    testtools.runtest.MultipleExceptions: ((<class 'tempest.lib.exceptions.ServerFault'>, Got server fault
Details: The server has either erred or is incapable of performing the requested operation.<br /><br />

, <traceback object at 0x7ff72b9d3580>), (<class 'tempest.lib.exceptions.TimeoutException'>, Request timed out
Details: (BasicOperationsImagesTest:tearDownClass) Failed to delete image 57b250ab-4f80-4063-957e-47b14610b7ff within the required time (300 s). Timer started at 1691251732. Timer ended at 1691252032. Waited for 300 s., <traceback object at 0x7ff72ba333c0>))

```

Looking into glance-api log. it seems glance attempted to delete the volume which is not in a deletable state.

```
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi [None req-c745bf78-4274-40a3-9547-2bba02e629a7 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Caught error: Failed to delete volume 7136a2b1-9c7d-4084-a63e-4d8b91ec2340: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-79684329-8a56-4648-bd3c-bf09bef88e56): glance_store.exceptions.BackendException: Failed to delete volume 7136a2b1-9c7d-4084-a63e-4d8b91ec2340: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-79684329-8a56-4648-bd3c-bf09bef88e56)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi Traceback (most recent call last):
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/glance_store/_drivers/cinder/store.py", line 1107, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi self.volume_api.delete(client, loc.volume_id)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/glance_store/common/cinder_utils.py", line 71, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi client.volumes.delete(volume_id)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/v3/volumes_base.py", line 235, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi return self._delete(loc)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/base.py", line 313, in _delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi resp, body = self.api.client.delete(url)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/client.py", line 229, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi return self._cs_request(url, 'DELETE', **kwargs)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/client.py", line 211, in _cs_request
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi return self.request(url, method, **kwargs)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/client.py", line 197, in request
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi raise exceptions.from_response(resp, body)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi cinderclient.exceptions.BadRequest: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-79684329-8a56-4648-bd3c-bf09bef88e56)
```

Further investigation revealed that the volume was attachend when tempest downloaded image data, and was not attached before the subsequent delete calls were executed.

```
2023-08-05 16:08:43.290 71469 DEBUG glance.api.middleware.version_negotiation [None req-03ab3cae-b314-4640-8a8c-53214ab5419b 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: GET /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff/file Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
2023-08-05 16:08:43.302 71469 DEBUG glance_store._drivers.cinder.store [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Cinderclient connection created for user glance using URL: https://[::1]:5000/v3. get_cinderclient /usr/lib/python3.9/site-packages/glance_store/_drivers/cinder/store.py:632
2023-08-05 16:08:43.337 71469 DEBUG os_brick.utils [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] ==> get_connector_properties: call "{'root_helper': 'sudo glance-rootwrap /etc/glance/rootwrap.conf', 'my_ip': '2001:4800:7819:104:be76:4eff:fe03:bfe3', 'multipath': False, 'enforce_multipath': False, 'host': 'np0034868732', 'execute': None}" trace_logging_wrapper /usr/lib/python3.9/site-packages/os_brick/utils.py:175
...
2023-08-05 16:08:50.422 71469 DEBUG os_brick.initiator.connectors.base [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.connect_volume" :: held 1.135s inner /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/base.py:87
2023-08-05 16:08:50.423 71469 DEBUG os_brick.initiator.connectors.iscsi [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] <== connect_volume: return (1135ms) {'type': 'block', 'scsi_wwn': '360014053afd57155dd443b6aba7dcf1a', 'path': '/dev/sda'} trace_logging_wrapper /usr/lib/python3.9/site-packages/os_brick/utils.py:202
...
2023-08-05 16:08:51.656 71470 DEBUG glance.api.middleware.version_negotiation [None req-7e33cebd-3d0b-40f7-9f1b-6c90d7ec3b16 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
2023-08-05 16:08:51.823 71470 DEBUG glance.api.middleware.version_negotiation [None req-c745bf78-4274-40a3-9547-2bba02e629a7 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
2023-08-05 16:08:51.968 71470 DEBUG glance.api.middleware.version_negotiation [None req-4e120ce0-3581-4d01-9f31-f1555f8224b1 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
...
2023-08-05 16:08:52.292 71469 DEBUG os_brick.initiator.connectors.base [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 0.285s inner /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/base.py:87
2023-08-05 16:08:52.292 71469 DEBUG os_brick.initiator.connectors.iscsi [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] <== disconnect_volume: return (287ms) None trace_logging_wrapper /usr/lib/python3.9/site-packages/os_brick/utils.py:202
```

However the API request of image downloaded completed in ~8 seconds and I don't see it's stuck. From tempest's PoV the download completed so it's reasonable it attempted to delete the image.

```
2023-08-05 16:08:51.234 75640 INFO tempest.lib.common.rest_client [req-3d502f08-a8c0-42da-93e7-51ebf87913c6 req-3d502f08-a8c0-42da-93e7-51ebf87913c6 ] Request (BasicOperationsImagesTest:test_register_upload_get_image_file): 200 GET https://[::1]:9292/v2/images/57b250ab-4f80-4063-957e-47b14610b7ff/file 7.993s
2023-08-05 16:08:51.234 75640 DEBUG tempest.lib.common.rest_client [req-3d502f08-a8c0-42da-93e7-51ebf87913c6 req-3d502f08-a8c0-42da-93e7-51ebf87913c6 ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
    Response - Headers: {'date': 'Sat, 05 Aug 2023 16:08:43 GMT', 'server': 'Apache', 'content-length': '1024', 'content-md5': 'eeb8857baba3a1928b130eb022bcaa55', 'x-openstack-request-id': 'req-3d502f08-a8c0-42da-93e7-51ebf87913c6', 'connection': 'close', 'content-type': 'application/octet-stream', 'status': '200', 'content-location': 'https://[::1]:9292/v2/images/57b250ab-4f80-4063-957e-47b14610b7ff/file'}
        Body: b'V\x89\x85...' _log_request_full /usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py:464
```

We haven't seen this issue when glance-api is deployed as a standalone service.
I'm wondering if we can improve some handlings to avoid this timing problem.

Revision history for this message
Takashi Kajinami (kajinamit) wrote :

It seems tempest attempted to delete the image 3 times and glance returned 500. I don't see further attempts made by tempest since then.

~~~
::1 - - [05/Aug/2023:16:08:51 +0000] "DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff HTTP/1.1" 500 186 "-" "python-urllib3/1.26.5"
::1 - - [05/Aug/2023:16:08:51 +0000] "DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff HTTP/1.1" 500 186 "-" "python-urllib3/1.26.5"
::1 - - [05/Aug/2023:16:08:51 +0000] "DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff HTTP/1.1" 500 186 "-" "python-urllib3/1.26.5"
~~~

We might have to fix/tune tempest to improve this behavior.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.