web-download with invalid url does not report error

Bug #1914826 reported by Dan Smith
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
Undecided
Dan Smith

Bug Description

In my testing, if I provide a URL to web-download that yields an error from urlopen(), I never see the store listed in the os_glance_failed_import list, and the store remains in os_glance_importing_to_stores. The image status does not change, which means there's no way for the API client to know that the import failed.

I found this when debugging a gate issue where occasionally the tempest web-download test fails. It ends up waiting many minutes for the import to complete, even though it failed long before that. In that case, the cirros link we use for testing web-download raised a timeout.

From my log, here what we log as returning to the user just before we start the import:

Feb 05 20:18:02 guaranine <email address hidden>[1008592]: DEBUG oslo_policy.policy [-] enforce: rule="modify_image" creds={"domain_id": null, "is_admin_project": true, "project_domain_id": "default", "project_id": "59a5997403484e97803cac28b7aa7366", "roles": ["reader", "member"], "service_project_domain_id": null, "service_project_id": null, "service_roles": [], "service_user_domain_id": null, "service_user_id": null, "system_scope": null, "user_domain_id": "default", "user_id": "10e5d60c60e54ab3889bcd57e367fe01"} target={"checksum": null, "container_format": "bare", "created_at": "2021-02-05T20:18:03.000000", "disk_format": "raw", "extra_properties": {}, "image_id": "70917fce-bfc6-4d57-aa54-58235d09cf24", "locations": [], "min_disk": 0, "min_ram": 0, "name": "test", "os_glance_failed_import": "", "os_glance_import_task": "e2cb5441-8c92-45c6-9363-f4b7915401e1", "os_glance_importing_to_stores": "cheap", "os_hash_algo": null, "os_hash_value": null, "os_hidden": false, "owner": "59a5997403484e97803cac28b7aa7366", "protected": false, "size": null, "status": "importing", "tags": [], "updated_at": "2021-02-05T20:18:03.000000", "virtual_size": null, "visibility": "shared"} {{(pid=1008592) enforce /usr/local/lib/python3.8/dist-packages/oslo_policy/policy.py:994}}

Note that os_glance_importing_to_stores="cheap" and os_glance_failed_import="". Shortly after this, the web-download task fails:

Feb 05 20:18:03 guaranine <email address hidden>[1008592]: ERROR glance.async_.flows._internal_plugins.web_download [-] Task e2cb5441-8c92-45c6-9363-f4b7915401e1 failed with exception <urlopen error [Errno -2] Name or service not known>: urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>

Here's where the task is fully reverted:

Feb 05 20:18:03 guaranine <email address hidden>[1008592]: WARNING glance.async_.taskflow_executor [-] Task 'api_image_import-WebDownlo
ad-e2cb5441-8c92-45c6-9363-f4b7915401e1' (bc722b5c-ddd4-404b-9c09-8625ed9c5941) transitioned into state 'REVERTED' from state 'REVERTIN
G' with result 'None'

And after that, here's what we're still returning to the user:

Feb 05 20:18:03 guaranine <email address hidden>[1008592]: DEBUG oslo_policy.policy [-] enforce: rule="get_image" creds={"domain_id": n
ull, "is_admin_project": true, "project_domain_id": "default", "project_id": "59a5997403484e97803cac28b7aa7366", "roles": ["reader", "m
ember"], "service_project_domain_id": null, "service_project_id": null, "service_roles": [], "service_user_domain_id": null, "service_u
ser_id": null, "system_scope": null, "user_domain_id": "default", "user_id": "10e5d60c60e54ab3889bcd57e367fe01"} target={"checksum": nu
ll, "container_format": "bare", "created_at": "2021-02-05T20:18:03.000000", "disk_format": "raw", "extra_properties": {}, "image_id": "
70917fce-bfc6-4d57-aa54-58235d09cf24", "locations": [], "min_disk": 0, "min_ram": 0, "name": "test", "os_glance_failed_import": "", "os
_glance_import_task": "e2cb5441-8c92-45c6-9363-f4b7915401e1", "os_glance_importing_to_stores": "cheap", "os_hash_algo": null, "os_hash_
value": null, "os_hidden": false, "owner": "59a5997403484e97803cac28b7aa7366", "protected": false, "size": null, "status": "queued", "t
ags": [], "updated_at": "2021-02-05T20:18:03.000000", "virtual_size": null, "visibility": "shared"} {{(pid=1008592) enforce /usr/local/
lib/python3.8/dist-packages/oslo_policy/policy.py:994}}

Note that os_glance_importing_to_stores="cheap" and os_glance_failed_import="". In this case, "cheap" should have moved from "importing" to "failed".

I wrote a tempest negative test for this situation using a totally bogus URL, which is here:

https://review.opendev.org/c/openstack/tempest/+/774303

Revision history for this message
Dan Smith (danms) wrote :

This stack should fix the problem:

https://review.opendev.org/c/openstack/glance/+/774315

That link is to a DNM that will pull in the tempest test to validate the fix. The actual fix is below it, and there's another refactor patch required for it to work below that.

I imagine this has been broken forever, so not a regression and thus not critical for a backport.

Changed in glance:
status: New → In Progress
assignee: nobody → Dan Smith (danms)
Revision history for this message
Dan Smith (danms) wrote :

Just want to clarify one thing from the original bug text:

> The image status does not change, which means there's no way for the API client to
> know that the import failed.

Technically the image status does go from queued -> importing -> queued when this happens. However, in the case of a non-timeout failure (like is the case if you get a 404 from the remote server or "Connection Refused"), it's very likely that a well-behaved client won't poll quick enough to see that transition. Since the transition to importing happens async from the return, if you wait a few seconds before the first poll for completion, you'll see that the status is queued (which you may think is unchanged), but the "os_glance_importing_to_stores" key lists your store, looking like things are progressing even though they are not. If you know enough about glance internals, you could deduce that this state actually means failure, but that's not reasonable for a client (IMHO) and also could change later as the internals of glance change.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance 21.0.0.0b3

This issue was fixed in the openstack/glance 21.0.0.0b3 development milestone.

Changed in glance:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers