race condition in test_upload_image_http_nonexistent_location_url: glance_store doesn't handle properly invalid HTTP reply

Bug #1521756 reported by Robert Collins
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
Invalid
High
Cyril Roelandt

Bug Description

Victor asked for me to have a look at an intermittent failure he was seeing in https://review.openstack.org/#/c/250083/

it shows up like so:
Traceback (most recent call last):
  File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
    func(*args, **kwargs)
  File "/home/robertc/work/openstack/glance/glance/api/v1/images.py", line 737, in _upload_and_activate
    location_data = self._upload(req, image_meta)
  File "/home/robertc/work/openstack/glance/glance/api/v1/images.py", line 671, in _upload
    {'status': 'saving'})
  File "/home/robertc/work/openstack/glance/glance/registry/client/v1/api.py", line 174, in update_image_metadata
    from_state=from_state)
  File "/home/robertc/work/openstack/glance/glance/registry/client/v1/client.py", line 209, in update_image
    headers=headers)
  File "/home/robertc/work/openstack/glance/glance/registry/client/v1/client.py", line 141, in do_request
    'exc_name': exc_name})
  File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/oslo_utils/excutils.py", line 204, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/six.py", line 686, in reraise
    raise value
  File "/home/robertc/work/openstack/glance/glance/registry/client/v1/client.py", line 124, in do_request
    **kwargs)
  File "/home/robertc/work/openstack/glance/glance/common/client.py", line 71, in wrapped
    return func(self, *args, **kwargs)
  File "/home/robertc/work/openstack/glance/glance/common/client.py", line 375, in do_request
    headers=copy.deepcopy(headers))
  File "/home/robertc/work/openstack/glance/glance/common/client.py", line 88, in wrapped
    return func(self, method, url, body, headers)
  File "/home/robertc/work/openstack/glance/glance/common/client.py", line 524, in _do_request
    raise exception.NotFound(res.read())
glance.common.exception.NotFound: b'Image not found'
======================================================================
FAIL: glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url
tags: worker-0
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/robertc/work/openstack/glance/glance/tests/unit/v1/test_api.py", line 1149, in test_upload_image_http_nonexistent_location_url
    self.assertEqual(404, res.status_int)
  File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/testtools/testcase.py", line 350, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/testtools/testcase.py", line 435, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 404 != 201

and through bisection I can reproduce it with 75 tests - I'm working on shrinking the set but it takes a couple hundred runs to be sure its a false branch, so its not super fast.

Revision history for this message
Robert Collins (lifeless) wrote :
Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (3.3 KiB)

Down to
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_catch_error_not_found
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_catch_error_unhandled
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_create_option
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_delete_queued_image_not_forced_confirmed
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_delete_queued_images_not_forced_confirmed
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_list_cached_images
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_list_cached_images_empty
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_lookup_command_supported_command
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_parse_options_no_parameters
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_parse_options_retrieve_command
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_queue_image_not_forced_not_confirmed
glance.tests.unit.api.test_cmd_cache_manage.TestGlanceCmdManage.test_queue_image_without_index
glance.tests.unit.api.test_common.TestSizeCheckedIter.test_small_last_chunk
glance.tests.unit.api.test_common.TestSizeCheckedIter.test_too_little_data
glance.tests.unit.api.test_property_protections.TestExtraPropertiesProxy.test_delete_empty_extra_property
glance.tests.unit.api.test_property_protections.TestExtraPropertiesProxy.test_delete_reserved_extra_property
glance.tests.unit.api.test_property_protections.TestProtectedImageRepoProxy.test_get_image
glance.tests.unit.async.flows.test_import.TestImportTask.test_complete_task
glance.tests.unit.async.flows.test_import.TestImportTask.test_import_flow_no_import_flows
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_begin_processing
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_task_fail
glance.tests.unit.common.scripts.image_import.test_main.TestImageImport.test_create_image
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_set_base_image_properties
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_validate_location_file_location_error
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_validate_location_http
glance.tests.unit.common.test_client.TestClient.test_make_auth_plugin
glance.tests.unit.common.test_config.TestDefaultConfig.test_osprofiler_disabled
glance.tests.unit.common.test_exception.GlanceExceptionTestCase.test_non_unicode_error_msg
glance.tests.unit.common.test_exception.GlanceExceptionTestCase.test_specified_error_msg
glance.tests.unit.common.test_exception.GlanceExceptionTestCase.test_specified_error_msg_with_kwargs
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_create_none_permitted
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_create_unpermitted_policy
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_read_all_permitted
glance.tests.unit.common.test_prope...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

now

glance.tests.unit.async.flows.test_import.TestImportTask.test_import_flow_no_import_flows
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_begin_processing
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_task_fail
glance.tests.unit.common.scripts.image_import.test_main.TestImageImport.test_create_image
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_set_base_image_properties
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_validate_location_file_location_error
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_validate_location_http
glance.tests.unit.common.test_client.TestClient.test_make_auth_plugin
glance.tests.unit.common.test_config.TestDefaultConfig.test_osprofiler_disabled
glance.tests.unit.common.test_exception.GlanceExceptionTestCase.test_non_unicode_error_msg
glance.tests.unit.common.test_exception.GlanceExceptionTestCase.test_specified_error_msg
glance.tests.unit.common.test_exception.GlanceExceptionTestCase.test_specified_error_msg_with_kwargs
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_create_none_permitted
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_create_unpermitted_policy
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_read_all_permitted
glance.tests.unit.common.test_property_utils.TestPropertyRulesWithPolicies.test_check_property_rules_update_none
glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url

Revision history for this message
Robert Collins (lifeless) wrote :

glance.tests.unit.async.flows.test_import.TestImportTask.test_import_flow_no_import_flows
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_begin_processing
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_task_fail
glance.tests.unit.common.scripts.image_import.test_main.TestImageImport.test_create_image
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_set_base_image_properties
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_validate_location_file_location_error
glance.tests.unit.common.scripts.test_scripts_utils.TestScriptsUtils.test_validate_location_http
glance.tests.unit.common.test_client.TestClient.test_make_auth_plugin
glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url

Revision history for this message
Robert Collins (lifeless) wrote :

glance.tests.unit.async.flows.test_import.TestImportTask.test_import_flow_no_import_flows
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_begin_processing
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_task_fail
glance.tests.unit.common.scripts.image_import.test_main.TestImageImport.test_create_image
glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url

Revision history for this message
Robert Collins (lifeless) wrote :

glance.tests.unit.async.flows.test_import.TestImportTask.test_import_flow_no_import_flows
glance.tests.unit.async.test_taskflow_executor.TestTaskExecutor.test_begin_processing
glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url

Revision history for this message
Robert Collins (lifeless) wrote :

glance.tests.unit.async.flows.test_import.TestImportTask.test_import_flow_no_import_flows
glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url

can trigger it

Revision history for this message
Robert Collins (lifeless) wrote :

glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url on its own is able to break

Revision history for this message
Victor Stinner (vstinner) wrote :

With the following patch, "python -m testtools.run glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url" always fail, on Python 2 and Python 3.

The patch modifies the HTTP server to close immediatly the client socket, rather than sending a valid HTTP reply.

The Glance HTTP client (Store.get_size() of the HTTP driver) doesn't handle "correctly" invalid HTTP reply: glance_store has a raw "except: return 0" :-/ At least, the function must raise an exception (as "except socket.error:" block), not return a size of 0 !?

https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/http.py#L176

diff --git a/glance/tests/utils.py b/glance/tests/utils.py
index ac56fb5..87ab766 100644
--- a/glance/tests/utils.py
+++ b/glance/tests/utils.py
@@ -444,6 +444,7 @@ def start_http_server(image_id, image_data):
                 return

             def do_HEAD(self):
+ os._exit(0)
                 # reserve non_existing_image_path for the cases where we expect
                 # 404 from the server
                 if 'non_existing_image_path' in self.path:

summary: - race/python3 issue
+ race condition in test_upload_image_http_nonexistent_location_url:
+ glance_store doesn't handle properly invalid HTTP reply
Revision history for this message
Victor Stinner (vstinner) wrote :

FYI my best scenario to reproduce the race condition without modifying the code was to run the following command in 6 different shells (of the same machine):

while true; do python -m testtools.run discover -t ./ ./glance/tests --load-list tests || break; done

Revision history for this message
Victor Stinner (vstinner) wrote :

while true; do python -m testtools.run discover -t ./ ./glance/tests --load-list tests || break; done

where tests is a text file which contains:

glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url

Changed in glance:
importance: Undecided → High
Revision history for this message
Cyril Roelandt (cyril-roelandt) wrote :

So I proposed https://review.openstack.org/#/c/253627/ to catch more errors in Store.get_size from glance_store.

I then tried to fix the race condition in glance, but got lost in the complexity of this bug. I originally thought that maybe requests were made to the HTTP server before it had been completely started, since there is no explicit synchronization when forking (see https://github.com/openstack/glance/blob/master/glance/tests/utils.py#L469). Therefore I decided to delay the call to serve_forever() by adding "time.sleep(10)" at line 471, but the test still wouldn't fail. I think the calls might be blocking somehow.

Also, whenever tests fail in the gates, we still get an "OK" status code (201, for instance) instead of a 500 error code, which means the server is probably up and running.

In the end I'm not sure that this is a race condition, or that I'm looking for it at the right place. Any thoughts?

Changed in glance:
assignee: nobody → Cyril Roelandt (cyril-roelandt)
Revision history for this message
Victor Stinner (vstinner) wrote :

Good or bad news, I identified a bug in eventlet on Python 3. Glance uses http.client to send an HTTP request to the local temporary HTTP server. Sometimes, a read on the HTTP client socket fails with BlockingIOError. With non-blocking sockets, such case is *expected*. But the GreenSocket.recv_into() method doesn't handle this error.

This case doesn't occur on Python 2 because readline() calls recv(), not recv_into().

Eventlet:

* Bug report: https://github.com/eventlet/eventlet/issues/274
* Pull request: https://github.com/eventlet/eventlet/pull/275

Revision history for this message
Nikhil Komawar (nikhil-komawar) wrote :

Any updates on this bug?

Revision history for this message
Victor Stinner (vstinner) wrote :

Oh, this issue has been fixed in eventlet 0.18.0. There were a few regressions, but all issues are now fixed. This bug can be closed.

Revision history for this message
Abhishek Kekane (abhishek-kekane) wrote :

I have confirmed that this is not a failure any more by running these tests in 6 different terminals and continuously 1 hour using "while true; do tox -e py27 glance.tests.unit.v1.test_api || break; done"

Changed in glance:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.