failure to communicate with core storage can result in draining of retracing queue

Bug #1394365 reported by Brian Murray
This bug affects 1 person
Affects: Daisy
Status: New
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

The retracers are designed to handle the occasional error accessing core files by removing the item from the queue and moving on to the next item. However, on 2014-11-13 there was an error communicating with the core storage system due to a package upgrade, and the retracing queues were drained completely.

An example of the failure follows:

2014-11-14 00:07:23,696:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:swift token:
2014-11-14 00:07:23,840:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:Could not retrieve 8f0911be-5ead-11e4-a94f-fa163e4aaad4 (swift):
2014-11-14 00:07:23,840:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:Traceback (most recent call last):
  File "/srv/daisy.ubuntu.com/production/daisy/daisy/retracer.py", line 347, in write_swift_bucket_to_disk
    resp_chunk_size=65536)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1082, in get_object
    resp_chunk_size=resp_chunk_size)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1008, in _retry
    rv = func(self.url, self.token, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 709, in get_object
    http_response_content=body)
ClientException: Object GET failed: /daisy-production-cores/8f0911be-5ead-11e4-a94f-fa163e4aaad4 404 Not Found 404 Not Found

The resource could not be found.

2014-11-14 00:07:23,840:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:Could not find None
2014-11-14 00:07:23,840:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:swift token:
2014-11-14 00:07:23,943:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:Could not remove 8f0911be-5ead-11e4-a94f-fa163e4aaad4 (swift):
2014-11-14 00:07:23,944:17005:140504763680512:INFO:root:8f0911be-5ead-11e4-a94f-fa163e4aaad4:swift:Traceback (most recent call last):
  File "/srv/daisy.ubuntu.com/production/daisy/daisy/retracer.py", line 374, in remove_from_swift
    _cached_swift.delete_object(bucket, key)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1114, in delete_object
    return self._retry(None, delete_object, container, obj)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1008, in _retry
    rv = func(self.url, self.token, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 937, in delete_object
    http_response_content=body)
ClientException: Object DELETE failed: daisy-production-cores/8f0911be-5ead-11e4-a94f-fa163e4aaad4 404 Not Found 404 Not Found

The resource could not be found.

Perhaps the retracer should keep a count of the failures and exit once it reaches some threshold, such as 100.
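
A minimal sketch of that idea follows. It is not the retracer's actual code: `retrace_all`, `fetch_core`, `StorageUnavailable` and the in-memory `deque` standing in for the queue are all hypothetical names. It also assumes that counting consecutive failures (resetting on each success) is what is wanted, so that the occasional missing core is still dropped as before while a run of failures stops the worker with the rest of the queue intact.

```python
from collections import deque


class StorageUnavailable(Exception):
    """Hypothetical signal that core storage itself looks broken."""


def retrace_all(queue, fetch_core, max_failures=100):
    """Consume `queue` until it is empty or storage looks unavailable.

    `fetch_core` is a stand-in for whatever retrieves a core file and is
    expected to raise on failure. Returns the number of cores fetched.
    """
    fetched = 0
    consecutive_failures = 0
    while queue:
        uuid = queue.popleft()
        try:
            fetch_core(uuid)
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= max_failures:
                # Put the current item back and stop, so the remaining
                # queue is preserved instead of being drained.
                queue.appendleft(uuid)
                raise StorageUnavailable(
                    "%d consecutive failures" % consecutive_failures)
            continue  # occasional error: drop this item and move on
        consecutive_failures = 0
        fetched += 1
    return fetched
```

With a threshold of 100, an outage like the one above would cost at most 99 queue items before the worker exits, rather than the whole queue.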

Brian Murray (brian-murray) wrote :

The daisy front ends had an authorization failure traceback.

  File "/usr/lib/python2.7/dist-packages/oops_wsgi/middleware.py", line 208, in oops_middleware
    app(environ, oops_start_response))
  File "/<email address hidden>/daisy/version_middleware.py", line 34, in __call__
    return self.app(environ, custom_start_response)
  File "/<email address hidden>/daisy/wsgi.py", line 98, in app
    response = handle_core_dump(_pool, environ, fileobj, components, content_type)
  File "/<email address hidden>/daisy/wsgi.py", line 47, in handle_core_dump
    return submit_core.submit(_pool, environ, fileobj, uuid, arch)
  File "/<email address hidden>/daisy/submit_core.py", line 302, in submit
    message = write_to_storage_provider(environ, fileobj, uuid)
  File "/<email address hidden>/daisy/submit_core.py", line 229, in write_to_storage_provider
    written = write_to_swift(environ, fileobj, uuid, provider_data)
  File "/<email address hidden>/daisy/submit_core.py", line 103, in write_to_swift
    _cached_swift.put_container(bucket)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1065, in put_container
    return self._retry(None, put_container, container, headers=headers)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1003, in _retry
    self.url, self.token = self.get_auth()
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 991, in get_auth
    insecure=self.insecure)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 328, in get_auth
    insecure=insecure)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 265, in get_keystoneclient_2_0
    raise ClientException('Authorization Failure. %s' % err)

Brian Murray (brian-murray) wrote :

Regarding the different tracebacks, fo0bar had this opinion:

15:02 < fo0bar> bdmurray: could be that
                python-swiftclient silently ignored the
                internal auth error and fell back to
                trying to GET it unauthenticated,
                resulting in the 404. suckage if that's
                the case, but it's a random guess that
                fits the outcome
15:02 < fo0bar> should be relatively easy to write a
                script to test that assumption, just
                feed it random auth data

Changed in daisy:
importance: Undecided → Medium
Brian Murray (brian-murray) wrote :

This seems less important given that we no longer have a backlog in the queues.

Changed in daisy:
importance: Medium → Low