Glance caching not correctly handling backend failures

Bug #1045792 reported by Paul Bourke on 2012-09-04
This bug affects 1 person
Affects: Glance
Importance: Medium
Assigned to: Paul Bourke

Bug Description

Currently the cache assumes that any exception raised is the result of a problem writing to the cache, and tries to continue serving the response. In the event of a backend failure, this leaves clients stuck waiting for data that will never arrive.

Also, if the cache file is bad, either through the backend serving out corrupt data or otherwise, the file is still placed in and subsequently served out of the cache.
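
The first problem can be illustrated with a minimal sketch. This is not the Glance code: `BackendError`, `caching_iter_sketch` and the list-like `cache_writer` are hypothetical names chosen for illustration. The idea is only that a failure while writing to the cache should disable caching and keep serving, while a failure while reading from the backend should propagate so the response is aborted.

```python
class BackendError(Exception):
    """Hypothetical stand-in for an error raised while reading from the backend."""


def caching_iter_sketch(backend_iter, cache_writer):
    """Yield chunks from backend_iter, mirroring them into cache_writer.

    cache_writer is a hypothetical object with an append(chunk) method.
    A failure in append() only turns caching off; a BackendError raised by
    backend_iter is not caught and reaches the caller.
    """
    caching = True
    # Iterating outside any try block means backend errors propagate as-is.
    for chunk in backend_iter:
        if caching:
            try:
                cache_writer.append(chunk)
            except Exception:
                # Cache write problem: stop caching, keep serving the client.
                caching = False
        yield chunk
```

With this shape, a client reading from a failing backend sees the error instead of waiting on a connection that will never deliver more data.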

Fix proposed to branch: master
Review: https://review.openstack.org/12347

Changed in glance:
assignee: nobody → Paul Bourke (pauldbourke)
status: New → In Progress

Reviewed: https://review.openstack.org/12347
Committed: http://github.com/openstack/glance/commit/95e00c9247d5c56c1184d788ca8c1f9b165a25ba
Submitter: Jenkins
Branch: master

commit 95e00c9247d5c56c1184d788ca8c1f9b165a25ba
Author: Paul Bourke <email address hidden>
Date: Mon Sep 3 11:17:54 2012 +0100

    Fix cache not handling backend failures

    1) caching_iter doesn't handle backend exceptions:

    caching_iter assumes any exception that occurs is the result of being
    unable to cache. Hence the IOError raised from size_checked_iter, which
    indicates a problem with the backend, means the caching_iter will
    continue trying to serve non-existent data. The exception was not
    being re-raised in this case, making wsgi keep the connection open and
    clients stuck forever waiting for more data.

    Raising a GlanceException in size_checked_iter rather than an IOError
    allows caching_iter to distinguish between a problem fetching data, and
    a problem writing to the cache.

    2) Checksum verification happens after cache commit rather than before:

    The checksum check was outside the context manager block, which meant
    the GlanceException was not caught by open_for_write and the rollback
    didn't happen. This resulted in an error being logged, but the bad image
    was still placed in and subsequently served from the cache.

    Also:

    * Fix test_gate_caching_iter_bad_checksum - the loop to consume the
      iterator was in a subroutine that never got called.

    * Move test_gate_caching_iter_(good|bad)_checksum into
      ImageCacheTestCase to exercise both the sql and xattr drivers.

    * Remove invalid registry_host/registry_port params from
      TestImageCacheXattr/TestImageCacheSqlite setup which caused a failure
      when testing the file on its own using nosetests.

    Fixes bug 1045792

    Change-Id: I8aedec347e7f50566c44c5b6c6db424573c5ebaf
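
Point 2 of the commit message can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the Glance implementation: `open_for_write_sketch`, `cache_image`, `ChecksumMismatch`, the dict-based cache and the use of SHA-256 are all hypothetical. It shows why the checksum check has to run inside the `with` block: only an exception raised there reaches the context manager and prevents the commit.

```python
import contextlib
import hashlib


class ChecksumMismatch(Exception):
    """Hypothetical error for cached bytes that do not match the expected digest."""


@contextlib.contextmanager
def open_for_write_sketch(cache, image_id):
    """Hypothetical cache writer: commit on clean exit, roll back on error."""
    staged = []
    try:
        yield staged
    except Exception:
        # Rollback: the staged chunks are dropped and never reach the cache.
        raise
    else:
        # Commit happens only if the with block finished without raising.
        cache[image_id] = b"".join(staged)


def cache_image(cache, image_id, chunks, expected_digest):
    """Stage chunks for image_id and commit them only if the digest matches."""
    with open_for_write_sketch(cache, image_id) as staged:
        digest = hashlib.sha256()  # assumption: any hash serves the illustration
        for chunk in chunks:
            staged.append(chunk)
            digest.update(chunk)
        # Verified inside the with block, so a mismatch triggers the rollback.
        if digest.hexdigest() != expected_digest:
            raise ChecksumMismatch(image_id)
```

If the comparison were moved after the `with` block, the commit would already have happened by the time the mismatch was detected, which is the behaviour this bug describes.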

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-09-11
Changed in glance:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Brian Waldon (bcwaldon) on 2012-09-12
Changed in glance:
importance: Undecided → Medium
Thierry Carrez (ttx) on 2012-09-27
Changed in glance:
milestone: folsom-rc1 → 2012.2