Backend storage disconnections do not raise exceptions
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Glance | Fix Released | High | Jay Pipes |
Bug Description
We'd like to suggest an improvement to how errors in a Glance API server's backend are detected.
Consider a configuration where a Nova client node makes a request of a Glance API server which
has a Swift backend. The Glance API server will in turn make a request of the Swift proxy, which will
then make a request of a Swift object server (one of three). Once the object server responds,
an HTTP response code of 200 is optimistically returned from the Swift proxy server to the Glance API server.
Then the data starts to be streamed from the object server in ‘chunks’ to the proxy and from
there to the Glance API server and from there to the Nova client.
If, for any reason, the object server ceases to respond (it's overloaded, it crashes, and so on)
then the proxy server will observe a ChunkReadTimeout from the object server and will raise
an exception which the wsgi service uses to terminate the connection to the Glance API server.
So far so good.
Remembering that all this is implemented using Python iterators and generators, what we
observed is that the iterator in glance/
the connection is terminated. We also observed that this leads to a situation whereby the Nova
node kept an open socket to the Glance API server which never went away, and the Nova operation
stalled. Updating the Glance API server side to detect a disconnect and raise an exception
caused the Glance server to disconnect the Nova client.
The following diff illustrates the change that was required. Inserting it at the get_from_backend()
level means that any underlying backend that disconnects is covered.
diff --git a/glance/
index b80a164..3871389 100644
--- a/glance/
+++ b/glance/
@@ -19,6 +19,7 @@
/images endpoint for Glance v1 API
"""
+import errno
import httplib
import json
import logging
@@ -210,6 +211,24 @@ class Controller(
"""
image = self.get_
+
+ def gen_from_
+ bytes_transferred = 0
+ try:
+ for chunk in image_data:
+ yield chunk
+ bytes_transferred += len(chunk)
+ except Exception, e:
+ msg = ("Error getting image: %s") % str(e)
+ logger.error(msg)
+ raise
+ if image_meta['size'] != bytes_transferred:
+ logger.
+ raise IOError(
+ else:
+ logger.
+
+
def get_from_
try:
@@ -218,7 +237,7 @@ class Controller(
except exception.NotFound, e:
- return image_data
+ return gen_from_
def get_from_
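The idea behind the diff above can be sketched in standalone form. This is only an illustration under stated assumptions, not the code that was committed to Glance: the function name `checked_chunks` and its `expected_size` parameter are hypothetical, and the sketch uses Python 3 syntax rather than the Python 2 syntax of the diff. It wraps any chunk iterator, counts the bytes it passes through, and raises `IOError` if the backend fails or ends the stream early.

```python
import logging

logger = logging.getLogger(__name__)


def checked_chunks(chunks, expected_size):
    """Yield each chunk from `chunks`; raise IOError on a short read.

    `checked_chunks` and `expected_size` are hypothetical names used
    for illustration only; they do not come from the Glance source.
    """
    bytes_transferred = 0
    try:
        for chunk in chunks:
            yield chunk
            bytes_transferred += len(chunk)
    except Exception as e:
        # The backend iterator failed mid-stream: log it and re-raise so
        # the caller can tear down the client connection.
        logger.error("Error getting image: %s", e)
        raise
    if bytes_transferred != expected_size:
        # The backend stopped early without raising: treat this as an
        # error instead of silently returning a truncated image.
        raise IOError("Expected %d bytes but transferred %d"
                      % (expected_size, bytes_transferred))
```

Because the size check runs only after the underlying iterator is exhausted, a consumer sees the error at the end of the stream, which is the point at which a silent disconnect would otherwise go unnoticed.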
Note too that at the Nova end when a ‘short’ or ‘corrupt’ image is then received there should be
a check of the length of the received data AND probably the checksum. This could be in glance
client or Nova code or a combination thereof.
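A receiving-side check along these lines might look like the sketch below. It is hypothetical: the name `receive_verified` and its parameters are invented for illustration, and it assumes the expected checksum is available as an MD5 hex digest, which this report does not state. It is not taken from the Glance client or from Nova.

```python
import hashlib


def receive_verified(chunks, expected_size, expected_md5):
    """Collect `chunks` and verify both total length and MD5 checksum.

    All names here are hypothetical; the use of MD5 is an assumption.
    """
    digest = hashlib.md5()
    received = 0
    parts = []
    for chunk in chunks:
        digest.update(chunk)
        received += len(chunk)
        parts.append(chunk)
    if received != expected_size:
        # A 'short' image: the stream ended before all bytes arrived.
        raise IOError("Short read: expected %d bytes, got %d"
                      % (expected_size, received))
    if digest.hexdigest() != expected_md5:
        # A 'corrupt' image: right length, wrong content.
        raise IOError("Checksum mismatch on received image data")
    return b"".join(parts)
```

Checking the length first gives a more specific error for the truncated-stream case described in this report; the checksum then covers corruption that leaves the length unchanged.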
Changed in glance:
status: Fix Committed → Fix Released
Changed in glance:
milestone: essex-1 → 2012.1
I'm on it, Tom. Should be pushing a fix shortly based on your code above (has to change slightly because of recent updates, but mostly the same).
-jay