copy-from broken for large files and swift

Bug #1412802 reported by Alexander Tivelkov
42
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Glance
Fix Released
Critical
Alexander Tivelkov
glance_store
Won't Fix
Medium
Stuart McLaren

Bug Description

Glance may loose some image data while transferring it to the backing store thus corrupting the image when ALL the following conditions are met:

- Image is being created by copying data from remote source (--copy-from CLI parameter or appropriate API call)
- Backing store is Swift
- Image size is larger then configured "swift_store_large_object_size"

In such scenarios the last chunk stored in Swift will have the size significantly less then expected. An attempt to download the image will result in a checksum verification error, however the checksum stored in Glance (with image metadata) is correct, and so is the size.

This is easily reproducible even on devstack (if the devstack is configured to run Swift as Glance backend). Just decrease 'swift_store_large_object_size' to some reasonably low value (i.e. 200 Mb) and try to copy-from any image which is larger then that value. After the upload is successful, check the object size in swift (by either summing the sizes of all the chunks or by looking to the size of virtual large object) - they will be lower then expected:

glance image-create --name tst --disk-format qcow2 --container-format bare --copy-from http://192.168.56.1:8000/F18-x86_64-cfntools.qcow2

...

glance image-list
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
| ID | Name | Disk Format | Container Format | Size | Status |
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
| fc34ec49-4bd3-40dd-918f-44d3254f2ac9 | tst | qcow2 | bare | 536412160 | active |
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+

...

swift stat glance fc34ec49-4bd3-40dd-918f-44d3254f2ac9 --os-tenant-name service --os-username admin
       Account: AUTH_cce6e9c12fa34c63b64ef29e84861554
     Container: glance
        Object: fc34ec49-4bd3-40dd-918f-44d3254f2ac9
  Content Type: application/octet-stream
Content Length: 509804544 <---- see, the size is different!
 Last Modified: Mon, 19 Jan 2015 15:52:18 GMT
          ETag: "6d0612f82db9a531b34d25823a45073d"
      Manifest: glance/fc34ec49-4bd3-40dd-918f-44d3254f2ac9-
 Accept-Ranges: bytes
   X-Timestamp: 1421682737.01148
    X-Trans-Id: tx01a19f7476a541808c9a1-0054bd28e1

....

glance image-download tst --file out.qcow2
[Errno 32] Corrupt image download. Checksum was 0eeddae1007f01b0029136d28518f538 expected 3ecddfe0787a392960d230c87a421c6a

Changed in glance:
assignee: nobody → Alexander Tivelkov (ativelkov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/148572

Changed in glance:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/148574

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on glance (master)

Change abandoned by Alexander Tivelkov (<email address hidden>) on branch: master
Review: https://review.openstack.org/148572
Reason: Duplicated for some reason

Changed in glance:
importance: Undecided → Medium
tags: added: icehouse-backport-potential
tags: added: juno-backport-potential
Changed in glance:
importance: Medium → Critical
milestone: none → kilo-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/148574
Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=270ec44a890f4bd2917310e2fcab9ad2bb9413b7
Submitter: Jenkins
Branch: master

commit 270ec44a890f4bd2917310e2fcab9ad2bb9413b7
Author: Alexander Tivelkov <email address hidden>
Date: Tue Jan 20 17:25:07 2015 +0300

    Fix for CooperativeReader to process read length

    CooperativeReader, being an eventlet-friendly wrapper around the
    generator- based reader of image data, actually transforms
    chunk-by-chunk iteration into the readable stream. It is used when the
    image is being copied from the remote source: some generator-based
    image data representing the remote source acts as its underlying
    object, and the instance of CooperativeReader is passed as a data
    stream to the backend client which uses it to read the data.

    Before this patch, the CooperativeReader was ignoring the "length"
    parameter of the read method, always returning the whole chunk returned
    by the underlying generator (in case of HTTP source the size of this
    chunk is 16 M). This was causing problems for the clients attempting to
    read data from it, and - under some circumstances - the loss of data.

    For chunked storage of files in Swift a special class (ChunkReader,
    declared in the swift store driver) is used to reduce the requested
    read length so no extra data is read and transferred. However, this was
    not working as the CooperativeReader (which was the underlying stream
    for the ChunkReader) was ignoring the requested size. This was causing
    the data to be lost when reading behind the boundaries of the Chunks.

    This patchset introduces a buffer in the CooperativeReader to store the
    most recently fetched iterator chunk. The reads are independent from
    requests to iterator, so the CooperativeReader is able to return the
    exact requested amount of bytes and no data is lost due to extra-reads.

    SecurityImpact

    Change-Id: Ief37d1e29487bb03e612320f5cc06910cfd1c23a
    Closes-bug: #1412802

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in glance:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/153502

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/153508

Revision history for this message
Nikhil Komawar (nikhil-komawar) wrote :

Although, the fix proposed to master works, it needs a bit of refactor to avoid cruft like iters wrapped around iters. (Please see comments on the merged review).

We need to refactor this code before backports happen.

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

I did some digging to try to understand the issue a little better for myself.

I checked out the Essex code base which contained the copy-from functionality (including the ResponseIndexable class) but did not include the CooperativeReader yet.

Running the reproducer gave:

 2015-02-06 16:01:36 23295 ERROR [glance.api.v1.images] Traceback (most recent call last):
   File "/mnt/ubuntu/git/glance-essex/glance/glance/api/v1/images.py", line 398, in _upload
     image_size)
   File "/mnt/ubuntu/git/glance-essex/glance/glance/store/swift.py", line 418, in add
     content_length=content_length)
   File "/mnt/ubuntu/git/glance-essex/glance/.venv/local/lib/python2.7/site-packages/swift/common/client.py", line 940, in put_object
     content_type=content_type, headers=headers)
   File "/mnt/ubuntu/git/glance-essex/glance/.venv/local/lib/python2.7/site-packages/swift/common/client.py", line 844, in _retry
     rv = func(self.url, self.token, *args, **kwargs)
   File "/mnt/ubuntu/git/glance-essex/glance/.venv/local/lib/python2.7/site-packages/swift/common/client.py", line 709, in put_object
     chunk = contents.read(size)
   File "/mnt/ubuntu/git/glance-essex/glance/glance/store/swift.py", line 532, in read
     result = self.fd.read(i)
 AttributeError: 'ResponseIndexable' object has no attribute 'read'

So, as far as I can tell, copy-from *never* worked when the image was larger than 'swift_store_large_object_size'.

With CooperativeReader present we just get a more subtle error.

We supply an iterator to wsgi for a normal download (it's all it needs). Hence _get_from_store doesn't supply a 'read' method if the underlying file descriptor/iterator doesn't have one, and downloads work fine.

response.app_iter = checked_iter(image_id, expected_size, image_iter)

So the original iterator:

 def http_response_iterator(conn, response, size):
    """
    Return an iterator for a file-like object.

    :param conn: HTTP(S) Connection
    :param response: httplib.HTTPResponse object
    :param size: Chunk size to iterate with
    """
    chunk = response.read(size)
    while chunk:
        yield chunk
        chunk = response.read(size)
    conn.close()

worked fine for standard downloads.

There seems to be a bit of a catch-22, with evenlet's greenio when reading in the data over the network. It requires more than a typical iterator/generator provides, eg it also needs read() and len. ResponseIndexable adds these, but doesn't work for images with size > swift_store_large_object_size. (See the stack trace above).

Alexander's patch makes the read behaviour of CooperativeReader respect the read length passed in. This effectively masks the lack of a working read method in 'ResponseIndexable'. (But it works! Yay!)

I *think* we could get rid of a lot of the iterator complexity (eg even http_response_iterator) by moving to 'requests', but I'd need to try it out a bit more to be sure.

Revision history for this message
Nikhil Komawar (nikhil-komawar) wrote :

Thanks for the detailed investigation on this Stuart!

I'm not confident on how fast we can make a transition to requests. On the other hand, it can be strong motivation factor for us to do such transition if the code is overtly complicate anyways.

Changed in glance-store:
status: New → In Progress
assignee: nobody → Stuart McLaren (stuart-mclaren)
milestone: none → v0.1.11
importance: Undecided → Critical
Changed in glance-store:
milestone: v0.1.11 → none
milestone: none → v0.1.12
Changed in glance-store:
milestone: v0.1.12 → v0.1.13
Changed in glance-store:
milestone: v0.4.0 → v0.4.1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on glance (stable/juno)

Change abandoned by Flavio Percoco (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/153502
Reason: Abandoning based on Erno's and nikhil's comments.

Thierry Carrez (ttx)
Changed in glance:
milestone: kilo-2 → 2015.1.0
Changed in glance-store:
milestone: 0.5.0 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on glance (stable/icehouse)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/153508
Reason: The change on stable/juno was abandoned, sounds like there were more changes to make in kilo, and this has a -2 from the glance PTL (at the time), so dropping this from icehouse.

no longer affects: glance/icehouse
no longer affects: glance/juno
Changed in glance-store:
importance: Critical → Medium
Revision history for this message
Ian Cordasco (icordasc) wrote :

Does this still affect glance-store?

tags: added: propose-close
removed: icehouse-backport-potential juno-backport-potential
Changed in glance-store:
status: In Progress → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.