Corrupt image download after swiftclient 2.0 release

Bug #1280072 reported by John Griffith
This bug affects 7 people
Affects                        Status        Importance  Assigned to        Milestone
OpenStack Core Infrastructure  Invalid       Undecided   Unassigned
python-swiftclient             Fix Released  Undecided   Tristan Cacqueray

Bug Description

Started seeing a rash of gate failures in all devstack tests today. Others appear to have been logging this against bug #1254890, but that doesn't seem accurate, or at least not detailed enough.

Here's an example of the failure being seen:
http://logs.openstack.org/74/73474/1/check/check-tempest-dsvm-full/be41408/console.html

We continue to fall apart after the first error here.

Tags: gate-failure
tags: added: gate-failure
Revision history for this message
Qiu Yu (unicell) wrote :

Several patterns I have observed so far for this gate failure:

The following Jenkins jobs fail
------------------------------------------
* gate-devstack-dsvm-cells
* gate-tempest-dsvm-full
* gate-tempest-dsvm-postgres-full
* gate-tempest-dsvm-neutron
* gate-grenade-dsvm

First failure inside individual jobs
----------------------------------------
* gate-devstack-dsvm-cells
[ERROR] /opt/stack/new/devstack/exercises/volumes.sh:136 server didn't become active!

* gate-tempest-dsvm-full
tempest.api.compute.admin.test_aggregates.AggregatesAdminTestXML.test_aggregate_add_host_create_server_with_az[gate] ... FAIL

* gate-tempest-dsvm-postgres-full
tempest.api.compute.admin.test_aggregates.AggregatesAdminTestXML.test_aggregate_add_host_create_server_with_az[gate] ... FAIL

* gate-tempest-dsvm-neutron
setUpClass (tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON) ... FAIL

* gate-grenade-dsvm
setUpClass (tempest.api.compute.servers.test_create_server.ServersTestJSON) ... FAIL

Changed in nova:
status: New → Confirmed
Revision history for this message
Qiu Yu (unicell) wrote :

The n-cpu log for the failing server ID shows:

libvirtError: internal error Process exited while reading console log output: char device redirected to /dev/pts/1
qemu: linux kernel too old to load a ram disk

Revision history for this message
Qiu Yu (unicell) wrote :

Earliest good build I can track:

14-Feb-2014 01:04
http://logs.openstack.org/51/73451/1/check/check-tempest-dsvm-full/61d8f36/

and earliest bad build tracked so far

14-Feb-2014 01:20
http://logs.openstack.org/12/73112/2/check/check-tempest-dsvm-full/1abec24/

As John Griffith suggests, the issue *might* be caused by the new python-swiftclient release. From what I have observed so far, all good builds use swiftclient 1.9.0 (0c7264c21d06d37de5db0beee677c4e026dc8f45), while all failed builds use 2.0 (19d7e1812a99d73785146667ae2f3a7156f06898).

That said, at this moment we don't have strong evidence connecting the 2.0 release to the VM boot failure above.

Revision history for this message
Qiu Yu (unicell) wrote :

Seems swiftclient is the one to blame.

Log snippet from g-api.log:

2014-02-14 01:15:03.622 28919 INFO requests.packages.urllib3.connectionpool [-] Starting new HTTP connection (1): 127.0.0.1
2014-02-14 01:15:03.657 28919 ERROR glance.api.common [ad40e50a-4a39-4ef6-a29b-315ba2507e71 8f8649a8d77b4779a191f3d523c4f26d 4e4f15e3fdcf443b86cd72099d2c22a5] Backend storage for image 4a155eef-4a4e-49f7-b0de-f549859a05d8 disconnected after writing only 25165964 bytes
2014-02-14 01:15:03.657 28919 INFO glance.wsgi.server [ad40e50a-4a39-4ef6-a29b-315ba2507e71 8f8649a8d77b4779a191f3d523c4f26d 4e4f15e3fdcf443b86cd72099d2c22a5] Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 402, in handle_one_response
    for data in result:
  File "/opt/stack/new/glance/glance/api/common.py", line 58, in size_checked_iter
    {'image_id': image_id})
GlanceException: Corrupt image download for image 4a155eef-4a4e-49f7-b0de-f549859a05d8
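
For readers unfamiliar with the check that fires in the traceback above, here is a minimal sketch of a size-checked iterator. It is not Glance's actual `size_checked_iter`; the function name `size_checked`, the exception class, and the message text are assumptions made for illustration. The idea is simply to count the bytes streamed from the backend and fail once the total does not match the size recorded for the image.

```python
class CorruptDownload(Exception):
    """Hypothetical error type; Glance raises its own GlanceException."""


def size_checked(chunks, expected_size):
    """Yield chunks unchanged, then verify the total byte count.

    `chunks` is any iterable of byte strings; `expected_size` is the
    size the caller believes the object has.
    """
    bytes_written = 0
    for chunk in chunks:
        yield chunk
        bytes_written += len(chunk)

    # The mismatch is only detectable after the backend stops sending data.
    if bytes_written != expected_size:
        raise CorruptDownload(
            "got %d bytes, expected %d" % (bytes_written, expected_size)
        )


def drain(chunks, expected_size):
    """Consume the iterator and report whether the size check passed."""
    try:
        for _ in size_checked(chunks, expected_size):
            pass
    except CorruptDownload:
        return False
    return True
```

With this sketch, `drain([b"ab", b"cd"], 4)` passes, while any stream whose length differs from the recorded size fails the check, which is the class of error the g-api log reports.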

Qiu Yu (unicell)
affects: nova → openstack-ci
Qiu Yu (unicell)
summary: - FAIL:
- tempest.api.compute.admin.test_aggregates.AggregatesAdminTestJSON.test_aggregate_add_host_create_server_with_az
+ Corrupt image download after swiftclient 2.0 release
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Indeed, there is a bug in swiftclient. Files get uploaded with "Content-Type: multipart/form-data;", but Swift stores the whole multipart body as the uploaded object. So when you upload a file containing only 'DATA', the download gives you:
--85b0712f8c724b48b079eee6ce9ef90a
Content-Disposition: form-data; name="file"; filename="test"

DATA

--85b0712f8c724b48b079eee6ce9ef90a--

I removed the multipart handling from swiftclient and it now works correctly. In fact, the previous HTTP implementation never did multipart/form-data uploads. See: https://review.openstack.org/73585
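
To illustrate the effect described in this comment, here is a minimal sketch using only the Python standard library. It hand-builds a multipart/form-data body; the helper `multipart_body` is hypothetical and is not swiftclient or requests code. It shows that a server which stores the request body verbatim ends up with the boundary and part headers wrapped around the payload, whereas a raw upload stores exactly the payload.

```python
import uuid


def multipart_body(field, filename, payload, boundary=None):
    """Wrap `payload` (bytes) in a single-part multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "\r\n"
    ).format(b=boundary, f=field, n=filename).encode("ascii")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("ascii")
    return head + payload + tail


payload = b"DATA"

# What gets stored if the multipart request body is saved as the object.
stored_multipart = multipart_body(
    "file", "test", payload, boundary="85b0712f8c724b48b079eee6ce9ef90a"
)

# What gets stored with a direct upload: the payload and nothing else.
stored_raw = payload

# Bytes added by the multipart framing.
overhead = len(stored_multipart) - len(stored_raw)
```

The payload is still present inside `stored_multipart`, but the object no longer matches the original file byte for byte, so any size or checksum comparison downstream will fail.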

Revision history for this message
Derek Higgins (derekh) wrote :

TripleO devtest is also seeing extra memory usage with the new python-swiftclient:
https://bugs.launchpad.net/tripleo/+bug/1280275

Trying out the patch now.

Alan Pevec (apevec)
Changed in python-swiftclient:
status: New → Confirmed
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

This last review, while it fixes content integrity, reveals another bug related to Glance image upload: Glance's CooperativeReader makes requests believe the body uses chunked encoding when it does not. The review (https://review.openstack.org/73585) fixes this behavior as well.
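
As a rough illustration of how a wrapped reader can push an HTTP client toward chunked encoding, here is a simplified sketch. It is not the logic of requests or of Glance's CooperativeReader; `ReadOnlyWrapper`, `probe_length`, and `transfer_headers` are hypothetical names. The assumption is that a client picks Content-Length only when it can determine the body size up front, and otherwise falls back to chunked transfer.

```python
import io
import os


class ReadOnlyWrapper(object):
    """Hypothetical reader that exposes only read(), hiding the size."""

    def __init__(self, fd):
        self._fd = fd

    def read(self, size=-1):
        return self._fd.read(size)


def probe_length(body):
    """Return the body size in bytes, or None if it cannot be determined."""
    if hasattr(body, "__len__"):
        return len(body)
    if hasattr(body, "fileno"):
        try:
            return os.fstat(body.fileno()).st_size
        except (OSError, io.UnsupportedOperation):
            pass  # in-memory streams have no real file descriptor
    return None


def transfer_headers(body):
    """Pick Content-Length when the size is known, else chunked encoding."""
    length = probe_length(body)
    if length is None:
        return {"Transfer-Encoding": "chunked"}
    return {"Content-Length": str(length)}


known = transfer_headers(b"DATA")
wrapped = transfer_headers(ReadOnlyWrapper(io.BytesIO(b"DATA")))
```

In this sketch the same four bytes produce a Content-Length header when passed directly and a chunked transfer when hidden behind the wrapper, which is the kind of mismatch the comment describes.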

Changed in python-swiftclient:
assignee: nobody → Tristan Cacqueray (tristan-cacqueray)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-swiftclient (master)

Reviewed: https://review.openstack.org/73585
Committed: https://git.openstack.org/cgit/openstack/python-swiftclient/commit/?id=380e83087447b724458ba16e11f527babb39dd01
Submitter: Jenkins
Branch: master

commit 380e83087447b724458ba16e11f527babb39dd01
Author: Tristan Cacqueray <email address hidden>
Date: Fri Feb 14 12:52:26 2014 +0100

    Remove multipart/form-data file upload

    The requests 'files' parameter adds this 'Content-Type: multipart/form-data'
    HTTP header and the whole multipart body data get stored with the object.
    This also create a memory hog issue because files are loaded in memory before
    being actually sent. This patch removes this behavior and restores what was
    done before, ie: direct uploading.

    This patches also fixes an issue in requests, when used with glance
    CooperativeReader it mis-calculates content-length leading to chunked encoding
    for raw upload.

    Change-Id: Ie5b0a1078bedd33f09c6157f48b5f88116c589fa
    Closes-Bug: #1280072
    Closes-Bug: #1280275

Changed in python-swiftclient:
status: Confirmed → Fix Committed
Jeremy Stanley (fungi)
Changed in openstack-ci:
status: Confirmed → Invalid
Changed in python-swiftclient:
status: Fix Committed → Fix Released