Glance failed to upload image to swift storage

Bug #1518431 reported by Andrey Shestakov
This bug affects 1 person
Affects   Status        Importance   Assigned to      Milestone
Glance    In Progress   Medium       Cyril Roelandt
          Nominated for Kilo by Mike Fedosin
Liberty   Triaged       Medium       Cyril Roelandt

Bug Description

When Glance is configured with the swift backend and the Swift API is provided via RadosGW, Glance is unable to upload an image.

Command:
glance --debug image-create --name trusty_ext4 --disk-format raw --container-format bare --file trusty-server-cloudimg-amd64.img --visibility public --progress
Logs:
http://paste.openstack.org/show/479621/

Anuj Sharma (anuj-sharma10) wrote :

Please share your glance-api.conf file...

Andrey Shestakov (ashestakov) wrote :
Flavio Percoco (flaper87) wrote :

The swift driver is meant to be used to talk to Swift. If there's a problem in RadosGW swift API, we should notify the RadosGW maintainers.

Changed in glance:
status: New → Invalid
Andrey Shestakov (ashestakov) wrote :

Looks like the problem is in Glance API v2; with v1 it works as expected.

Flavio Percoco (flaper87) wrote :

I've debugged this issue a bit further and it's not related to RadosGW. The swift driver deletes the last 0-byte chunk while the image is being uploaded, and that's causing this conflict.

---
                    if bytes_read == 0:
                        # Delete the last chunk, because it's of zero size.
                        # This will happen if size == 0.
                        LOG.debug("Deleting final zero-length chunk")
                        connection.delete_object(location.container,
                                                 chunk_name)
                        break

                    chunk_id += 1
                    combined_chunks_size += bytes_read
---
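
To make the excerpt above easier to follow, here is a minimal, self-contained sketch of the shape of that loop. `FakeSwift` and `upload_in_chunks` are hypothetical stand-ins written for illustration, not glance_store or swiftclient code. The sketch assumes the total size is unknown, so a chunk has to be sent before the loop can learn that it was empty, which is what leads to the final PUT followed by a DELETE.

```python
import io


class FakeSwift:
    """Hypothetical in-memory stand-in for a Swift connection."""

    def __init__(self):
        self.objects = {}
        self.calls = []  # ordered log of (method, object name)

    def put_object(self, container, name, data):
        self.calls.append(("PUT", name))
        self.objects[(container, name)] = data

    def delete_object(self, container, name):
        self.calls.append(("DELETE", name))
        del self.objects[(container, name)]


def upload_in_chunks(conn, container, obj, image_file, chunk_size):
    """Upload image_file as numbered chunks without knowing its size."""
    chunk_id = 1
    combined_chunks_size = 0
    while True:
        chunk_name = "%s-%05d" % (obj, chunk_id)
        data = image_file.read(chunk_size)
        bytes_read = len(data)
        # The chunk is sent before its size is inspected.
        conn.put_object(container, chunk_name, data)
        if bytes_read == 0:
            # End of input: the chunk just written is empty, so remove it.
            conn.delete_object(container, chunk_name)
            break
        chunk_id += 1
        combined_chunks_size += bytes_read
    return combined_chunks_size


conn = FakeSwift()
total = upload_in_chunks(conn, "glance", "image", io.BytesIO(b"x" * 10), 4)
print(total, conn.calls)
```

In this toy run the last two calls are a PUT and a DELETE of the same object name. Against a real, distributed object store those two requests are not guaranteed to be observed in order, which is the window discussed in this thread.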

summary: - Glance failed to upload image to swift storage via RadosGW
+ Glance failed to upload image to swift storage
Changed in glance:
status: Invalid → Confirmed
Flavio Percoco (flaper87) wrote :

The above was added here: https://review.openstack.org/#/c/2728/

Mike Fedosin (mfedosin) wrote :

I think Flavio is right here - RadosGW claims to be 100% compatible with the Swift API, but here we see inconsistencies.

As a workaround we can use glance v1, which works fine and is used by default in Nova in Liberty. But it must be fixed in Mitaka, because otherwise we can break many deployments after that.

Flavio Percoco (flaper87) wrote :

One of the things triggering this misbehavior is that glanceclient is not sending the request's Content-Length and, therefore, Glance doesn't know the image/request size when the call to the store is made.

Unfortunately, there's a bit more to this issue since this doesn't happen deterministically. There seems to be a race in the swift driver that's causing the first conflict error, which then makes the subsequent requests fail.

affects: glance → mos
affects: mos → glance
Changed in mos:
status: New → Confirmed
assignee: nobody → MOS Glance (mos-glance)
no longer affects: mos
Cyril Roelandt (cyril-roelandt) wrote :

For some reason this does not show up, but I submitted https://review.openstack.org/#/c/254873/ to fix the issue.

Kairat Kushaev (kkushaev) wrote :

It took some amount of time to debug this issue for Glance.
It turned out that both Glance and Swift support chunked requests. Unfortunately, chunked uploading is not supported by mod_fastcgi, which is used by RadosGW. It is very typical for Apache to respond with a 411 error because some CGI (or WSGI) frameworks always require Content-Length to be specified.
A request with Transfer-Encoding: chunked is part of the HTTP spec, so glance_store prepares a correct request here.
So I would recommend deploying RadosGW under a different CGI module (for example mod_proxy_fcgi) that supports chunked requests.
I will mark this as Invalid for Glance, please re-open the bug if you don't agree.
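
For readers less familiar with the two framing styles, here is a minimal sketch of how a request body is framed with Transfer-Encoding: chunked. `encode_chunked` is a hypothetical helper written for illustration, not something glance_store or swiftclient calls. It only shows that the sender never states the total length up front, which is the piece of information a backend that insists on Content-Length (and answers 411 Length Required) is missing.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk is the terminator, so skip empty input.
            continue
        out += b"%x\r\n" % len(chunk)  # chunk size in hexadecimal
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk marker, no trailers
    return bytes(out)


body = encode_chunked([b"hello", b" world"])
print(body)
```

Each chunk carries its own size, and the body ends with a zero-size chunk, so the receiver can find the end of the upload without a Content-Length header.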

Changed in glance:
assignee: nobody → Kairat Kushaev (kkushaev)
status: Confirmed → Invalid
Sergey Gotliv (sgotliv) wrote :

Kairat,

I reopened the bug because it's reproducible with Swift, not just with RadosGW. Please review the previous comments, especially #5 and #8, and the solution proposed in comment #9.

Changed in glance:
status: Invalid → In Progress
assignee: Kairat Kushaev (kkushaev) → nobody
Changed in glance:
assignee: nobody → Cyril Roelandt (cyril-roelandt)
Stuart McLaren (stuart-mclaren) wrote :

> As a workaround we can use glance v1, which works fine and is used by default in Nova in Liberty.

Is this still the case? Or can we reproduce this on v1 also?

> One of the things triggering this misbehavior is that glanceclient is not sending the request's Content-Length and, therefore, Glance doesn't know the image/request size when the call to the store is made.

Not sending the image size should be common behaviour. (I think it may be the case with Nova snapshots, but I'd need to double check.) If that was problematic I'd have thought this issue would have been flushed out by now.

What are the steps to reproduce?

Thanks.

Flavio Percoco (flaper87) wrote :

Stuart, Kairat,

This bug is hard to reproduce, TBH. The race happens in a combination of Glance's v2 behavior and swift's driver. This is not *entirely* Glance's fault, though. The Glance "weird" part is not being able to recognize "0-sized" chunks before submitting them. However, this is being refactored a bit by one of our current swift specs (need to find the link).

The swift issue is that the DELETE is hitting the swift node before the last chunk is written/available. This could be caused by unsynchronized clocks or just a race condition.
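
One way to sidestep the PUT-then-DELETE sequence is to read the data first and only send chunks that are non-empty. The following is a minimal sketch of that idea, assuming a hypothetical `put(name, data)` callable in place of the Swift connection; the function names are made up for illustration and this is not the patch under review.

```python
import io


def iter_nonempty_chunks(image_file, chunk_size):
    """Yield chunks of at most chunk_size bytes, never an empty one."""
    while True:
        data = image_file.read(chunk_size)
        if not data:
            return  # end of input is detected before anything is sent
        yield data


def upload_without_empty_chunk(put, obj, image_file, chunk_size):
    """Call put(name, data) once per non-empty chunk; return total bytes."""
    total = 0
    chunks = iter_nonempty_chunks(image_file, chunk_size)
    for chunk_id, data in enumerate(chunks, start=1):
        put("%s-%05d" % (obj, chunk_id), data)
        total += len(data)
    return total


sent = []
total = upload_without_empty_chunk(
    lambda name, data: sent.append((name, len(data))),
    "image", io.BytesIO(b"x" * 10), 4)
print(total, sent)
```

Because no empty chunk is ever written, there is nothing to delete afterwards, and so no DELETE that could overtake a pending write. Note that in this sketch a zero-size image produces no chunks at all, which a real driver would have to handle separately.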

I've managed to reproduce it in one of our installers CI systems with some frequency but I have yet to find a deterministic way to do so. Meanwhile, I can confirm this issue exists.

While it's true that part of this code is being refactored by the "buffered chunk writer" spec, I still think we should fix it as it's eligible for backport.

Hope the above helps clarify the real problem a bit.

Changed in glance:
importance: Undecided → Medium
Andrey Shestakov (ashestakov) wrote :

I can confirm that image upload via v2 to radosgw works with mod_proxy_fcgi.

Stuart McLaren (stuart-mclaren) wrote :

@Flavio

Thanks for the extra info.

> The race happens in a combination of Glance's v2 behavior and swift's driver.

Ok, can someone smarter than me explain why we don't see this with v1? :-)

(I updated the code review with an alternative fix.)

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/glance_store 0.11.0

This issue was fixed in the openstack/glance_store 0.11.0 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/glance_store 0.9.2

This issue was fixed in the openstack/glance_store 0.9.2 release.

OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/glance_store 0.9.2 release.
