Glance image-create stuck in saving state using rbd as a backend and --copy-from option

Bug #1146830 reported by Joël Casutt
72
This bug affects 14 people
Affects Status Importance Assigned to Milestone
Glance
Fix Released
High
Chris Fattarsi
Grizzly
Fix Released
High
Francois Deppierraz

Bug Description

We are running OpenStack on top of Ceph. Currently we are experiencing a problem when using the --copy-from option with the glance cli and thus when using Horizon to fetch and save images. If we do so the Image does get saved onto the appropriate pool in Ceph but stays in saving state as far as it concerns OpenStack.

==========================================================================
Used Version of Glance:

root@ineri:~# dpkg -s glance
Package: glance
Status: install ok installed
Priority: extra
Section: python
Installed-Size: 44
Maintainer: Ubuntu OpenStack <email address hidden>
Architecture: all
Version: 2012.2.4+git201303040601~quantal-0ubuntu1
...

Used Version of Ceph:

root@ineri:~# dpkg -s ceph
Package: ceph
Status: install ok installed
Priority: optional
Section: admin
Installed-Size: 25103
Maintainer: Laszlo Boszormenyi (GCS) <email address hidden>
Architecture: amd64
Version: 0.57-1quantal
...

==========================================================================
==========================================================================
Used command:

root@ineri:~# glance -v image-create --copy-from http://mattdm.fedorapeople.org/cloud-images/Fedora18-Cloud-i386-20130115.qcow2 --name="Bug-test Fedora" --is-public true --container-format bare --disk-format qcow2
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | None |
| container_format | bare |
| created_at | 2013-03-05T12:50:08 |
| deleted | False |
| deleted_at | None |
| disk_format | qcow2 |
| id | 9c219573-2aae-480d-a4f7-acaf469b241d |
| is_public | True |
| min_disk | 0 |
| min_ram | 0 |
| name | Bug-test Fedora |
| owner | 973725e345204044a6078ef764f42d9d |
| protected | False |
| size | 225640448 |
| status | queued |
| updated_at | 2013-03-05T12:50:08 |
+------------------+--------------------------------------+
==========================================================================

This results in a Image that is stuck in saving state:

==========================================================================
Result after approx. 1 Minute:

root@ineri:~# nova image-show 9c219573-2aae-480d-a4f7-acaf469b241d
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| created | 2013-03-05T12:50:08Z |
| id | 9c219573-2aae-480d-a4f7-acaf469b241d |
| minDisk | 0 |
| minRam | 0 |
| name | Bug-test Fedora |
| progress | 50 |
| status | SAVING |
| updated | 2013-03-05T12:50:08Z |
+----------+--------------------------------------+
==========================================================================
==========================================================================
Relevant excerpts from the glance log (api.log):

2013-03-05 13:50:08 INFO glance.api.v1.images [bfc96989-0e40-44fb-8cc9-b01fbbf8a417 c3eb8511a68846978b121d6afa8356e7 973725e345204044a6078ef764f42d9d] Triggering asynchronous copy from external source
==========================================================================
==========================================================================
Result after some more minutes:

root@ineri:~# date
Tue Mar 5 13:57:56 CET 2013
root@ineri:~# nova image-show 9c219573-2aae-480d-a4f7-acaf469b241d
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| created | 2013-03-05T12:50:08Z |
| id | 9c219573-2aae-480d-a4f7-acaf469b241d |
| minDisk | 0 |
| minRam | 0 |
| name | Bug-test Fedora |
| progress | 50 |
| status | SAVING |
| updated | 2013-03-05T12:50:08Z |
+----------+--------------------------------------+
==========================================================================
==========================================================================
The image is there (and it has the right size):

root@ineri:~# rbd -p images list --long | grep 9c219573-2aae-480d-a4f7-acaf469b241d
9c219573-2aae-480d-a4f7-acaf469b241d 215M 2
==========================================================================

Revision history for this message
Joël Casutt (joel-casutt) wrote :

This bug is a follow up on Comment 15 onwards of bug https://bugs.launchpad.net/glance/+bug/1112670

description: updated
description: updated
Revision history for this message
Joël Casutt (joel-casutt) wrote :
Revision history for this message
Stephen Street (stephen-redrocketcomputing) wrote :

Here is some addition debug from the glance-api log showing glance.store.rbd in rbd.py creating the image. I'm not sure what is supposed to happen next, but with a single glance worker thread ([DEFAULT] worker = 1), "glance image-list" hangs. Increasing the number of workers allows the glance client to function, but the image status never changes.

2013-03-05 17:47:21 INFO glance.api.v1.images [f0131e0b-f006-4255-aa8a-4e9bfb609df1 None None] Triggering asynchronous copy from external source
2013-03-05 17:47:21 20244 DEBUG glance.api.v1.images [-] Setting image 2584b5b0-211e-4f85-822e-ea3b1791d666 to status 'saving' _upload /usr/lib/python2.7/dist-packages/glance/api/v1/images.py:426
2013-03-05 17:47:21 20244 DEBUG glance.registry [-] Updating image metadata for image 2584b5b0-211e-4f85-822e-ea3b1791d666... update_image_metadata /usr/lib/python2.7/dist-packages/glance/registry/__init__.py:144
2013-03-05 17:47:21 20244 DEBUG glance.common.client [-] Constructed URL: http://0.0.0.0:9191/images/2584b5b0-211e-4f85-822e-ea3b1791d666 _construct_url /usr/lib/python2.7/dist-packages/glance/common/client.py:464
2013-03-05 17:47:21 20244 DEBUG glance.registry.client [-] Registry request PUT /images/2584b5b0-211e-4f85-822e-ea3b1791d666 HTTP 200 request id req-50a82cd6-32f9-4c8f-a46b-ee984c9ef625 do_request /usr/lib/python2.7/dist-packages/glance/registry/client.py:94
2013-03-05 17:47:21 20244 DEBUG glance.api.v1.images [-] Uploading image data for image 2584b5b0-211e-4f85-822e-ea3b1791d666 to rbd store _upload /usr/lib/python2.7/dist-packages/glance/api/v1/images.py:431
2013-03-05 17:47:22 20244 DEBUG glance.store.rbd [-] creating image 2584b5b0-211e-4f85-822e-ea3b1791d666 with order 23 add /usr/lib/python2.7/dist-packages/glance/store/rbd.py:241

Changed in glance:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/27457

Changed in glance:
assignee: nobody → Chris Fattarsi (chris-fattarsi)
status: Confirmed → In Progress
Revision history for this message
Chris Fattarsi (chris-fattarsi) wrote :

I'll cast some insight on the relevance of this patch to this bug. We ran into a similar issue of images getting stuck in saving state. Turns out if the image_size passed in is larger than the actual image_file there will be attempts to write 0 bytes of data, which causes image.write to hang indefinitely. That would also explain the case above where only one worker would become unresponsive.

This patch makes rbd driver consistent with how other drivers copy the image over.

I am not sure if it is a separate issue that image_size differs from the size of image_file passed in, perhaps that is a validation issue? I don't see other drivers handling that case.

Revision history for this message
Mark Washenberger (markwash) wrote :

Interesting. BTW, as others have said, the code in the patch looks much better than what was there before.

Regarding your more general question of validation issues, it looks like there may indeed be a problem. . .

Revision history for this message
Jens-Christian Fischer (jens-christian-fischer) wrote :

I have just tested the proposed fix, and indeed, it works as expected:

root@ineri:/# glance -v image-create --copy-from http://mattdm.fedorapeople.org/cloud-images/Fedora18-Cloud-i386-20130115.qcow2 --name="Bug-test Fedora fix" --is-public true --container-format bare --disk-format qcow2
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | None |
| container_format | bare |
| created_at | 2013-05-06T08:10:51 |
| deleted | False |
| deleted_at | None |
| disk_format | qcow2 |
| id | 5c0cdbe7-20bb-4aa1-b186-61ccab3b24e6 |
| is_public | True |
| min_disk | 0 |
| min_ram | 0 |
| name | Bug-test Fedora fix |
| owner | df72c23dc1b543338bd235784404f087 |
| protected | False |
| size | 225640448 |
| status | queued |
| updated_at | 2013-05-06T08:10:51 |
+------------------+--------------------------------------+

root@ineri:/# nova image-show 5c0cdbe7-20bb-4aa1-b186-61ccab3b24e6
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| created | 2013-05-06T08:10:51Z |
| id | 5c0cdbe7-20bb-4aa1-b186-61ccab3b24e6 |
| minDisk | 0 |
| minRam | 0 |
| name | Bug-test Fedora fix |
| progress | 100 |
| status | ACTIVE |
| updated | 2013-05-06T08:11:30Z |
+----------+--------------------------------------+

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/27457
Committed: http://github.com/openstack/glance/commit/101f000d3ca25761d75f626fe4379143ef5a6a70
Submitter: Jenkins
Branch: master

commit 101f000d3ca25761d75f626fe4379143ef5a6a70
Author: Chris Fattarsi <email address hidden>
Date: Wed Apr 24 17:23:35 2013 -0700

    RBD store uses common utils for reading file chunks

    Prevents cases where the image_size passed to the RBD store is larger
    than the actual image_file which attempts to write 0 bytes of data
    causing image.write to hang indefinitely.

    Fixes LP Bug #1146830

    Change-Id: I56216437a212c0c802d2ed349d88b3bd0d3bb4d7

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in glance:
milestone: none → havana-1
status: Fix Committed → Fix Released
Revision history for this message
Ramonskie (ramonskie) wrote :

i applied this patch to my grizzly setup
and it worked like a charm.

Revision history for this message
Joe Seegmiller (joeseegmiller) wrote :

Worked for me as well. Would be nice to get this backported to grizzly

Revision history for this message
Jeff Lane  (bladernr) wrote :

yes, it would be nice to ahve this in Grizzly...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/47418

tags: added: grizzly-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (stable/grizzly)

Reviewed: https://review.openstack.org/47418
Committed: http://github.com/openstack/glance/commit/92a783147fc8a4a3ab01925655d3b55d0b61418e
Submitter: Jenkins
Branch: stable/grizzly

commit 92a783147fc8a4a3ab01925655d3b55d0b61418e
Author: Chris Fattarsi <email address hidden>
Date: Wed Apr 24 17:23:35 2013 -0700

    RBD store uses common utils for reading file chunks

    Prevents cases where the image_size passed to the RBD store is larger
    than the actual image_file which attempts to write 0 bytes of data
    causing image.write to hang indefinitely.

    Fixes LP Bug #1146830

    Cherry-picked from havana (101f000d3ca25761d75f626fe4379143ef5a6a70)

    Change-Id: I56216437a212c0c802d2ed349d88b3bd0d3bb4d7

tags: added: in-stable-grizzly
Thierry Carrez (ttx)
Changed in glance:
milestone: havana-1 → 2013.2
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential in-stable-grizzly
Revision history for this message
semy (semyazz) wrote :

I'm using Icehouse (Ubuntu 14.04) and a plain directory as a backend and I can confirm that in emergency cases when something get stuck e.g. in SAVING state, DELETE just removes data from the database, but the image itself remains on the disk.

In my case during image-create there was an error related to authentication. It looks like the token has expired - the image had 20GB, plus slow connection, so it all was slow.

In general when downloading is finished, there is a request to update the state and because of expired token, NotAuthenticated error is thrown. Later glance tries to KILL the image, but also NotAuthenticated error occurs and Image is stuck in SAVING state. So I just deleted broken image, but after few days when I checked the directory with images I've noticed that too many images are there and I've figured out that all images deleted during that weird error were just deleted from glance's database, so they are no longer on images' list, but they remain on the disk.

Revision history for this message
semy (semyazz) wrote :

I've created a new bug related to this one - https://bugs.launchpad.net/glance/+bug/1371121

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.