Snapshots of instances launched from images fail with Ceph as storage.

Bug #1639940 reported by bgodette on 2016-11-07
This bug affects 1 person

Affects: openstack-ansible
Importance: Medium
Assigned to: Logan V

Bug Description

With Ceph configured as a Cinder backend, instance snapshots fail for instances that are launched from images where a new volume is not created. This is caused by missing configuration when RBD is the default store for Glance.

Attached patch for glance-api.conf.j2

bgodette (bgodette) wrote :

There could be security implications with this change.

Please have a look at our bug triage conversation before implementing:
http://eavesdrop.openstack.org/irclogs/%23openstack-ansible/%23openstack-ansible.2016-11-08.log.html#t2016-11-08T17:07:49

Logan V (loganv) wrote :

Regarding the security implications of exposing the direct image URL via endpoints, this is documented in the Ceph OpenStack integration docs:
http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/#any-openstack-version

"Note that this exposes the back end location via Glance’s API, so the endpoint with this option enabled should not be publicly accessible."

What I do to mitigate this concern is run two sets of glance containers, both tied to the same RBD cluster/database behind my load balancers. The public endpoints route to a set of glance containers that does not have show_image_direct_url enabled. The "backend" containers bind to the internal LB endpoint and have show_image_direct_url enabled to allow for Ceph RBD copy-on-write cloning.
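
A minimal sketch of the split described above, assuming the only difference between the two sets of glance containers is the show_image_direct_url option in the [DEFAULT] section of glance-api.conf. The helper name render_glance_api_conf is hypothetical and is not part of openstack-ansible; a real deployment would template the full file rather than this fragment.

```python
import configparser
import io


def render_glance_api_conf(expose_direct_url: bool) -> str:
    """Render a glance-api.conf fragment for one set of glance containers.

    Hypothetical helper for illustration only; it emits just the one
    option that differs between the public and internal sets.
    """
    conf = configparser.ConfigParser()
    conf["DEFAULT"] = {"show_image_direct_url": str(expose_direct_url)}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


# Public-facing containers: do not expose the back end location.
public_conf = render_glance_api_conf(False)

# Internal ("backend") containers: expose it so RBD cloning can be used.
internal_conf = render_glance_api_conf(True)

print(public_conf)
print(internal_conf)
```

Both sets would still point at the same RBD cluster and database; only the endpoint each set is reachable from differs.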

Andy McCrae (andrew-mccrae) wrote :

Logan, do you think this would be a documentation fix then?
And if so, do you have sample configuration to show for this?

Changed in openstack-ansible:
importance: Undecided → Medium
assignee: nobody → Logan V (logan2211)
Logan V (loganv) wrote :

@bgodette: I am working on reproducing this bug so we can mark it confirmed and begin evaluating solutions. Can you confirm the OSA branch and SHA/tag you were seeing this on?

Thanks

bgodette (bgodette) wrote :

It's present in stable/newton @ SHA 75c1384d2738cfb992064385747fca5dc23ca90d

https://github.com/openstack/openstack-ansible/tree/75c1384d2738cfb992064385747fca5dc23ca90d

Logan V (loganv) on 2016-11-19
Changed in openstack-ansible:
status: New → Confirmed
Logan V (loganv) on 2016-11-19
Changed in openstack-ansible:
status: Confirmed → Incomplete
Logan V (loganv) wrote :

@bgodette: I'm still having problems confirming this bug.

Env:
glance:
show_image_direct_url = True

cinder:
[RBD]
rbd_secret_uuid=495923e5-1b24-4097-aea1-896483016457
rbd_ceph_conf=/etc/ceph/ceph.conf
volume_backend_name=rbddriver
rbd_store_chunk_size=8
volume_driver=cinder.volume.drivers.rbd.RBDDriver
report_discard_supported=True
rbd_pool=volumes
rbd_user=cinder

Workflow:
I've created Ceph-backed disks using both nova launch from image and a cinder volume. I noticed the nova disk was created as a CoW layered clone, while the cinder volume is a flattened disk. This may be related to the cinder backend configuration.
Creating a snapshot in nova works fine with nova-backed disks. It uploads a raw snapshot of the instance disk to glance.
With Cinder+RBD-backed instances, it creates a 0B image in glance; however, it does create a volume snapshot in cinder.
The behavior above is identical whether I have show_multiple_locations on or off in glance.
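
One way to tell a layered clone from a flattened disk, sketched under assumptions: the check below expects the JSON printed by `rbd info --format json <pool>/<image>` and assumes that a cloned image carries a "parent" entry while a flattened one does not. The function name and both sample documents are made up for illustration and are not output from this environment.

```python
import json


def is_cow_clone(rbd_info_json: str) -> bool:
    """Return True if the image description includes a parent reference.

    Assumes the JSON form of `rbd info`, where a clone is expected to
    list its parent image and a flattened image has no such entry.
    """
    info = json.loads(rbd_info_json)
    return "parent" in info


# Made-up sample documents, not real cluster output.
layered = '{"name": "disk-a", "parent": {"pool": "images", "image": "img", "snapshot": "snap"}}'
flattened = '{"name": "disk-b"}'

print(is_cow_clone(layered))
print(is_cow_clone(flattened))
```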

I'm very unclear as to what errors you are seeing in the logs or interface, and what workflow you are using to get to that point. Can you help me understand better what you are seeing and how to reproduce it?

The best I was able to do is find some documentation on the ceph website stating that for "mitaka only" operators should enable show_multiple_locations.
http://docs.ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid#enable-copy-on-write-cloning-of-images
However, it does not explain why that must be enabled or what features are activated by doing so. And from some testing just now, I can't tell any difference with that on or off in Newton.

Logan V (loganv) wrote :

I found a bug that I think may be related to this issue. Even with multiple_locations enabled, I was seeing CoW volume creation fall back to glance download on Newton; however, https://review.openstack.org/#/c/435184/ has fixed that issue.
