Silent wasted storage with multiple RBD backends

Bug #1858877 reported by Dan Smith
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Nova does not currently support multiple RBD backends. However, Glance does, and an operator may point Nova at a Glance with access to multiple RBD clusters. If this happens and the requested image is not stored in the cluster local to the compute node, Nova will silently download the image from Glance, flatten it, and upload it to the local RBD cluster named privately to the image. If another instance is booted from the same image, this will happen again, using more network resources and duplicating the image on Ceph for the second and subsequent instances. When configuring Nova and Glance for shared RBD, the expectation is that instances are fast-cloned from Glance base images, so this silent consumption of a lot of storage would be highly undesirable and unexpected. Since operators control the backend config, but users upload images (and currently only to one backend), it is the users who would trigger this additional consumption of storage.

This isn't really a bug in Nova per se, since Nova does not claim to support multiple backends and is downloading/uploading the image in the same way it would if the image were located on any other not-the-same-as-my-RBD-cluster location. It is, however, unexpected and undesirable behavior.
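
To make the behavior described above concrete, here is a minimal sketch of the decision, not Nova's actual code. It assumes image locations are URLs of the form rbd://<fsid>/<pool>/<image>/<snapshot> and that an image counts as clone-able only when its fsid matches the compute node's local cluster. The names RbdLocation, parse_rbd_url and boot_strategy are hypothetical and exist only for this illustration.

```python
from typing import Iterable, NamedTuple, Optional
from urllib.parse import unquote


class RbdLocation(NamedTuple):
    """Pieces of an assumed rbd://<fsid>/<pool>/<image>/<snapshot> URL."""
    fsid: str
    pool: str
    image: str
    snapshot: str


def parse_rbd_url(url: str) -> Optional[RbdLocation]:
    """Return the parsed location, or None if the URL is not in that shape."""
    prefix = "rbd://"
    if not url.startswith(prefix):
        return None
    parts = [unquote(p) for p in url[len(prefix):].split("/")]
    if len(parts) != 4 or any(p == "" for p in parts):
        return None
    return RbdLocation(*parts)


def boot_strategy(image_urls: Iterable[str], local_fsid: str) -> str:
    """Return 'clone' if any location is in the local cluster, else 'download'.

    'download' stands for the costly path from the report: fetch the image
    from Glance, flatten it, and upload a full copy to the local cluster.
    """
    for url in image_urls:
        loc = parse_rbd_url(url)
        if loc is not None and loc.fsid == local_fsid:
            return "clone"
    return "download"
```

With this model, an image stored only in cluster "bbb" and booted on a compute node whose local cluster is "aaa" falls through to "download" on every boot, which is the repeated full copy the report complains about.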

Eric Fried (efried) wrote :

Related fix in nova/master: https://review.opendev.org/#/c/657078/

Changed in nova:
importance: Undecided → Wishlist
status: New → Confirmed
Matt Riedemann (mriedem)
tags: added: ceph libvirt
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/657078
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=80191e6d828cf823ce3aa7c6176da5e531694900
Submitter: Zuul
Branch: master

commit 80191e6d828cf823ce3aa7c6176da5e531694900
Author: Dan Smith <email address hidden>
Date: Fri May 3 13:46:23 2019 -0700

    Add a workaround config toggle to refuse ceph image upload

    If a compute node is backed by ceph, and the image is not clone-able
    in that same ceph, nova will try to download the image from glance
    and upload it to ceph itself. This is nice in that it "just works",
    but it also means we store that image in ceph in an extremely
    inefficient way. In a glance multi-store case with multiple ceph
    clusters, the user is currently required to make sure that the image
    they are going to use is stored in a backend local to the compute
    node they land on, and if they do not (or can not), then nova will
    do this non-COW inefficient copy of the image, which is likely not
    what the operator expects.

    Per the discussion at the Denver PTG, this adds a workaround flag
    which allows the operators to direct nova to *not* do this behavior
    and instead refuse to boot the instance entirely.

    Related-Bug: #1858877
    Change-Id: I069b6b1d28eaf1eee5c7fb8d0fdef9c0c229a1bf
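
The commit message does not name the new option. To my understanding it is exposed as never_download_image_if_on_rbd in the [workarounds] section of nova.conf, but treat that name as an assumption and check the configuration reference for your release. The sketch below only models the effect of such a toggle: when the image cannot be cloned and the flag is set, the boot is refused instead of falling back to download and upload. ImageNotCloneable, load_refuse_flag and prepare_root_disk are hypothetical names, not Nova APIs.

```python
import configparser

# Assumed nova.conf fragment; the option name is not stated in this report.
NOVA_CONF = """
[workarounds]
never_download_image_if_on_rbd = true
"""


class ImageNotCloneable(Exception):
    """Hypothetical error raised when the boot is refused."""


def load_refuse_flag(conf_text: str) -> bool:
    """Read the toggle from INI text, defaulting to False when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean(
        "workarounds", "never_download_image_if_on_rbd", fallback=False
    )


def prepare_root_disk(cloneable: bool, refuse_download: bool) -> str:
    """Pick the disk strategy, refusing the non-COW copy when asked to."""
    if cloneable:
        return "clone"
    if refuse_download:
        raise ImageNotCloneable(
            "image is not clone-able in the local ceph cluster; refusing to "
            "download and re-upload it"
        )
    return "download"
```

Leaving the flag at its default keeps the "just works" fallback, so the toggle is an explicit opt-in for operators who would rather fail the boot than store a full copy per instance.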

Eric Fried (efried)
Changed in nova:
status: Confirmed → Invalid
Eric Fried (efried) wrote :

Closing since this is essentially a feature request. Reopen as a blueprint if this is going to be worked on.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/757177

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/c/openstack/nova/+/757177
Committed: https://opendev.org/openstack/nova/commit/794bedf00e6a3dcdf89f07ae3f63deee09138a9a
Submitter: "Zuul (22348)"
Branch: stable/train

commit 794bedf00e6a3dcdf89f07ae3f63deee09138a9a
Author: Dan Smith <email address hidden>
Date: Fri May 3 13:46:23 2019 -0700

    Add a workaround config toggle to refuse ceph image upload

    If a compute node is backed by ceph, and the image is not clone-able
    in that same ceph, nova will try to download the image from glance
    and upload it to ceph itself. This is nice in that it "just works",
    but it also means we store that image in ceph in an extremely
    inefficient way. In a glance multi-store case with multiple ceph
    clusters, the user is currently required to make sure that the image
    they are going to use is stored in a backend local to the compute
    node they land on, and if they do not (or can not), then nova will
    do this non-COW inefficient copy of the image, which is likely not
    what the operator expects.

    Per the discussion at the Denver PTG, this adds a workaround flag
    which allows the operators to direct nova to *not* do this behavior
    and instead refuse to boot the instance entirely.

    Conflicts:
        nova/conf/workarounds.py

    NOTE(melwitt): The conflict is because this patch originally landed on
    ussuri and change If874f018ea996587e178219569c2903c2ee923cf (Reserve
    DISK_GB resource for the image cache) landed afterward and was
    backported to stable/train.

    Related-Bug: #1858877
    Change-Id: I069b6b1d28eaf1eee5c7fb8d0fdef9c0c229a1bf
    (cherry picked from commit 80191e6d828cf823ce3aa7c6176da5e531694900)

tags: added: in-stable-train