instance.root_gb should be 0 for volume-backed instances

Bug #1469179 reported by Feodor Tersin on 2015-06-26
This bug affects 42 people
Affects: OpenStack Compute (nova)
Importance: Medium
Assigned to: Dan Smith

Bug Description

instance.root_gb means the size of the host-local disk used by the instance's root disk. All code that uses this attribute treats it that way.

Volume-backed instances have their root disks placed in Cinder, so root_gb should obviously be 0 for them. However, currently this is not the case. The same applies to the min_disk and size image attributes used when booting from a volume.

As a result, the code which uses these attributes works incorrectly. Some problems have already been detected [1]; others have not been yet [2].

There are two kinds of bugs:
1. Nova fails to launch an instance from a large volume if the volume size (or the original image's min_disk) is greater than the requested flavor's root_gb.
2. Nova incorrectly calculates the host disk space consumed by volume-backed instances.

To fix all of these problems, it is proposed to set root_gb, min_disk, and size to 0 for volume-backed instances.
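The proposed fix can be sketched as follows (a minimal illustration on plain dicts; the function and field names here are stand-ins, not nova's actual code):

```python
def zero_local_disk_fields(instance, is_volume_backed):
    """Return a copy of the instance record with local-disk accounting
    fields zeroed when the root disk lives in Cinder, not on the host."""
    fixed = dict(instance)
    if is_volume_backed:
        fixed['root_gb'] = 0   # root disk is a Cinder volume: no local space used
        fixed['min_disk'] = 0  # no local-disk minimum applies either
    return fixed
```

With this applied, all downstream consumers of root_gb (resource tracking, scheduling, notifications) would see zero local disk for volume-backed instances.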

[1] https://bugs.launchpad.net/nova/+bug/1334974
https://bugs.launchpad.net/nova/+bug/1459491
https://bugs.launchpad.net/nova/+bug/1466305
https://bugs.launchpad.net/nova/+bug/1457517
https://bugs.launchpad.net/nova/+bug/1358566
[2] https://github.com/openstack/nova/blob/master/nova/notifications.py#L407
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L250

tags: added: disks volumes
Feodor Tersin (ftersin) on 2015-07-01
description: updated
Changed in nova:
assignee: nobody → Feodor Tersin (ftersin)
status: New → In Progress
Feodor Tersin (ftersin) on 2015-07-02
description: updated
Yaguang Tang (heut2008) on 2015-07-13
tags: added: kilo-backport-potential

Change abandoned by Feodor Tersin (<email address hidden>) on branch: master
Review: https://review.openstack.org/203766
Reason: This was a preparation step to fix bug #1334974 and bug #1358566. The idea was to call is_volume_backed_instance from nova.scheduler.utils.build_request_spec to set root_gb=0 for volume-backed instances.

But since an object refactoring is now in progress under https://blueprints.launchpad.net/nova/+spec/request-spec-object , I prefer to wait for its result.

Changed in nova:
assignee: Feodor Tersin (ftersin) → Matthew Booth (mbooth-9)

Change abandoned by Feodor Tersin (<email address hidden>) on branch: master
Review: https://review.openstack.org/196569

Changed in nova:
assignee: Matthew Booth (mbooth-9) → Feodor Tersin (ftersin)

Reviewed: https://review.openstack.org/170243
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8cf2d41344785a0752fbfe8745032aed2ec11e27
Submitter: Jenkins
Branch: master

commit 8cf2d41344785a0752fbfe8745032aed2ec11e27
Author: Feodor Tersin <email address hidden>
Date: Mon Jul 6 21:42:14 2015 +0300

    Fix collection of metadata for a snapshot of a volume-backed instance

    Currently the snapshot derives its properties from the instance source:
    another snapshot or a bootable volume. But those sources could have
    changed since the instance was booted.

    To make instance snapshots independent of source changes this patch
    collects metadata from instance system metadata rather than the sources.

    Since this becomes the only way to initialize image metadata, it fixes:
    a) min_ram attribute is not inherited from a bootable volume
    (LP #1369854).
    b) empty container_format and disk_format attributes are inherited from
    a source instance snapshot (LP #1439819).

    Closes-Bug: #1369854
    Closes-Bug: #1439819
    Related-Bug: #1469179
    Change-Id: I067f66356a5ebd738add1591a0069d8049f35c24

Daniel (leaberry) wrote :

Just hit this on a Liberty cluster I am putting together. I would like to see this fixed, as it's a pretty annoying bug.

To work around it I simply boosted the disk oversubscription ratio significantly. It's risky, because someone might launch a bunch of image-backed instances and fill up the root disk instead of launching Cinder-backed ones, but it works around this issue.

nova.conf

disk_allocation_ratio=100
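For context, the filter check this workaround defeats looks roughly like the following (a simplified sketch of the DiskFilter math, not nova's exact code): a very large allocation ratio inflates the usable-disk figure so the filter effectively always passes.

```python
def disk_filter_passes(total_disk_gb, free_disk_gb, allocation_ratio, requested_gb):
    """Simplified DiskFilter math: capacity is scaled by the allocation
    ratio, then already-used space is subtracted to get usable disk."""
    used_gb = total_disk_gb - free_disk_gb
    usable_gb = total_disk_gb * allocation_ratio - used_gb
    return requested_gb <= usable_gb
```

A 100 GB host with 20 GB free rejects an 80 GB request at ratio 1.0, but accepts it at ratio 100, which is why the oversubscription workaround lets volume-backed instances through.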

Tobias Urdin (tobias-urdin) wrote :

Marked my bug https://bugs.launchpad.net/nova/+bug/1508907 as a duplicate of this one, but I want to push for a fix for this since we have the same issue.

Changed in nova:
importance: Undecided → Medium
Tobias Urdin (tobias-urdin) wrote :

After some testing, setting root_gb to zero in the build_request_spec function (the function that creates the request spec) doesn't seem to affect the data inserted into the nova database, which will still lead to the resource tracker, scheduler, Horizon, etc. showing the wrong used disk. We need to find a good way to 1) check whether a booted instance has a Cinder root volume and 2) if it has, set root_gb and ephemeral_gb to zero for that instance so that they are not counted against the compute node's local used disk space, which otherwise results in wrong resource tracking and wrong scheduling.
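The check in step 1 can be sketched like this (a simplified stand-in for nova's is_volume_backed_instance, operating on plain block-device-mapping dicts):

```python
def is_volume_backed(block_device_mappings):
    """An instance is volume-backed when its root block device
    (boot_index 0) is destined for a volume rather than local disk."""
    for bdm in block_device_mappings:
        if bdm.get('boot_index') == 0:
            return bdm.get('destination_type') == 'volume'
    return False
```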

Ankit Agrawal (ankitagrawal) wrote :

Hi Feodor, I have seen your patch [1], which has been in merge conflict for a long time. Just wanted to confirm whether you are still willing to fix this issue or whether we should take over this task.

[1] https://review.openstack.org/#/c/200870/

Thanks,

Andrew Laski (alaski) wrote :

Tobias: As you've seen, just setting the disk values to 0 in the request spec isn't enough, because that is only really used by the scheduler, which doesn't currently do resource tracking. I think it needs to be made clear that root_gb, and really all of the disk sizes, only apply to local disks. So the right thing to do would be to set it to 0 when booting from a volume. The patch from Feodor looks like the right approach for that.

The other side of it is what's shown in the API, and therefore what Horizon displays. Some consideration should be given to whether the disk size exposed should be the local disk or the root disk. My first hunch says that we should display root_gb as 0 and require the disk size to be extracted from the volume info. If we displayed root_gb == volume_size that could lead to confusion as to why the root_gb is > the max disk size allowed by a flavor.

Tobias Urdin (tobias-urdin) wrote :

Andrew: I totally agree with you. I've been trying to get hold of Feodor but haven't heard back yet; I basically have the same question as Ankit: does he want to proceed with his fix, or do we need somebody else to take over? Thanks for your input!

Tobias Urdin (tobias-urdin) wrote :

I have personally tried Feodor's patch [1] against our test environment and it seems to do the work as it should. However, I have seen in the comments of that review that some people are skeptical of the way it's handled and perhaps there is a better way to do it. I will try to get this review some more attention; it also seems it needs to be rebased to resolve a conflict. Please provide feedback on this patch.

This simple patch would solve the issue of volume usage being counted as local hypervisor storage. That miscounting gives the Nova scheduler's DiskFilter (the default as of Liberty) the wrong data for scheduling, which can cause instance creation to fail because the scheduler thinks the host is out of disk space. It also produces wrong statistics in the API, since the resource tracker generates incorrect data, which in turn makes Horizon display wrong information.

[1] https://review.openstack.org/#/c/200870/

Changed in nova:
assignee: Feodor Tersin (ftersin) → Tobias Urdin (tobias-urdin)
tags: added: liberty-backport-potential
Tobias Urdin (tobias-urdin) wrote :

WIP, still needs testing on the master branch with DevStack. I have only tried the Liberty backport.

Ankit Agrawal (ankitagrawal) wrote :

Tobias: I have already tested Feodor's patch [1] with the latest master, and it works as expected. I have a patch set ready, rebased on master, and am waiting for Feodor's confirmation that he is OK with us submitting the new PS. Thanks!

[1] https://review.openstack.org/#/c/200870/

Tobias Urdin (tobias-urdin) wrote :

Ankit: Great, thanks for working on it. I just rebased Feodor's patch a while ago and submitted the PS [1]. Please check it out and see if they are equal; otherwise please upload your PS and we can review it. I have tested this patch on Liberty, so a backport is also easy to submit.

[1] https://review.openstack.org/#/c/200870/

Tobias Urdin (tobias-urdin) wrote :

Ankit: Can you confirm you have tested the patch [1] (the newest patch set) with DevStack, or do I need to do it? I have verified this patch on Liberty and it seems to solve all the issues we have with this.

Ankit Agrawal (ankitagrawal) wrote :

Tobias: I have tested the latest PS [1] with DevStack and it seems to resolve the issues [2] mentioned in this bug. Also, there is an issue [3] on the scheduler side for which I have a solution ready; I'm just trying to find out whether there is a better way to achieve it before submitting the patch. Thanks!

[1] https://review.openstack.org/#/c/200870/
[2] https://bugs.launchpad.net/nova/+bug/1469179
[3] https://bugs.launchpad.net/nova/+bug/1334974

Tobias Urdin (tobias-urdin) wrote :

Ankit: Thanks! The patch [1] should resolve the issue with the DiskFilter as well; see my latest comment on the patch [1], or have I missed something else?

[1] https://review.openstack.org/#/c/200870/

Tobias Urdin (tobias-urdin) wrote :

FYI, I looked into this last Friday and the above statement is wrong: the DiskFilter will still be wrong unless another patch changes the Request Spec that is sent to the scheduler (the Request Spec still carries the flavor's root_gb; it needs to be zero as well). See the patch [1] comments.

[1] https://review.openstack.org/#/c/200870/

Changed in nova:
assignee: Tobias Urdin (tobias-urdin) → Ankit Agrawal (ankitagrawal)
Tobias Urdin (tobias-urdin) wrote :

CONFIRMED FOR: LIBERTY

Reviewed: https://review.openstack.org/270482
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d6210a4d0cdaf8a71d2516bf44f04789cbe89d0c
Submitter: Jenkins
Branch: master

commit d6210a4d0cdaf8a71d2516bf44f04789cbe89d0c
Author: Tobias Urdin <email address hidden>
Date: Wed Jan 20 21:55:33 2016 +0100

    Move is_volume_backed_instance to compute.utils

    This function is a method of ComputeAPI, so its callers must have an
    instance of that class. This makes usage difficult in modules which
    do not otherwise need to call ComputeAPI.

    Besides, this function does not use any of the class members, so
    there is no reason for it to be a class method.

    This patch moves the function to compute.utils because it does not
    use any of the class members.

    This patch also adds support for the _get_root_bdm and
    is_volume_backed_instance to read from a dictionary instead of an
    Instance object. Because of this we can call is_volume_backed_instance
    from build_request_spec and fix bug #1469179.

    Change-Id: I6d446088faf500ed39a4504794d09d0f86e2bbc3
    Co-Authored-By: Feodor Tersin <email address hidden>
    Co-Authored-By: Ankit Agrawal <email address hidden>
    Related-Bug: #1469179
    Related-Bug: #1334974
    Related-Bug: #1358566

Jesse Keating (jesse-keating) wrote :

We're now past Mitaka, and this bug exists in Mitaka, so I've added the backport tag.

tags: added: mitaka-backport-potential
melanie witt (melwitt) wrote :

FYI based on our discussion on the last review[1], I've proposed another approach at:

https://review.openstack.org/#/c/355091/

[1] https://review.openstack.org/#/c/200870/

melanie witt (melwitt) wrote :

Just to close the loop here, the backportable solution from #24 did not make Newton and will be obsolete in Ocata because of the resource providers work.

To reiterate the workarounds for the problem:

* Don't use the scheduler filter, DiskFilter, if your entire cluster is boot-from-volume instances only.

or

* Create a set of 0 disk flavors for boot-from-volume use if you have a mixed cluster of boot-from-volume and non boot-from-volume instances.
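The second workaround amounts to cloning each flavor with zero local disk, for example (hypothetical flavor names and sizes, shown here as plain dicts):

```python
# A regular flavor and its boot-from-volume twin: identical except the
# BFV variant advertises no local root disk, so nothing is counted
# against the compute node's local storage when scheduling.
flavor = {'name': 'm1.medium', 'vcpus': 2, 'ram_mb': 4096, 'root_gb': 40}
bfv_flavor = dict(flavor, name='m1.medium.bfv', root_gb=0)
```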

Changed in nova:
assignee: Ankit Agrawal (ankitagrawal) → melanie witt (melwitt)

The workarounds don't work so well for pre-existing clusters that have a mix of boot-from-volume and ephemeral (non-volume) instances. This is because the currently deployed instances all use the same disk flavors, and it's no longer a simple fix to introduce different flavors: nova-scheduler already thinks a bunch of disk space is used up by pre-existing Cinder-backed instances when it isn't, overestimating usage. Perhaps changing the existing flavors to have 0-byte root disks would make that go away, but then any instances that actually do use local storage would be considered to have 0-byte disks as well, thus underestimating usage.

Disabling DiskFilter is really the only option, but that obviously can also lead to issues in mixed clusters because disk is actually important.

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/200870
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.

MarginHu (margin2017) wrote :

I met this issue on Ocata version.

python2-novaclient-7.1.0-1.el7.noarch
python-nova-15.0.3-2.el7.noarch
openstack-nova-scheduler-15.0.3-2.el7.noarch
openstack-nova-common-15.0.3-2.el7.noarch

Change abandoned by melanie witt (<email address hidden>) on branch: master
Review: https://review.openstack.org/355091
Reason: More recent series about this here:

https://review.openstack.org/#/c/428481/

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: melanie witt (melwitt) → nobody
Changed in nova:
assignee: nobody → melanie witt (melwitt)
status: New → In Progress
melanie witt (melwitt) wrote :

I don't know why sometimes the bot doesn't update here. Bottom patch is at:

https://review.openstack.org/#/c/428481/

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → New
assignee: melanie witt (melwitt) → nobody
Sean Dague (sdague) on 2017-06-28
Changed in nova:
assignee: nobody → melanie witt (melwitt)
status: New → In Progress
Deepa (dpaclt) wrote :

Is there any workaround we can try? I am using the Ocata version with an EMC Unity box as Cinder storage.
Does it make any difference if we edit the /etc/nova/nova.conf file and add images_volume_group = cinder-volumes?

tags: added: pike-backport-potential
Max Stepanov (t-max-z) wrote :

It has been more than 2 years and still no proper solution in sight. This is ridiculous!

Tobias Urdin (tobias-urdin) wrote :

@Max: This has been fixed in newer versions with the resource providers and the use of the placement API.

Chris Friesen (cbf123) wrote :

@Tobias: I don't think it actually has been fixed yet.

MarkMielke (mark-mielke) wrote :

Re: "This has been fixed in newer versions with the resource providers"

After disabling the RamFilter and DiskFilter and instead relying on the Placement API for scheduling, most of the "scheduling" aspect of this problem has been addressed. I was able to drop a local patch that handled root_gb != 0 for EBS volumes, and for most real-life use cases it is working. The Placement API is not recording disk allocations for EBS volumes.

I found that we now have the opposite problem. I checked our database for exceptions and found that the EBS-volume case is now OK, but a few instances that still exist, created in Mitaka before upgrading to Ocata, have incorrect Placement API data. Their flavor has root_gb=0, but they have local disks, and the Placement API shows them as having 0 GB of disk in the allocations table, which is not true. I'll have to track this one down as a separate issue.

I tend to think that instance.root_gb should either be eliminated, or it should be kept correct. However, if the placement API makes it irrelevant from a scheduling perspective, perhaps this correction can be deferred until an opportunistic time?

Tom Myny (tom-myny) wrote :

We are having the same problem, we use cinder iscsi volumes as storage backend.

The placement API gives the following error when we create a volume bigger than the local disk of the nova compute node:

Got no allocation candidates from the Placement API. This may be a temporary occurrence as compute nodes start up and begin reporting inventory to the Placement service. (nova-16.0.7/lib/python2.7/site-packages/nova/scheduler/manager.py:133)

melanie witt (melwitt) wrote :

Update: as of Ocata, when using the FilterScheduler, I think the only remaining workaround for BFV with volumes > compute host local disk size is to create and use separate flavors configured for root_gb=0 when booting from volume.

Prior to Ocata, when using the FilterScheduler, an additional workaround was to remove DiskFilter from the configurable enabled scheduler filters.

Fix proposed to branch: master
Review: https://review.openstack.org/551026

Changed in nova:
assignee: melanie witt (melwitt) → Dan Smith (danms)
tags: added: canonical-bootstack
Matt Riedemann (mriedem) wrote :

This wasn't linked into the bug properly, but this change should resolve some of the problem here:

https://review.openstack.org/#/c/580720/

With that change, during server create, volume-backed instances won't have root_gb disk allocated against the compute node resource provider in placement, nor will it filter on root_gb disk (no "claim"). It's not a complete solution since it doesn't handle move operations that allocate resources on a new host like during cold/live migrate, unshelve and evacuate (well, at least not for existing instances anyway since their request spec wouldn't have the is_bfv flag set). But it's a step in the right direction. The only problem is, as currently written, it's not backportable to stable branches.

Changed in nova:
assignee: Dan Smith (danms) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2018-07-18
Changed in nova:
assignee: Matt Riedemann (mriedem) → Dan Smith (danms)

Fix proposed to branch: master
Review: https://review.openstack.org/583715

Changed in nova:
assignee: Dan Smith (danms) → Eric Fried (efried)
assignee: Eric Fried (efried) → Matt Riedemann (mriedem)
Eric Fried (efried) on 2018-07-18
Changed in nova:
assignee: Matt Riedemann (mriedem) → Dan Smith (danms)
Matt Riedemann (mriedem) on 2018-07-19
tags: removed: kilo-backport-potential liberty-backport-potential mitaka-backport-potential pike-backport-potential
tags: added: rocky-rc-potential

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/428505
Reason: We have https://review.openstack.org/#/c/580720/ now.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/428481
Reason: Given https://review.openstack.org/#/c/580720/ I think it's safe to abandon this now.

Reviewed: https://review.openstack.org/580720
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=03c596a9f4324e572bc04d4bbad09a6d3d47366c
Submitter: Zuul
Branch: master

commit 03c596a9f4324e572bc04d4bbad09a6d3d47366c
Author: Dan Smith <email address hidden>
Date: Fri Jul 6 09:09:05 2018 -0700

    Avoid requesting DISK_GB allocation for root_gb on BFV instances

    Right now, we still ask placement for a disk allocation covering
    swap, ephemeral, and root disks for the instance even if the instance
    is going to be volume-backed. This patch makes us not include the
    root size in that calculation for placement, avoiding a false
    failure because the volume size is counted against the compute's
    available space.

    To do this, we need another flag in request_spec to track the
    BFV-ness of the instance. Right now, this patch just sets that on
    new builds and the scheduler client assumes a lack of said flag
    as "I don't know, so assume not-BFV" for compatibility. A
    subsequent patch can calculate that flag for existing instances
    so that we will be able to heal over time by migrating instances
    or re-writing their allocations to reflect reality.

    Partial-Bug: #1469179
    Change-Id: I9c2111f7377df65c1fc3c72323f85483b3295989
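The effect of this patch on the placement request can be sketched as follows (illustrative only; the real computation lives in nova's scheduler utilities): swap and ephemeral disk always count, but the flavor's root_gb is skipped for volume-backed instances.

```python
def requested_disk_gb(flavor, is_bfv):
    """DISK_GB amount requested from placement: swap and ephemeral
    always count, but root_gb is skipped for volume-backed instances
    because their root disk is a Cinder volume, not local storage."""
    swap_gb = (flavor['swap_mb'] + 1023) // 1024  # swap is stored in MB
    total = swap_gb + flavor['ephemeral_gb']
    if not is_bfv:
        total += flavor['root_gb']
    return total
```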

Related fix proposed to branch: master
Review: https://review.openstack.org/584931

Changed in nova:
assignee: Dan Smith (danms) → Matt Riedemann (mriedem)

Reviewed: https://review.openstack.org/583715
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9344c1995a059675af7a8e9cf0de0502bef46c94
Submitter: Zuul
Branch: master

commit 9344c1995a059675af7a8e9cf0de0502bef46c94
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 18 13:13:45 2018 -0400

    Heal RequestSpec.is_bfv for legacy instances during moves

    Change I9c2111f7377df65c1fc3c72323f85483b3295989 sets the
    RequestSpec.is_bfv flag for newly created instances so
    that scheduling (using placement) does not allocate DISK_GB
    resources for the Flavor's root_gb when booting from volume.

    RequestSpecs for old instances created before that change will
    not have the is_bfv field set, so this change adds a check for
    that in the various move operations (evacuate, unshelve, cold
    migrate and live migrate) and sets the RequestSpec.is_bfv flag
    accordingly.

    The related functional test is updated for the legacy cold
    migrate and heal scenario.

    Change-Id: I8e529ad4d707b2ad012328993892db83ce464c4b
    Closes-Bug: #1469179
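The healing step described in this commit can be sketched as follows (simplified; nova stores the RequestSpec as a versioned object rather than a dict, and the volume-backed check comes from compute utils):

```python
def heal_is_bfv(request_spec, instance_is_volume_backed):
    """Backfill the is_bfv flag on a legacy RequestSpec during a move
    operation, leaving specs that already carry the flag untouched."""
    if 'is_bfv' not in request_spec:
        request_spec['is_bfv'] = instance_is_volume_backed
    return request_spec
```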

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem) on 2018-07-24
Changed in nova:
assignee: Matt Riedemann (mriedem) → Dan Smith (danms)

Reviewed: https://review.openstack.org/583739
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b537148228a0bbafc618a52af905addd29b2543d
Submitter: Zuul
Branch: master

commit b537148228a0bbafc618a52af905addd29b2543d
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 18 15:55:13 2018 -0400

    Fix wonky reqspec handling in conductor.unshelve_instance

    Removes the populate_retry call since we don't reschedule
    from failed unshelve calls in the compute. That was added
    as a partial fix for bug 1400015 but it was never completed.

    The to_legacy/from_primitives stuff in here dropped the is_bfv
    setting on the request spec, which means we'd have to
    recalculate that every time. Instead, if we're given a valid
    RequestSpec, use it, otherwise create a fake one and then we'll
    heal the RequestSpec.is_bfv field on that one. Plus all of that
    missing request spec compat code should get dropped in Stein
    anyway (finally).

    Related-Bug: #1469179

    Change-Id: I49c4e87d15e6fb0fda1b4efd7252bc5ca2066fb4

This issue was fixed in the openstack/nova 18.0.0.0b3 development milestone.

Reviewed: https://review.openstack.org/584931
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dc5c69d0d118be838245a18fb636b044249a9863
Submitter: Zuul
Branch: master

commit dc5c69d0d118be838245a18fb636b044249a9863
Author: Matt Riedemann <email address hidden>
Date: Mon Jul 23 11:13:37 2018 -0400

    Add shelve/unshelve wrinkle to volume-backed disk func test

    This adds a shelve/unshelve scenario to the functional test
    which checks that root_gb from the flavor does not show up
    in placement for volume-backed servers. Because the shelve
    happens after we've cold migrated the server to a new host,
    the fake virt driver's finish_migration() method needed to
    be implemented to track the instance on the destination host.

    Change-Id: Ica456f2512ebe7814c5d20f205ba89b49c42050a
    Related-Bug: #1469179

Fabo Yi (folkart) wrote :

This bug exists in our Kilo version; we use SAN volumes as the Cinder storage backend.

bel (varr) wrote :

This bug also exists on stable/rocky with iSCSI storage. Any hint or update to solve this issue would be appreciated.

Reviewed: https://review.opendev.org/551026
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5406c8bd9b8740a27c60a0ac7983c84e440f0d35
Submitter: Zuul
Branch: master

commit 5406c8bd9b8740a27c60a0ac7983c84e440f0d35
Author: Dan Smith <email address hidden>
Date: Thu Mar 8 14:35:15 2018 -0800

    Remove deprecated CPU, RAM, disk claiming in resource tracker

    In change Id62136d293da55e4bb639635ea5421a33b6c3ea2, we deprecated the
    scheduler filters for CPU, RAM, and Disk since they were no longer
    necessary in the new placement-based world. With these filters disabled,
    we are no longer passing limits down to the resource tracker meaning we
    are treating everything in the claim process as unlimited. This means
    most of the claiming code here, NUMA stuff aside, is now a no-op and can
    be removed. Do just that.

    Change-Id: I2936ce8cb293dc80e1a426094fdae6e675461470
    Co-Authored-By: Stephen Finucane <email address hidden>
    Partial-Bug: #1469179
