Simple user can disable compute

Bug #1742102 reported by Kevin Tibi
This bug affects 3 people
Affects                       Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)      In Progress   High        Unassigned
  Pike                        New           Undecided   Unassigned
  Queens                      New           Undecided   Unassigned
OpenStack Security Advisory   Won't Fix     Undecided   Unassigned

Bug Description

Hi,

When I tested a fresh deploy of Pike, I created a private network with a small subnet, like a /28. If you try to create a lot of new instances, nova fails because the subnet has no free IPs left for the new instances.

The failure trace is at https://thepasteb.in/p/zmh8qDG2ZYJIZ

After that, the consecutive_build_service_disable_threshold counter (default 10) is reached very fast and the compute services are disabled.

Revision history for this message
Matt Riedemann (mriedem) wrote :

We discussed this a bit in IRC. This is a failure after we've created the port: when we go to plug the VIF for the instance, that fails. Neutron doesn't send any information with the network-vif-plugged event, just that it failed.

Are you able to trace the vif plug operation on the neutron side and see what specific error resulted in the vif plug failing?

On the nova side, you should see something in the logs about a network-vif-plugged event for instance 80e4fb95-da3a-4725-9bb4-9cbb514ebcb1 and you should be able to see the request ID in the request from neutron, and trace that back to the failure on the neutron side that generated the call to nova.

tags: added: compute
Changed in nova:
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :

My point is that we could add a whitelist of exceptions that shouldn't count toward disabling a compute, but we need to know what the specific failure is, rather than just generically ignoring VirtualInterfaceCreateException, since that could be any number of failures.

Revision history for this message
Kevin Tibi (ktibi) wrote :

Same bug when creating 100 instances on volume (root disk is a volume). If the volume quota is 10 max, you get 90 failures and nova disables 9 computes.

Revision history for this message
Matt Riedemann (mriedem) wrote :

The volume create over-limit is a good example. Regardless of this bug with the auto-disable stuff, it would probably be in our best interest to do a quota check in the API when we know nova is going to create the volume (which happens late in the compute service); then we could fail fast in the server create API if we know we don't have enough volume quota for the boot-from-volume request. We already do a port quota check in the API during server create, so it seems we should also do that for volumes.
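
For illustration only, a rough sketch of the kind of fail-fast volume quota check described above, done before accepting a boot-from-volume server create request. It uses python-cinderclient's absolute limits; the helper name, the limit key names, and where it would be wired into nova-api are assumptions, not nova's actual code.

    # 'cinder' is assumed to be an authenticated cinderclient.v3.client.Client.
    def check_volume_quota(cinder, num_new_volumes):
        """Raise early if creating num_new_volumes would exceed the quota."""
        limits = {l.name: l.value for l in cinder.limits.get().absolute}
        remaining = limits['maxTotalVolumes'] - limits['totalVolumesUsed']
        if num_new_volumes > remaining:
            # nova-api would turn this into a quota error on the server create
            # call instead of an ERROR instance later on the compute node.
            raise ValueError('Volume quota exceeded: requested %d, %d left'
                             % (num_new_volumes, remaining))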

jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichenjc (jichenjc)
Revision history for this message
jichenjc (jichenjc) wrote :

We can refer to https://review.openstack.org/#/c/520158 to add the check for the volume quota; the utils function there can be reused to get the cinder quota limits.

Revision history for this message
jichenjc (jichenjc) wrote :

I set the volume quota to 1 and then tried:

stack@k8s:~$ nova boot --block-device source=blank,dest=volume,size=1,bootindex=1 --flavor 1 --security-group webservers --image 5ba4945d-d190-4205-b50c-3c957410061b ji1

and saw the fault below, so basically we can switch to API-layer validation to avoid an 'ERROR' instance being stuck there:

| fault | {"message": "Build of instance 3eb1baef-50e6-4618-a1b8-fd8f8a0f1458 aborted: VolumeLimitExceeded: Maximum number of volumes allowed (1) exceeded for quota 'volumes'.", "code": 500, "details": " File \"/opt/stack/nova/nova/compute/manager.py\", line 1840, in _do_build_and_run_instance |
| | filter_properties, request_spec) |
| | File \"/opt/stack/nova/nova/compute/manager.py\", line 2053, in _build_and_run_instance |
| | bdms=block_device_mapping) |
| | File \"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py\", line 220, in __exit__ |
| | self.force_reraise() |
| | File \"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py\", line 196, in force_reraise |
| | six.reraise(self.type_, self.value, self.tb) |
| | File \"/opt/stack/nova/nova/compute/manager.py\", line 2005, in _build_and_run_instance ...


Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
Mohammed Naser (mnaser) wrote :

I'd like to pick this up and take care of it, but we need to have a discussion about the ideal resolution for this bug.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm trying to think of a simple way to at least mark some failures as not something to be counted against the consecutive build failure count. The easy case is volume overquota when booting from volume and nova is trying to create a volume. We handle that OverQuota here and raise BuildAbortException:

https://github.com/openstack/nova/blob/7bdb7dbbddf9fcb4284d490bf315d6756f4015e7/nova/compute/manager.py#L2209

That BuildAbortException is eventually handled here:

https://github.com/openstack/nova/blob/7bdb7dbbddf9fcb4284d490bf315d6756f4015e7/nova/compute/manager.py#L1903

The problem is, that method returns essentially an enum (build_results.FAILED) and that's what is checked when counting against build failures to disable the compute:

https://github.com/openstack/nova/blob/7bdb7dbbddf9fcb4284d490bf315d6756f4015e7/nova/compute/manager.py#L1750

So the code in _build_failed() doesn't have context about the actual failure, which makes whitelisting certain types of failures hard.

I was thinking that in this block when we raise BuildAbortException:

https://github.com/openstack/nova/blob/7bdb7dbbddf9fcb4284d490bf315d6756f4015e7/nova/compute/manager.py#L2209

If we were handling an OverQuota error, we could set a flag on BuildAbortException and check it later up the stack in the _build_failed() logic. But since _build_failed() doesn't get the actual exception, just the build_results.FAILED enum (which only exists for the "build_instance" hook, by the way), we can't do that.

One super hacky thing we could do is, if we get OverQuota during _prep_block_devices, reset self._failed_builds to 0, like here:

https://github.com/openstack/nova/blob/7bdb7dbbddf9fcb4284d490bf315d6756f4015e7/nova/compute/manager.py#L2209

But if consecutive_build_service_disable_threshold is set to 1, then _build_failed() will still disable the compute service, which is what we don't want in this case.
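
For illustration, a minimal sketch of the flag-on-the-exception idea above, assuming the exception actually were plumbed through to the failure-counting code (which, as noted, it currently isn't); the constructor, attribute, and helper below are hypothetical, not nova's real code.

    # Hypothetical: tag the abort exception so the failure counter can tell a
    # user-caused failure (e.g. volume over-quota) from a sick compute host.
    class BuildAbortException(Exception):
        def __init__(self, message, user_fault=False):
            super(BuildAbortException, self).__init__(message)
            # True when the failure came from the request itself (quota,
            # bad image, ...) rather than from the host.
            self.user_fault = user_fault

    def count_build_failure(exc, failed_builds):
        """Only count failures that may indicate a host problem."""
        if getattr(exc, 'user_fault', False):
            return failed_builds  # don't count user-caused failures
        return failed_builds + 1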

Revision history for this message
Matt Riedemann (mriedem) wrote :

Alternatively we could add a new build_results enum value like "failed-not-misconfig" and skip calling _build_failed() in that case, but that just bakes more dependent logic into the hooks code that has been deprecated since Mitaka.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/568953

Changed in nova:
assignee: jichenjc (jichenjc) → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Is this affecting pike and queens?

Jeremy Stanley (fungi)
information type: Public → Public Security
Revision history for this message
Joshua Harlow (harlowja) wrote :

Yes; both.

Revision history for this message
Matt Riedemann (mriedem) wrote :

One simple tactical fix for this would be to change the default of the consecutive_build_service_disable_threshold config option so the behavior is essentially disabled by default; that way people upgrading to that release, or installing OpenStack for the first time, don't hit this. We could then document in the option itself the scenarios in which it might be OK to enable it while working on a more strategic fix (for example, if you're spreading instead of packing (the default) in the scheduler, this is less of an issue). Since it's a more advanced feature and obviously has unintended consequences, it should be something operators need to opt into.
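
As a stopgap, an operator could opt out of the auto-disable behavior in nova.conf (example only, assuming the documented behavior that a value of 0 never auto-disables the service applies in your release):

    [compute]
    # Never auto-disable this compute service based on consecutive build
    # failures; rely on monitoring and scheduling instead.
    consecutive_build_service_disable_threshold = 0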

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/572195

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/572195
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91e29079a0eac825c5f4fe793cf607cb1771467d
Submitter: Zuul
Branch: master

commit 91e29079a0eac825c5f4fe793cf607cb1771467d
Author: Dan Smith <email address hidden>
Date: Mon Jun 4 10:21:37 2018 -0700

    Change consecutive build failure limit to a weigher

    There is concern over the ability for compute nodes to reasonably
    determine which events should count against its consecutive build
    failures. Since the compute may erroneously disable itself in
    response to mundane or otherwise intentional user-triggered events,
    this patch adds a scheduler weigher that considers the build failure
    counter and can negatively weigh hosts with recent failures. This
    avoids taking computes fully out of rotation, rather treating them as
    less likely to be picked for a subsequent scheduling
    operation.

    This introduces a new conf option to control this weight. The default
    is set high to maintain the existing behavior of picking nodes that
    are not experiencing high failure rates, and resetting the counter as
    soon as a single successful build occurs. This is minimal visible
    change from the existing behavior with default configuration.

    The rationale behind the default value for this weigher comes from the
    values likely to be generated by its peer weighers. The RAM and Disk
    weighers will increase the score by number of available megabytes of
    memory (range in thousands) and disk (range in millions). The default
    value of 1000000 for the build failure weigher will cause competing
    nodes with similar amounts of available disk and a small (less than ten)
    number of failures to become less desirable than those without, even
    with many terabytes of available disk.

    Change-Id: I71c56fe770f8c3f66db97fa542fdfdf2b9865fb8
    Related-Bug: #1742102
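
For readers unfamiliar with nova's weigher pattern, a simplified sketch of what a build-failure weigher like the one described above might look like (the multiplier constant and the host_state attribute name are illustrative; see the linked review for the real implementation):

    from nova.scheduler import weights

    # Stand-in for the new config option mentioned in the commit message.
    BUILD_FAILURE_WEIGHT_MULTIPLIER = 1000000

    class BuildFailureWeigher(weights.BaseHostWeigher):
        def weight_multiplier(self):
            # Negated so that more recent failures mean a lower host score.
            return -1 * BUILD_FAILURE_WEIGHT_MULTIPLIER

        def _weigh_object(self, host_state, weight_properties):
            # Hosts with zero recent failures are unaffected; a handful of
            # failures outweighs what the RAM/disk weighers typically add.
            return getattr(host_state, 'failed_builds', 0)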

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/573239

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/573248

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/573239
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=43a84dbc1ebf147d43451610b76c700a31e08f4b
Submitter: Zuul
Branch: stable/queens

commit 43a84dbc1ebf147d43451610b76c700a31e08f4b
Author: Dan Smith <email address hidden>
Date: Mon Jun 4 10:21:37 2018 -0700

    Change consecutive build failure limit to a weigher

    There is concern over the ability for compute nodes to reasonably
    determine which events should count against its consecutive build
    failures. Since the compute may erroneously disable itself in
    response to mundane or otherwise intentional user-triggered events,
    this patch adds a scheduler weigher that considers the build failure
    counter and can negatively weigh hosts with recent failures. This
    avoids taking computes fully out of rotation, rather treating them as
    less likely to be picked for a subsequent scheduling
    operation.

    This introduces a new conf option to control this weight. The default
    is set high to maintain the existing behavior of picking nodes that
    are not experiencing high failure rates, and resetting the counter as
    soon as a single successful build occurs. This is minimal visible
    change from the existing behavior with default configuration.

    The rationale behind the default value for this weigher comes from the
    values likely to be generated by its peer weighers. The RAM and Disk
    weighers will increase the score by number of available megabytes of
    memory (range in thousands) and disk (range in millions). The default
    value of 1000000 for the build failure weigher will cause competing
    nodes with similar amounts of available disk and a small (less than ten)
    number of failures to become less desirable than those without, even
    with many terabytes of available disk.

    Change-Id: I71c56fe770f8c3f66db97fa542fdfdf2b9865fb8
    Related-Bug: #1742102
    (cherry picked from commit 91e29079a0eac825c5f4fe793cf607cb1771467d)

tags: added: in-stable-queens
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/573248
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bdb5c3b4057b8b3c1ba27c70e237e5fa71064694
Submitter: Zuul
Branch: stable/pike

commit bdb5c3b4057b8b3c1ba27c70e237e5fa71064694
Author: Dan Smith <email address hidden>
Date: Mon Jun 4 10:21:37 2018 -0700

    Change consecutive build failure limit to a weigher

    There is concern over the ability for compute nodes to reasonably
    determine which events should count against its consecutive build
    failures. Since the compute may erroneously disable itself in
    response to mundane or otherwise intentional user-triggered events,
    this patch adds a scheduler weigher that considers the build failure
    counter and can negatively weigh hosts with recent failures. This
    avoids taking computes fully out of rotation, rather treating them as
    less likely to be picked for a subsequent scheduling
    operation.

    This introduces a new conf option to control this weight. The default
    is set high to maintain the existing behavior of picking nodes that
    are not experiencing high failure rates, and resetting the counter as
    soon as a single successful build occurs. This is minimal visible
    change from the existing behavior with default configuration.

    The rationale behind the default value for this weigher comes from the
    values likely to be generated by its peer weighers. The RAM and Disk
    weighers will increase the score by number of available megabytes of
    memory (range in thousands) and disk (range in millions). The default
    value of 1000000 for the build failure weigher will cause competing
    nodes with similar amounts of available disk and a small (less than ten)
    number of failures to become less desirable than those without, even
    with many terabytes of available disk.

    Conflicts:
     nova/conf/scheduler.py
     nova/test.py

    NOTE(danms): The conflict was due to not having changes
    Icee137e15f264da59a1bdc1dc1ecfeaac82b98c6 and
    I911cc51a226d6af29d63a7a2c69253de870073e9 in Queens.

    NOTE(danms): Because IronicHostManager was a thing in pike, this
    includes the fix applied late to queens in this commit:
    d26dc0ca03e9cc9a04ac02d88ba2d2867340f5cd

    Change-Id: I71c56fe770f8c3f66db97fa542fdfdf2b9865fb8
    Related-Bug: #1742102
    (cherry picked from commit 91e29079a0eac825c5f4fe793cf607cb1771467d)
    (cherry picked from commit 43a84dbc1ebf147d43451610b76c700a31e08f4b)

tags: added: in-stable-pike
Revision history for this message
Jeremy Stanley (fungi) wrote :

On discussing with Dan Smith, the related denial of service condition described in this report has been a known risk since the introduction of the feature and generally falls below the threshold for broad publication in an advisory. The related fixes merged back as far as stable/pike will mitigate it (or can be tuned to greater extremes to do so if necessary) and are accompanied by a security release note. Since this report is already public, I'm going to mark this as a security hardening opportunity (class D in our VMT report taxonomy[*]) with no OSSA task needed. If there is a strong objection that an advisory is needed, then we can revisit publishing one.

[*] https://security.openstack.org/vmt-process.html#incident-report-taxonomy

information type: Public Security → Public
Changed in ossa:
status: Incomplete → Won't Fix
tags: added: security
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/568953
