RequestedVRamTooHigh failures during server create will get needlessly rescheduled

Bug #1770726 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status       Importance  Assigned to     Milestone
OpenStack Compute (nova)  In Progress  Medium      Matt Riedemann
Ocata                     New          Low         Unassigned
Pike                      New          Low         Unassigned
Queens                    In Progress  Low         Matt Riedemann

Bug Description

The libvirt driver can raise RequestedVRamTooHigh during server create if the hw_video_ram specified in the image exceeds the flavor's maximum video RAM. The compute manager does not handle that specific exception, so it results in a RescheduledException and the build is retried on another compute host. That retry will never succeed, because the failure does not depend on the host. Since this is user error, we should handle the exception and abort the build.
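
To illustrate the intended behavior, here is a minimal, self-contained sketch. It is not the nova code: the exception classes are local stand-ins named after the exceptions discussed in this report, and build_and_run_instance() and its spawn argument are hypothetical simplifications of the compute manager's build path.

```python
# Local stand-ins for illustration only; these are NOT imported from nova.
class RequestedVRamTooHigh(Exception):
    """The requested video RAM exceeds what the flavor allows."""


class RescheduledException(Exception):
    """The build failed here and should be retried on another host."""


class BuildAbortException(Exception):
    """The build failed and must not be retried anywhere."""


def build_and_run_instance(spawn):
    """Run spawn() and classify any failure (hypothetical simplification)."""
    try:
        spawn()
    except RequestedVRamTooHigh as exc:
        # User error: every host would reject the same image/flavor pair,
        # so abort the build instead of rescheduling.
        raise BuildAbortException(str(exc)) from exc
    except Exception as exc:
        # Anything else may be host-specific, so another host might succeed.
        raise RescheduledException(str(exc)) from exc
```

With this shape, a spawn that raises RequestedVRamTooHigh surfaces as BuildAbortException, while any other failure still surfaces as RescheduledException.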

Long-term, we should validate this in the REST API instead of on the compute. One complication is that not all virt drivers support the image property and flavor extra spec: currently only the libvirt driver does, but there is also a patch to add support for it in the vmware driver:

https://review.openstack.org/#/c/564193/

https://github.com/openstack/nova/blob/bfcbfdf9169765219320abfa3e09ecda8ff1a80c/nova/virt/libvirt/driver.py#L4608

https://github.com/openstack/nova/blob/bfcbfdf9169765219320abfa3e09ecda8ff1a80c/nova/compute/manager.py#L2095
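
As a rough idea of what API-side validation could look like, the sketch below compares the hw_video_ram image property with the hw_video:ram_max_mb flavor extra spec before any host is involved. The function name, the BadRequest class, and the plain dict inputs are hypothetical, and treating a missing value as 0 is an assumption made for this sketch, not something confirmed by this report.

```python
class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 error raised by the API."""
    code = 400


def validate_video_ram(image_properties, flavor_extra_specs):
    """Reject the request if the image asks for more video RAM than allowed.

    Both arguments are plain dicts here (a simplification). A missing value
    is treated as 0, which is an assumption of this sketch.
    """
    requested = int(image_properties.get("hw_video_ram", 0))
    allowed = int(flavor_extra_specs.get("hw_video:ram_max_mb", 0))
    if requested > allowed:
        raise BadRequest(
            "Requested video RAM %d MB exceeds the flavor maximum of %d MB"
            % (requested, allowed)
        )
```

For example, validate_video_ram({"hw_video_ram": 256}, {"hw_video:ram_max_mb": "128"}) raises BadRequest, while a request for 64 with the same flavor passes. Any real check would also need to account for virt drivers that ignore these settings.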

Matt Riedemann (mriedem)
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/567929

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/567929
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cd53a181be22291e6f9be1430cab12ba1f381b8a
Submitter: Zuul
Branch: master

commit cd53a181be22291e6f9be1430cab12ba1f381b8a
Author: Matt Riedemann <email address hidden>
Date: Fri May 11 14:44:47 2018 -0400

    Don't reschedule on RequestedVRamTooHigh errors

    The libvirt driver validates the hw_video_ram image property,
    if specified, and the flavor extra spec "hw_video:ram_max_mb"
    is set. If validation fails, the libvirt driver raises
    RequestedVRamTooHigh which is not handled explicitly in
    ComputeManager._build_and_run_instance so it will result in
    a RescheduledException to another compute to retry the spawn
    but that will always fail because this isn't something that
    is per-compute host.

    This change adds the error handling in _build_and_run_instance
    so that we'll fail and abort the build and not reschedule.

    Long-term, this validation should be moved into the API code
    since it's not specific to a compute host and would be user
    error that should result in a 400 response.

    Change-Id: I93b409ca2b7b36400759ee9d2cd5b71c6df186ab
    Partial-Bug: #1770726

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/568642

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/568642
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=177ad2dcfc19e891994c69c678feda4190547e3f
Submitter: Zuul
Branch: stable/queens

commit 177ad2dcfc19e891994c69c678feda4190547e3f
Author: Matt Riedemann <email address hidden>
Date: Fri May 11 14:44:47 2018 -0400

    Don't reschedule on RequestedVRamTooHigh errors

    The libvirt driver validates the hw_video_ram image property,
    if specified, and the flavor extra spec "hw_video:ram_max_mb"
    is set. If validation fails, the libvirt driver raises
    RequestedVRamTooHigh which is not handled explicitly in
    ComputeManager._build_and_run_instance so it will result in
    a RescheduledException to another compute to retry the spawn
    but that will always fail because this isn't something that
    is per-compute host.

    This change adds the error handling in _build_and_run_instance
    so that we'll fail and abort the build and not reschedule.

    Long-term, this validation should be moved into the API code
    since it's not specific to a compute host and would be user
    error that should result in a 400 response.

    Change-Id: I93b409ca2b7b36400759ee9d2cd5b71c6df186ab
    Partial-Bug: #1770726
    (cherry picked from commit cd53a181be22291e6f9be1430cab12ba1f381b8a)

tags: added: in-stable-queens