Mismatch between forced host and AZ prevents move operations

Bug #1934770 reported by Stephen Finucane
This bug affects 2 people
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Medium      Stephen Finucane
  Wallaby                 Fix Released  Undecided   Unassigned

Bug Description

When spawning a new instance, it's possible to force the instance to a specific host by using a special 'availability_zone[:host[:node]]' syntax for the 'availability_zone' field in the request. For example, when using OSC:

  openstack server create --availability-zone my-az:my-host ... my-server
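To make the legacy syntax concrete, here is a minimal sketch of how such a value splits into its components. 'parse_az_value' and 'AZRequest' are hypothetical names used for illustration only; this is not nova's actual parsing code, and edge cases nova may treat differently are not covered.

```python
from typing import NamedTuple, Optional


class AZRequest(NamedTuple):
    """Components of a legacy 'availability_zone[:host[:node]]' value (hypothetical)."""
    availability_zone: Optional[str]
    host: Optional[str]
    node: Optional[str]


def parse_az_value(value: str) -> AZRequest:
    """Split a legacy availability zone string into AZ, host and node.

    Empty components (e.g. the AZ in ':my-host') become None.
    """
    # Split on at most two colons: 'az', 'az:host' or 'az:host:node'.
    parts = value.split(":", 2)
    parts += [""] * (3 - len(parts))  # pad any missing trailing components
    az, host, node = (p or None for p in parts)
    return AZRequest(az, host, node)


print(parse_az_value("my-az:my-host"))
# → AZRequest(availability_zone='my-az', host='my-host', node=None)
```

Only the host (and node) portion actually determines placement here; the AZ portion is the part that goes unvalidated.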

Doing so bypasses the scheduler, which means the 'AvailabilityZoneFilter' never runs to validate the availability zone-host combination. As a result, the availability zone portion of this value is effectively ignored and the host is used regardless of the availability zone requested. This has some nasty side effects.

First, the availability zone information stored on the instance is derived from the availability zone of the host the instance boots on, *not* the availability zone requested by the user. This means that when a user runs 'openstack server show' or 'openstack server list --long', they'll see different availability zone information from what they requested.

Second, the value requested *is* recorded in the 'RequestSpec' object created for the instance. This object is reused for any future move operations, and because the availability zone information was never verified, it's possible to end up with an instance that can't be moved, since no host with the matching availability zone exists.

The two issues compound each other: the failure logs in the latter case will reference one availability zone value, while inspecting the instance record will show another. This is seriously confusing.
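The divergence can be modelled in a few lines. This is a simplified sketch, not nova code: 'RequestSpec' and 'Instance' here are hypothetical stand-ins for the real objects, each host is assumed to belong to exactly one AZ, and 'hosts_passing_az_filter' only roughly approximates what 'AvailabilityZoneFilter' checks during a later move.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RequestSpec:
    # Hypothetical stand-in: the AZ exactly as the user requested it.
    availability_zone: str


@dataclass
class Instance:
    # Hypothetical stand-in: the AZ is derived from the host actually used.
    host: str
    availability_zone: str


def boot_forced(requested_az: str, host: str,
                host_azs: Dict[str, str]) -> Tuple[RequestSpec, Instance]:
    """Model the pre-fix forced-host boot: no scheduler, no AZ validation."""
    spec = RequestSpec(availability_zone=requested_az)
    instance = Instance(host=host, availability_zone=host_azs[host])
    return spec, instance


def hosts_passing_az_filter(spec: RequestSpec,
                            host_azs: Dict[str, str]) -> List[str]:
    """Rough model of the AZ check applied when the RequestSpec is reused."""
    return [h for h, az in host_azs.items() if az == spec.availability_zone]


host_azs = {"host1": "bar", "host2": "bar"}
spec, instance = boot_forced("foo", "host1", host_azs)
print(instance.availability_zone)               # → bar (what users see)
print(hosts_passing_az_filter(spec, host_azs))  # → [] (no move destination)
```

The instance record says 'bar' while the RequestSpec still says 'foo', so every later scheduling attempt filters out all hosts.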

The solution seems to be to either (a) error out when an invalid availability zone-host combination is requested, or (b) simply ignore the availability zone aspect of the request and use the availability zone of the host instead (ideally with a warning). Note that microversion 2.74 introduced a better way of requesting a specific host without bypassing the scheduler, using the 'host' and 'hypervisor_hostname' fields in the body of the instance create request. However, the old way of doing things is not yet deprecated, and even if it were, we'd still have to support it for older microversions. We should fix this DB discrepancy one way or the other.
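A minimal sketch of the two options, assuming for simplicity that the host belongs to a single availability zone (in nova, AZ membership actually comes from the host aggregates the host is in). 'resolve_az' and 'InvalidAZHostCombo' are hypothetical names: 'strict=True' models option (a) and 'strict=False' models option (b). The fix that was eventually merged, per the commit message below, took option (b).

```python
import logging
from typing import Optional

LOG = logging.getLogger(__name__)


class InvalidAZHostCombo(ValueError):
    """Hypothetical error for option (a): reject the create request."""


def resolve_az(requested_az: Optional[str], host_az: str,
               strict: bool = False) -> str:
    """Return the AZ to record for an instance forced onto a host.

    strict=True fails the request on a mismatch (option (a));
    strict=False prefers the host's AZ and logs a warning (option (b)).
    """
    # No AZ requested, or it already matches: nothing to reconcile.
    if requested_az is None or requested_az == host_az:
        return host_az
    if strict:
        raise InvalidAZHostCombo(
            f"host is in AZ {host_az!r}, not requested AZ {requested_az!r}")
    LOG.warning("Requested AZ %r does not match host AZ %r; using %r",
                requested_az, host_az, host_az)
    return host_az
```

Either way, the RequestSpec and the instance record end up agreeing on the same AZ, which is the actual goal.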

tags: added: availability-zones
Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/801523

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/798145
Committed: https://opendev.org/openstack/nova/commit/8f21ee42bd66b62e75e14acf4e91b786d78b9168
Submitter: "Zuul (22348)"
Branch: master

commit 8f21ee42bd66b62e75e14acf4e91b786d78b9168
Author: Stephen Finucane <email address hidden>
Date: Fri Jun 25 18:51:06 2021 +0100

    api: Align availability zone info with forced host

    Users can create a server like so:

      $ openstack server create --availability-zone az:host ...

    This is a historical way to request that an instance be scheduled to a
    specific host and it causes the scheduler to be bypassed. However, no
    validation of this availability zone-host combo takes place. The host
    could in fact belong to a different availability zone. If it does, we'll
    end up in a very odd situation whereby the RequestSpec record for the
    instance will record the availability zone requested by the user at
    create time, but the Instance record itself will record the availability
    zone of the host on which the instance was scheduled. This leads to even
    more confusing behavior when we attempt to do something like live
    migrate the instance since the RequestSpec record, with its original and
    possibly invalid availability zone information, is used. The
    'AvailabilityZoneFilter' will fail with an error message like the following:

      Availability Zone 'foo' requested. ... has AZs: bar

    but the 'openstack server list --long' command will show a non-foo value
    for the availability zone column.

    The solution is simple: when given an availability zone-host combo, make
    sure the availability zone requested matches that of the host (or, more
    specifically, the host is a member of the host aggregates that form the
    availability zone [1]). If not, simply ignore the requested availability
    zone information in favour of using the availability zone of the host,
    logging a warning just for record keeping purposes. This is deemed
    preferable to failing with HTTP 400 (Bad Request) since what users are
    really requesting by using this is to schedule to a specific host: the
    availability zone portion of the request is really irrelevant and just
    an artifact of this legacy mechanism to request hosts. If users wish to
    truly validate a host-availability zone combo, they can use the 'host'
    field introduced in microversion 2.74 along with the 'availability_zone'
    field:

      $ openstack server create --availability-zone az --host host ...

    [1] https://docs.openstack.org/nova/latest/admin/aggregates.html

    Change-Id: Iac0e634e66cd4e150a50935cf635f626fc11b70e
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1934770
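For reference, a sketch of the raw compute API request that the microversion 2.74 alternative in the commit message above corresponds to. It only builds the headers and JSON body and sends nothing; 'build_create_request' and the image, flavor, AZ and host values are placeholders, and deployments may restrict the 'host' field to administrators by policy.

```python
import json
from typing import Dict, Tuple

# Microversion 2.74 added the 'host' and 'hypervisor_hostname' request fields.
COMPUTE_MICROVERSION = "2.74"


def build_create_request(name: str, image_ref: str, flavor_ref: str,
                         az: str, host: str) -> Tuple[Dict[str, str], str]:
    """Build headers and a JSON body for POST /servers (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json",
        # Opt in to the microversion that understands 'host'.
        "OpenStack-API-Version": f"compute {COMPUTE_MICROVERSION}",
    }
    body = {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            "networks": "auto",  # must be given explicitly since 2.37
            "availability_zone": az,  # a plain AZ, no ':host' suffix
            "host": host,  # checked by the scheduler rather than forced
        }
    }
    return headers, json.dumps(body)


headers, body = build_create_request("my-server", "IMAGE_UUID", "FLAVOR_ID",
                                     "az", "host")
```

Because the AZ and host travel in separate fields and the scheduler is not bypassed, a mismatched combination is rejected at scheduling time instead of being silently recorded.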

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/802236

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/802237

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/802239

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/802241

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/802236
Committed: https://opendev.org/openstack/nova/commit/58782403cdaad33856fd59715525cee3c63ee3cf
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 58782403cdaad33856fd59715525cee3c63ee3cf
Author: Stephen Finucane <email address hidden>
Date: Fri Jun 25 18:51:06 2021 +0100

    api: Align availability zone info with forced host

    Users can create a server like so:

      $ openstack server create --availability-zone az:host ...

    This is a historical way to request that an instance be scheduled to a
    specific host and it causes the scheduler to be bypassed. However, no
    validation of this availability zone-host combo takes place. The host
    could in fact belong to a different availability zone. If it does, we'll
    end up in a very odd situation whereby the RequestSpec record for the
    instance will record the availability zone requested by the user at
    create time, but the Instance record itself will record the availability
    zone of the host on which the instance was scheduled. This leads to even
    more confusing behavior when we attempt to do something like live
    migrate the instance since the RequestSpec record, with its original and
    possibly invalid availability zone information, is used. The
    'AvailabilityZoneFilter' will fail with an error message like the following:

      Availability Zone 'foo' requested. ... has AZs: bar

    but the 'openstack server list --long' command will show a non-foo value
    for the availability zone column.

    The solution is simple: when given an availability zone-host combo, make
    sure the availability zone requested matches that of the host (or, more
    specifically, the host is a member of the host aggregates that form the
    availability zone [1]). If not, simply ignore the requested availability
    zone information in favour of using the availability zone of the host,
    logging a warning just for record keeping purposes. This is deemed
    preferable to failing with HTTP 400 (Bad Request) since what users are
    really requesting by using this is to schedule to a specific host: the
    availability zone portion of the request is really irrelevant and just
    an artifact of this legacy mechanism to request hosts. If users wish to
    truly validate a host-availability zone combo, they can use the 'host'
    field introduced in microversion 2.74 along with the 'availability_zone'
    field:

      $ openstack server create --availability-zone az --host host ...

    Conflicts:
      nova/compute/api.py

    NOTE(stephenfin): Conflicts are trivial and due to the absence of change
    I81fec10535034f3a81d46713a6eda813f90561cf ("Remove references to
    'instance_type'") which we don't want to backport here.

    [1] https://docs.openstack.org/nova/latest/admin/aggregates.html

    Change-Id: Iac0e634e66cd4e150a50935cf635f626fc11b70e
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1934770
    (cherry picked from commit 8f21ee42bd66b62e75e14acf4e91b786d78b9168)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/801523
Committed: https://opendev.org/openstack/nova/commit/4ee2f667b563cb485e743eefed89d1460e79ea5c
Submitter: "Zuul (22348)"
Branch: master

commit 4ee2f667b563cb485e743eefed89d1460e79ea5c
Author: Stephen Finucane <email address hidden>
Date: Tue Jul 20 19:13:10 2021 +0100

    tests: Validate AZ values

    As a follow-up for change Iac0e634e66cd4e150a50935cf635f626fc11b70e,
    actually validate the AZ values we're setting at the API level when a
    conflict is detected.

    Change-Id: I018ae0e1ae72591bc34292843a9ac94fff7d2e01
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1934770

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.1.0

This issue was fixed in the openstack/nova 23.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/802241
Reason: The stable/train branch of nova projects has been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/802239

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/802237
