shelved offloaded server still shows old AZ while shelved

Bug #1790221 reported by Matt Riedemann
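+--------------------------+---------------+------------+----------------+-----------+
| Affects                  | Status        | Importance | Assigned to    | Milestone |
+--------------------------+---------------+------------+----------------+-----------+
| OpenStack Compute (nova) | Fix Released  | Low        | Matt Riedemann |           |
| Pike                     | Fix Committed | Low        | Matt Riedemann |           |
| Queens                   | Fix Committed | Low        | Matt Riedemann |           |
| Rocky                    | Fix Committed | Low        | Matt Riedemann |           |
+--------------------------+---------------+------------+----------------+-----------+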

Bug Description

When a server is shelved (and offloaded from the compute host), the instance.host and instance.node values are cleared because it's no longer on any host:

https://github.com/openstack/nova/blob/bb14337c30df0c17bc1dadc00d5a5500ae2dc4b7/nova/compute/manager.py#L5007

However, the instance.availability_zone is not cleared, so we still see the old host AZ on the instance from the API.

Recreate steps:

1. Create an AZ and put a host in it (this is a single-node devstack created from master today):

stack@stein:~$ openstack aggregate add host DC1 stein
+-------------------+--------------------------------+
| Field | Value |
+-------------------+--------------------------------+
| availability_zone | DC1 |
| created_at | 2018-08-31T21:01:06.000000 |
| deleted | False |
| deleted_at | None |
| hosts | [u'stein'] |
| id | 1 |
| metadata | {u'availability_zone': u'DC1'} |
| name | DC1 |
| updated_at | None |
+-------------------+--------------------------------+

2. Create a server. There is only one host and it is in an AZ, so that AZ shows up in the server output:

stack@stein:~$ openstack server create --flavor m1.tiny --image cirros-0.3.5-x86_64-disk --wait test-shelve-az

+-------------------------------------+-----------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | DC1 |
| OS-EXT-SRV-ATTR:host | stein |
| OS-EXT-SRV-ATTR:hypervisor_hostname | stein |
| OS-EXT-SRV-ATTR:instance_name | instance-00000002 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2018-08-31T21:05:51.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | private=fdde:1239:d41d:0:f816:3eff:fe72:1e68, 10.0.0.9 |
| adminPass | HSetKiH8g336 |
| config_drive | |
| created | 2018-08-31T21:05:44Z |
| flavor | m1.tiny (1) |
| hostId | 50c671bf8b1d64ad108a3d9ee3bfa53efcbdffd15d93060a33c1452c |
| id | 1fee3708-d2b1-4a76-b392-fa255ba426f6 |
| image | cirros-0.3.5-x86_64-disk (94295199-2883-4314-a66b-ac854f62c02f) |
| key_name | None |
| name | test-shelve-az |
| progress | 0 |
| project_id | 567c6a1c89f04c2985c3a8dc48a3aa5d |
| properties | |
| security_groups | name='default' |
| status | ACTIVE |
| updated | 2018-08-31T21:05:51Z |
| user_id | 71835b0bbc6c4baabbf666e44be7af25 |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------------------+

3. Shelve (offload) the server and verify that host and hypervisor_hostname are no longer set for the instance but the AZ still is:

stack@stein:~$ openstack server shelve test-shelve-az
stack@stein:~$ openstack server show test-shelve-az
+-------------------------------------+-----------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | DC1 |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | instance-00000002 |
| OS-EXT-STS:power_state | Shutdown |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | shelved_offloaded |
| OS-SRV-USG:launched_at | 2018-08-31T21:05:51.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | private=fdde:1239:d41d:0:f816:3eff:fe72:1e68, 10.0.0.9 |
| config_drive | |
| created | 2018-08-31T21:05:44Z |
| flavor | m1.tiny (1) |
| hostId | |
| id | 1fee3708-d2b1-4a76-b392-fa255ba426f6 |
| image | cirros-0.3.5-x86_64-disk (94295199-2883-4314-a66b-ac854f62c02f) |
| key_name | None |
| name | test-shelve-az |
| project_id | 567c6a1c89f04c2985c3a8dc48a3aa5d |
| properties | |
| security_groups | name='default' |
| status | SHELVED_OFFLOADED |
| updated | 2018-08-31T21:06:31Z |
| user_id | 71835b0bbc6c4baabbf666e44be7af25 |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------------------+
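
The check in step 3 can be sketched in Python as a small helper that flags the inconsistent state. This is only an illustration: `has_stale_az` is a hypothetical name, and the dict is assumed to hold the same field names as the `openstack server show` output above (for example after parsing its JSON form). It treats both an empty string and a null AZ as "no AZ", since this report does not show what the API returns once the fix is applied.

```python
def has_stale_az(server: dict) -> bool:
    """Return True if a shelved-offloaded server still reports an AZ.

    Hypothetical helper; `server` is assumed to use the field names
    printed by `openstack server show`.
    """
    offloaded = server.get("status") == "SHELVED_OFFLOADED"
    no_host = not server.get("OS-EXT-SRV-ATTR:host")
    az = server.get("OS-EXT-AZ:availability_zone")
    # Stale means: no host any more, but an AZ is still displayed.
    return offloaded and no_host and bool(az)


# Values copied from the step 3 output in this report.
shelved = {
    "status": "SHELVED_OFFLOADED",
    "OS-EXT-SRV-ATTR:host": None,
    "OS-EXT-SRV-ATTR:hypervisor_hostname": None,
    "OS-EXT-AZ:availability_zone": "DC1",
}

print(has_stale_az(shelved))  # True: this is the behavior reported here
```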

Tags: shelve

Matt Riedemann (mriedem) wrote :
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/599087

Changed in nova:
status: Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/599087
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=771b9eaa71742b7a158c2e7759a3046ea5a6fc3a
Submitter: Zuul
Branch: master

commit 771b9eaa71742b7a158c2e7759a3046ea5a6fc3a
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 31 18:14:25 2018 -0400

    Null out instance.availability_zone on shelve offload

    When a user shelve offloads a server, the compute manager
    nulls out the instance host and node attributes. However
    the availability_zone attribute is left set on the instance
    so the API will show the instance as in an AZ when really
    it's not, because an instance that is not on a host is not
    in a host aggregate so it can't be in an AZ.

    Keep in mind that there are two ways an instance can be in
    an AZ:

    1. The user specifically requests to create the server in
       the AZ.

    2. The user does not request an AZ and one is assigned via
       the selected host during server create (or resize, etc).

    For the first case, the server will always remain in the
    user-requested AZ even after shelve/unshelve. But in the
    second case, unshelving the server can result in the server
    being spawned on a new host in a different AZ - the scheduler
    does not restrict the AZ in that second case.

    This change nulls out the instance.availability_zone just like
    the host and node fields on shelve offload since it's
    confusing to show a shelved offloaded server in an AZ that
    doesn't have a host.

    Final note: the _nil_out_instance_obj_host_and_node method
    is also called during server create if the build fails and
    is aborted or rescheduled to another host, and in the case
    that unshelve fails. In the case of a server build reschedule,
    conductor will set the instance.availability_zone appropriately
    based on the alternate host for the reschedule. In the other
    failure cases, leaving the instance AZ null is appropriate.

    Change-Id: I25a4f36027390def83cfe25f4f3b4af9660da502
    Closes-Bug: #1790221
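
The behavior described in the commit message above can be sketched roughly as follows. This is a simplified model, not nova's code: `Instance` is a hypothetical stand-in dataclass, `nil_out_host_and_node` only mirrors the intent of `_nil_out_instance_obj_host_and_node` after the fix, and `unshelve_candidates` is an invented helper that illustrates the two AZ cases (user-requested AZ versus AZ assigned via the selected host). The host `other` and the AZ `DC2` are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for nova's Instance object."""
    host: Optional[str]
    node: Optional[str]
    availability_zone: Optional[str]


def nil_out_host_and_node(instance: Instance) -> None:
    """Simplified model of the post-fix nil-out on shelve offload."""
    instance.host = None
    instance.node = None
    # What the fix adds, in spirit: an instance with no host is not in
    # a host aggregate, so it has no AZ to report.
    instance.availability_zone = None


def unshelve_candidates(hosts_by_az: Dict[str, str],
                        requested_az: Optional[str]) -> List[str]:
    """Return hosts an unshelve could land on (illustrative only).

    Case 1: the user requested an AZ at create time, so only hosts in
    that AZ qualify. Case 2: no AZ was requested, so any host qualifies
    and the server may come back in a different AZ.
    """
    if requested_az is None:
        return sorted(hosts_by_az)
    return sorted(h for h, az in hosts_by_az.items() if az == requested_az)


hosts = {"stein": "DC1", "other": "DC2"}  # "other"/"DC2" are made up
inst = Instance(host="stein", node="stein", availability_zone="DC1")
nil_out_host_and_node(inst)
print(inst)
print(unshelve_candidates(hosts, "DC1"))  # case 1: restricted to DC1
print(unshelve_candidates(hosts, None))   # case 2: any host, any AZ
```

In the second case the instance has no meaningful AZ while it is offloaded, which is why showing the old one is misleading.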

Changed in nova:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/606086

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/606155

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/606161

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/606086
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b3060a9d3103fb8ade6ebeaa53cb4dd39308f8af
Submitter: Zuul
Branch: stable/rocky

commit b3060a9d3103fb8ade6ebeaa53cb4dd39308f8af
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 31 18:14:25 2018 -0400

    Null out instance.availability_zone on shelve offload

    Conflicts:
          doc/notification_samples/instance-delete-end_not_scheduled.json
          doc/notification_samples/instance-delete-start_not_scheduled.json

    NOTE(mriedem): The conflict is due to not having change
    If0693eab2ed31b5fbfe6cbafa5d67b69c2ed8442 in Rocky.

    Change-Id: I25a4f36027390def83cfe25f4f3b4af9660da502
    Closes-Bug: #1790221
    (cherry picked from commit 771b9eaa71742b7a158c2e7759a3046ea5a6fc3a)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.2

This issue was fixed in the openstack/nova 18.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/606155
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=437211bf37aacb6bc9490e2443984a4428c70779
Submitter: Zuul
Branch: stable/queens

commit 437211bf37aacb6bc9490e2443984a4428c70779
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 31 18:14:25 2018 -0400

    Null out instance.availability_zone on shelve offload

    Conflicts:
          doc/notification_samples/instance-shelve_offload-end.json

    NOTE(mriedem): The conflict is due to not having change
    Ic64f89d33a985cf6121ddc198380902a5e936ec4 in Queens.

    Change-Id: I25a4f36027390def83cfe25f4f3b4af9660da502
    Closes-Bug: #1790221
    (cherry picked from commit 771b9eaa71742b7a158c2e7759a3046ea5a6fc3a)
    (cherry picked from commit b3060a9d3103fb8ade6ebeaa53cb4dd39308f8af)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.10

This issue was fixed in the openstack/nova 17.0.10 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/606161
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=de7e5ae4e4786d3275e2ada688032a4a1cbc055a
Submitter: Zuul
Branch: stable/pike

commit de7e5ae4e4786d3275e2ada688032a4a1cbc055a
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 31 18:14:25 2018 -0400

    Null out instance.availability_zone on shelve offload

    Change-Id: I25a4f36027390def83cfe25f4f3b4af9660da502
    Closes-Bug: #1790221
    (cherry picked from commit 771b9eaa71742b7a158c2e7759a3046ea5a6fc3a)
    (cherry picked from commit b3060a9d3103fb8ade6ebeaa53cb4dd39308f8af)
    (cherry picked from commit 437211bf37aacb6bc9490e2443984a4428c70779)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.
