renaming availability zone doesn't modify host's availability zone

Bug #1378904 reported by Guillaume Winter
This bug affects 10 people
Affects                    Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)   Fix Released  Medium      Andrey Volkov
nova (Rocky series)        Fix Released  Medium      Andrey Volkov

Bug Description

Hi,

After renaming our availability zones via the Horizon dashboard, we could no longer migrate any "old" instance; the scheduler returned "No valid host found"...

After some digging, we found that in the nova DB's `instances` table, the "availability_zone" field contains the name of the availability zone rather than an ID (or maybe it is intentional ;)).

So renaming an AZ leaves the instances created before the rename orphaned, and the scheduler cannot find any valid host for them...
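
For illustration, the stale rows can be spotted with a query like this (a sketch, assuming a MySQL deployment and an AZ that was renamed from "old-az"; the name is a placeholder):

select uuid, availability_zone from nova.instances where availability_zone = 'old-az' and deleted = 0;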

Our OpenStack install is on Debian Wheezy, with the Icehouse "official" repository from archive.gplhost.com/debian/, up to date.

If you need any more info, I'd be glad to help.

Cheers

Changed in nova:
status: New → Confirmed
tags: added: compute db
Fan Guo (faguo)
Changed in nova:
assignee: nobody → Fan Guo (faguo)
Revision history for this message
xens (r-aviolat) wrote :

I faced the same problem on my setup:

OS: Ubuntu 14.04
OpenStack: Kilo from the ubuntu-cloud repo

I could rename the availability zone from the dashboard or the API, but I couldn't migrate the instances anymore. I had to edit the following fields:

db -> nova, table -> instances, column -> availability_zone
db -> neutron, table -> ports, column -> device_owner

After the manual update the VMs could be migrated again; a sketch of that update follows.
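
A sketch of that manual update (assuming MySQL, an AZ renamed from "old-az" to "new-az", and backups taken first; both AZ names are placeholders, and compute ports are assumed to store the AZ as 'compute:<az-name>' in device_owner):

update nova.instances set availability_zone = 'new-az' where availability_zone = 'old-az' and deleted = 0;
update neutron.ports set device_owner = 'compute:new-az' where device_owner = 'compute:old-az';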

Revision history for this message
leehom (feli5) wrote :

I also faced the same problem on my setup:

OS: CentOS Linux release 7.1.1503 (Core)
OpenStack: Juno from rdo-release

I have two compute nodes, compute1 in AZ1 and compute2 in AZ2.
I created one instance on each: instance1 on compute1 and instance2 on compute2.

I can live migrate instance1 to compute2. After the migration, the availability zone shown on the dashboard changed from AZ1 to AZ2, but the instances table in the nova database still shows instance1's availability zone as AZ1.
So even though compute1 and compute2 are in different AZs, I can cold migrate instance1 back to compute1.

Changing host aggregates behaves just like live migration: the instance's AZ is changed on the dashboard but not really changed in the database.

BTW, when getting the instance's details using "nova --debug list", it also prints the wrong result.

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
assignee: Fan Guo (faguo) → nobody
status: Confirmed → Expired
Revision history for this message
John Kruszewski (jiggernaut) wrote :

CONFIRMED FOR: MITAKA
$ nova-manage --version
13.0.0

# Steps to reproduce:
1) Create a HA/AZ, e.g., "AZ1"
2) Add compute nodes to "AZ1" (Admin->System->Host Aggregates->Manage Hosts)
3) Launch VM in this AZ.
4) Live migrate/migrate VM - will succeed
5) Create a new HA/AZ, e.g., "AZ2"
6) Remove compute nodes from "AZ1"
7) Add compute nodes to "AZ2"
8) Try to migrate VM

Fails with ERROR: Error: No valid host was found. There are not enough hosts available. compute-1: (AvailabilityZoneFilter) avail zone az1 not in host AZ: set([u'az2'])

# nova-scheduler.log
2016-11-17 21:08:38.690 168453 INFO nova.filters [req-e9cede77-e888-4553-83d6-4e112a8e44a7 59d4a769c88545acb86f646b2464f4d1 93dd4afc2ddb4bfd88d8b5d13d348998 - - -] AvailabilityZoneFilter: (compute-1) REJECT: avail zone az1 not in host AZ: set([u'az2'])

# nova show <uuid> displays correct AZ for VM
| OS-EXT-AZ:availability_zone | az2

# however nova --debug list displays in the RESP BODY:
"OS-EXT-AZ:availability_zone": "az1"

# checking the VM in the DB, availability_zone is still listed as 'az1' as well.
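
# e.g., a check of this kind (a sketch, assuming MySQL; '<uuid>' is the instance's UUID):
select uuid, availability_zone from nova.instances where uuid = '<uuid>';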

Changed in nova:
status: Expired → New
Revision history for this message
Sean Dague (sdague) wrote :

This looks like a real issue. The short-term fix would be to also update instances when the availability_zone name is updated over the API.

Changed in nova:
status: New → Triaged
importance: Undecided → Low
tags: added: api availability-zones
tags: added: low-hanging-fruit
Changed in nova:
assignee: nobody → Anusha Unnam (anusha-unnam)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/419502

Changed in nova:
assignee: Anusha Unnam (anusha-unnam) → Radoslav Gerganov (rgerganov)
status: Triaged → In Progress
Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version icehouse in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.icehouse
Revision history for this message
Sayali Lunkad (sayalilunkad) wrote :

CONFIRMED FOR: NEWTON. After renaming an availability zone that has instances running in it, manual changes to the nova database need to be made in order to update the availability_zone. After that, if live migration is carried out without specifying a host (or with anything else that exercises the AvailabilityZoneFilter), it returns no host found, with this error message in nova-scheduler.log:
 INFO nova.filters [req-f0377310-f38a-47d8-a01b-65d90d79f202 7682adbc49f741ae8555915f95776a7b 07d042659b65465794e46ceafb1a1397 - - -] Filter AvailabilityZoneFilter returned 0 hosts
2017-08-24 10:04:17.687 20388 INFO nova.filters [req-f0377310-f38a-47d8-a01b-65d90d79f202 7682adbc49f741ae8555915f95776a7b 07d042659b65465794e46ceafb1a1397 - - -] Filtering removed all hosts for the request with
 instance ID 'd08b2dd4-039b-4de0-98f4-28c00c0c8cee'. Filter results: ['RetryFilter: (start: 2, end: 2)', 'AvailabilityZoneFilter: (start: 2, end: 0)']

Adding some additional info to the logs points to this, for all the available hosts:
Availability Zone 'texas' requested. (d52-54-77-77-01-04, d52-54-77-77-01-04.c14.cloud.suse.de) ram: 1991MB disk: 14336MB io_ops: 0 instances: 0 has AZs: set([u'austin'])

Revision history for this message
Sayali Lunkad (sayalilunkad) wrote :

After renaming availability zones via Horizon, the live-migrate option for instances in Horizon does not work unless a host is specified. This is due to the AvailabilityZoneFilter, which removes all hosts because the availability_zone of the host is obtained from the RequestSpec object stored in the request_specs table of the nova_api database. When an availability zone is renamed it is not updated in this table, hence the mismatch. Not sure if we need to file another bug for this.
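
The stale value can be inspected directly (a sketch, assuming MySQL; the instance UUID is a placeholder):

select instance_uuid, spec from nova_api.request_specs where instance_uuid = '<instance-uuid>';

The spec column holds serialized JSON; the availability_zone key inside it still carries the old name.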

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

There is clear upstream consensus that, since availability zones are intended for end users, renaming an AZ from the operator's side would confuse those users.

Let me explain further: say you'd like to change AZ "foo" into "bar". End users looking at the AZ API before booting their instances see "foo" as a valid target, so they pass --availability_zone foo in their boot calls and expect to see their instances in AZ "foo".

Now, what if the operator turns "foo" into "bar"? As an end user, I'd be very surprised to see my instances now sitting in "bar" when I explicitly asked for "foo"!

As a clear design decision, we really want to make it explicit that renaming an AZ should be forbidden if there are active instances hosted within that AZ.
Closing this bug as Won't Fix, since I feel that not being able to modify an AZ is a design decision rather than a bug; that said, we should also modify the aggregates API to return an HTTP 40x when someone tries to update aggregate metadata containing AZ information while instances are attached to it.

Changed in nova:
status: In Progress → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Radoslav Gerganov (<email address hidden>) on branch: master
Review: https://review.openstack.org/419502

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/509206

Changed in nova:
assignee: Radoslav Gerganov (rgerganov) → Andrey Volkov (avolkov)
status: Won't Fix → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Andrey Volkov (<email address hidden>) on branch: master
Review: https://review.openstack.org/509206

Andrey Volkov (avolkov)
Changed in nova:
assignee: Andrey Volkov (avolkov) → nobody
Revision history for this message
Jacolex (jacolex) wrote :

Hi,
You should also change the spec in the request_specs table like this (starting from Ocata):

use nova_api;
update request_specs set spec = replace(spec, '"availability_zone": "old-name"', '"availability_zone": "new-name"') where instance_uuid = 'xxxxxxxxxxxxxxx';

Maybe it will help someone who wants to use this unsupported scenario... ;)
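
A follow-up query of this kind can verify that no spec still references the old name (a sketch, using the same placeholder names):

select instance_uuid from nova_api.request_specs where spec like '%"availability_zone": "old-name"%';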

Changed in nova:
assignee: nobody → Andrey Volkov (avolkov)
Matt Riedemann (mriedem)
Changed in nova:
importance: Low → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/635315

Changed in nova:
assignee: Andrey Volkov (avolkov) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Andrey Volkov (avolkov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/635315
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5dc1ed3c5cc20173eecd071b0804c469bdc9422f
Submitter: Zuul
Branch: master

commit 5dc1ed3c5cc20173eecd071b0804c469bdc9422f
Author: Matt Riedemann <email address hidden>
Date: Wed Feb 6 16:11:40 2019 -0500

    api-ref: warn about changing/unsetting AZ name with instances

    It is currently possible to rename or unset the availability_zone
    metadata value on a host aggregate which can adversely impact
    instances that were created in that specific AZ since later
    attempts to migrate or unshelve those instances will fail if the
    AZ with the original name no longer exists.

    This adds a warning to the API reference for updating the AZ
    name and also fixes a grammar typo in the 'metadata' response
    parameter description.

    Change-Id: Ie9d4a1ff1a23827490fe51350c11292c6efc4eb2
    Related-Bug: #1378904

Changed in nova:
assignee: Andrey Volkov (avolkov) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Andrey Volkov (avolkov)
Changed in nova:
assignee: Andrey Volkov (avolkov) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/640460

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/640460
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=268190b252aef91d9309f30b2f5db3be7bdecf01
Submitter: Zuul
Branch: master

commit 268190b252aef91d9309f30b2f5db3be7bdecf01
Author: Matt Riedemann <email address hidden>
Date: Fri Mar 1 12:36:42 2019 -0500

    api-ref: explain aggregate set_metadata semantics

    This came up as a source of confusion while reviewing
    change Ic27195e46502067c87ee9c71a811a3ca3f610b73 because
    I thought that the "metadata" key in the
    POST /os-aggregates/{aggregate_id}/action (set_metadata)
    API was an overwrite of the existing metadata rather than
    an update.

    The way the Aggregate.update_metadata() method works is that
    new entries are added, existing metadata is updated if the
    value is not None, otherwise existing entries are removed
    if the value is None.

    And because of the AggregateAPI.is_safe_to_update_az() method
    the special "availability_zone" metadata cannot be unset to None
    once it is set. So the only way to remove an AZ is to delete the
    aggregate altogether.

    This updates the API reference description of the "metadata"
    parameter in the "set_metadata" action API.

    Change-Id: I6fa9f9691b945b5212b7f951ab0a26b4d3049df9
    Related-Bug: #1378904

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/509206
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e19ef4173906da0b7c761da4de0728a2fd71e24
Submitter: Zuul
Branch: master

commit 8e19ef4173906da0b7c761da4de0728a2fd71e24
Author: Andrey Volkov <email address hidden>
Date: Tue Oct 3 15:42:55 2017 +0300

    Check hosts have no instances for AZ rename

    Update aggregate and update aggregate metadata API calls have the
    ability to update the availability zone name for the aggregate. If
    the aggregate is not empty (has hosts with instances on them), the
    update leads to a discrepancy for objects that store the availability
    zone as a string rather than a reference.

    From the devstack DB they are:
    - cinder.backups.availability_zone
    - cinder.consistencygroups.availability_zone
    - cinder.groups.availability_zone
    - cinder.services.availability_zone
    - cinder.volumes.availability_zone
    - neutron.agents.availability_zone
    - neutron.networks.availability_zone_hints
    - neutron.router_extra_attributes.availability_zone_hints
    - nova.dns_domains.availability_zone
    - nova.instances.availability_zone
    - nova.volume_usage_cache.availability_zone
    - nova.shadow_dns_domains.availability_zone
    - nova.shadow_instances.availability_zone
    - nova.shadow_volume_usage_cache.availability_zone

    Why is that bad?
    First, the API and Horizon show different values for, e.g., a host
    and an instance. Second, migration of instances with a changed
    availability zone fails with "No valid host found" for the old AZ.

    This change adds an additional check to the Update Aggregate API
    call. With the check, it's not possible to rename an AZ if the
    corresponding aggregate has instances on any of its hosts.

    PUT /os-aggregates/{aggregate_id} and
    POST /os-aggregates/{aggregate_id}/action return HTTP 400 for an
    availability zone rename if the hosts of the aggregate have any
    instances. This is similar to the existing conflicting-AZ-names
    error.

    Change-Id: Ic27195e46502067c87ee9c71a811a3ca3f610b73
    Closes-Bug: #1378904

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/641351

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/641351
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1559db570452cd5502b8f5e9ba44fef54295537f
Submitter: Zuul
Branch: stable/rocky

commit 1559db570452cd5502b8f5e9ba44fef54295537f
Author: Andrey Volkov <email address hidden>
Date: Tue Oct 3 15:42:55 2017 +0300

    Check hosts have no instances for AZ rename

    (Commit message body identical to the master commit quoted above.)

    Change-Id: Ic27195e46502067c87ee9c71a811a3ca3f610b73
    Closes-Bug: #1378904
    (cherry picked from commit 8e19ef4173906da0b7c761da4de0728a2fd71e24)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Andrey Volkov (avolkov)