nova-manage db online_data_migrations hangs on instances with no host set

Bug #1788115 reported by Jiri Suchomel
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Jiri Suchomel
Pike                      Fix Committed  High        Matt Riedemann
Queens                    Fix Committed  High        Matt Riedemann
Rocky                     Fix Committed  High        Matt Riedemann

Bug Description

When there are some deleted instances present before upgrading,
"nova-manage db online_data_migrations" will not be able to finish.

I think this is because populate_missing_availability_zones does not check for deleted instances (or for a non-existent AZ), so it runs over and over again because it cannot set an AZ on a deleted instance.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/594050

Changed in nova:
assignee: nobody → Jiri Suchomel (jsuchome)
status: New → In Progress
tags: added: nova-manage
Changed in nova:
importance: Undecided → Medium
Changed in nova:
importance: Medium → Undecided
Surya Seetharaman (tssurya) wrote : Re: nova-manage db online_data_migrations hangs with deleted instances

While it is true that populate_missing_availability_zones does not check for deleted instances, I don't see how it can go into an infinite loop unless CONF.default_availability_zone, which is the last resort (https://github.com/openstack/nova/blob/722d5b477219f0a2435a9f4ad4d54c61b83219f1/nova/availability_zones.py#L99), is set to None; by default it is "nova".

Probably it's the situation where the instance was deleted before being scheduled to a compute node, in which case it could hit https://github.com/openstack/nova/blob/722d5b477219f0a2435a9f4ad4d54c61b83219f1/nova/availability_zones.py#L168. Could you confirm whether that is indeed the case, i.e. instance.host was NULL?

Jiri Suchomel (jsuchome) wrote :

That's quite possible (deleted before being scheduled). I'm gonna test it.

Jiri Suchomel (jsuchome) wrote :

Indeed that's the case: instance.host is NULL and get_instance_availability_zone returned a NULL AZ.
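For readers following along, here is a minimal sketch of why a row with a NULL host could keep the migration busy forever. It is a simplified model, not nova's code: `Instance`, `lookup_az`, `populate_batch` and `run_until_idle` are hypothetical names, `DEFAULT_AZ` stands in for CONF.default_availability_zone, and the sketch assumes that a batch counts a row as migrated even when the AZ it wrote is still None.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DEFAULT_AZ = "nova"  # stand-in for CONF.default_availability_zone


@dataclass
class Instance:
    uuid: str
    host: Optional[str] = None
    availability_zone: Optional[str] = None


def lookup_az(instance: Instance, host_to_az: Dict[str, str]) -> Optional[str]:
    """Hypothetical stand-in for get_instance_availability_zone."""
    if not instance.host:
        # No host to derive an AZ from, so return whatever the record
        # already holds (None for the rows discussed in this bug).
        return instance.availability_zone
    return host_to_az.get(instance.host, DEFAULT_AZ)


def populate_batch(instances: List[Instance], host_to_az: Dict[str, str],
                   count: int) -> Tuple[int, int]:
    """One batch without a host filter; returns (found, done)."""
    batch = [i for i in instances if i.availability_zone is None][:count]
    done = 0
    for inst in batch:
        inst.availability_zone = lookup_az(inst, host_to_az)
        done += 1  # assumption: counted as migrated even if the AZ is None
    return len(batch), done


def run_until_idle(instances: List[Instance], host_to_az: Dict[str, str],
                   count: int = 50, max_rounds: int = 5) -> int:
    """Repeat batches until one reports no work done.

    max_rounds only exists so that this demo terminates.
    """
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        _found, done = populate_batch(instances, host_to_az, count)
        if done == 0:
            break
    return rounds
```

In this model a single instance without a host is selected again in every batch, so the loop only stops at the artificial `max_rounds` cap, while a data set containing only scheduled instances goes idle after the second batch.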

Matt Riedemann (mriedem)
summary: - nova-manage db online_data_migrations hangs with deleted instances
+ nova-manage db online_data_migrations hangs with instances with no host set
summary: - nova-manage db online_data_migrations hangs with instances with no host set
+ nova-manage db online_data_migrations hangs on instances with no host set
Changed in nova:
importance: Undecided → Medium
tags: added: rocky-rc-potential
Changed in nova:
importance: Medium → High
Changed in nova:
assignee: Jiri Suchomel (jsuchome) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Jiri Suchomel (jsuchome)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/594178

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/594184

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/594185

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/594050
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=690f91b5c7f7e84a4e6d351b27c05818d947cce1
Submitter: Zuul
Branch: master

commit 690f91b5c7f7e84a4e6d351b27c05818d947cce1
Author: Jiří Suchomel <email address hidden>
Date: Tue Aug 21 09:10:07 2018 +0200

    Filter out instances without a host when populating AZ

    It could happen that instance does not have a host set, e.g.
    when its creation failed before it was scheduled.
    During online_migration, populate_missing_availability_zones tries to
    add missing AZs to all instances. However for instances without a host
    there's no reasonable value for AZ (we can't use a logic that bases
    the value on a host) so let's skip this kind of instances completely.

    Change-Id: Ic6060beaa08af5ea70e5e54fffb94eea58aa7bbf
    Closes-Bug: #1788115
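
The effect of the filter described in the commit message can be illustrated with a toy query. This is a sketch under stated assumptions, not the merged patch: it uses the standard-library sqlite3 module and a made-up three-column `instances` table, and the helper names `make_db` and `candidates` are hypothetical.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Build an in-memory table with a toy subset of instance columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "uuid TEXT PRIMARY KEY, host TEXT, availability_zone TEXT)"
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?)",
        [
            ("scheduled", "compute1", None),   # can be given an AZ
            ("never-scheduled", None, None),   # no host, no AZ to derive
            ("already-set", "compute2", "az2"),
        ],
    )
    return conn


def candidates(conn: sqlite3.Connection, count: int,
               require_host: bool) -> List[str]:
    """Return uuids of rows one migration batch would pick up."""
    sql = "SELECT uuid FROM instances WHERE availability_zone IS NULL"
    if require_host:
        # The idea of the fix: skip rows that have no host at all.
        sql += " AND host IS NOT NULL"
    sql += " ORDER BY uuid LIMIT ?"
    return [row[0] for row in conn.execute(sql, (count,))]
```

Without the host condition the never-scheduled row is selected in every batch; with it, only rows that can actually receive an AZ are returned, so the set of candidates shrinks to nothing once they are processed.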

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/594178
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad14e428f82f02d71e0b33ec6d20e7810a978e3b
Submitter: Zuul
Branch: stable/rocky

commit ad14e428f82f02d71e0b33ec6d20e7810a978e3b
Author: Jiří Suchomel <email address hidden>
Date: Tue Aug 21 09:10:07 2018 +0200

    Filter out instances without a host when populating AZ

    It could happen that instance does not have a host set, e.g.
    when its creation failed before it was scheduled.
    During online_migration, populate_missing_availability_zones tries to
    add missing AZs to all instances. However for instances without a host
    there's no reasonable value for AZ (we can't use a logic that bases
    the value on a host) so let's skip this kind of instances completely.

    Change-Id: Ic6060beaa08af5ea70e5e54fffb94eea58aa7bbf
    Closes-Bug: #1788115
    (cherry picked from commit 690f91b5c7f7e84a4e6d351b27c05818d947cce1)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0rc3

This issue was fixed in the openstack/nova 18.0.0.0rc3 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/594185
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=059c2d4a8a9967403755c40beda667c9463848a8
Submitter: Zuul
Branch: stable/queens

commit 059c2d4a8a9967403755c40beda667c9463848a8
Author: Jiří Suchomel <email address hidden>
Date: Tue Aug 21 09:10:07 2018 +0200

    Filter out instances without a host when populating AZ

    It could happen that instance does not have a host set, e.g.
    when its creation failed before it was scheduled.
    During online_migration, populate_missing_availability_zones tries to
    add missing AZs to all instances. However for instances without a host
    there's no reasonable value for AZ (we can't use a logic that bases
    the value on a host) so let's skip this kind of instances completely.

    Change-Id: Ic6060beaa08af5ea70e5e54fffb94eea58aa7bbf
    Closes-Bug: #1788115
    (cherry picked from commit 690f91b5c7f7e84a4e6d351b27c05818d947cce1)
    (cherry picked from commit ad14e428f82f02d71e0b33ec6d20e7810a978e3b)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/594184
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=62464883e1c2ba98e1fddb1284171ca5bef4d8e1
Submitter: Zuul
Branch: stable/pike

commit 62464883e1c2ba98e1fddb1284171ca5bef4d8e1
Author: Jiří Suchomel <email address hidden>
Date: Tue Aug 21 09:10:07 2018 +0200

    Filter out instances without a host when populating AZ

    It could happen that instance does not have a host set, e.g.
    when its creation failed before it was scheduled.
    During online_migration, populate_missing_availability_zones tries to
    add missing AZs to all instances. However for instances without a host
    there's no reasonable value for AZ (we can't use a logic that bases
    the value on a host) so let's skip this kind of instances completely.

    Change-Id: Ic6060beaa08af5ea70e5e54fffb94eea58aa7bbf
    Closes-Bug: #1788115
    (cherry picked from commit 690f91b5c7f7e84a4e6d351b27c05818d947cce1)
    (cherry picked from commit ad14e428f82f02d71e0b33ec6d20e7810a978e3b)
    (cherry picked from commit 059c2d4a8a9967403755c40beda667c9463848a8)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.5

This issue was fixed in the openstack/nova 16.1.5 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.6

This issue was fixed in the openstack/nova 17.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
