[Nova] Rebuild operation passes through excessive scheduler filters

Bug #1818030 reported by Alexander Rubtsov on 2019-02-28
This bug affects 1 person
Affects: Mirantis OpenStack
Importance: Medium
Assigned to: Denis Meltsaykin

Bug Description

--- Environment ---
MOS: 9.2 MU4+

--- Description ---
The patch from https://bugs.launchpad.net/mos/+bug/1732862 introduces additional validation during the "nova rebuild" action. However, the implementation is not optimal and may cause some rebuild attempts to fail.

Please backport to Mitaka the refined upstream fix:
https://review.openstack.org/#/c/523434/

Alexander Rubtsov (arubtsov) wrote :

sla2 for 9.0-updates

Changed in mos:
importance: Undecided → Medium
assignee: nobody → MOS Maintenance (mos-maintenance)
milestone: none → 9.x-updates
Alexander Rubtsov (arubtsov) wrote :

The scenario of failed rebuild:

1) There is an availability-zone (AZ) with 4 Compute hosts

2) There is a ServerGroup with anti-affinity defined (SGAnA)

3) Four VMs are booted with only the AZ (not a specific host) and the server_group provided in the call.
This works as expected: the VMs end up in the right AZ but on different Compute hosts due to SGAnA.

4) The user runs 'nova rebuild' for the first VM, but the rebuild fails: the Nova scheduler filters out the available hosts because of the SGAnA filter.

The VM is never rebuilt (it continues to run the same image as before), even though the user would expect it to be 'recreated' on the same host.
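
The scenario can be modelled with a small toy sketch. This is not Nova code: the RebuildRequest class, the anti_affinity_passes function, and the host names are hypothetical, and the sketch assumes that the group's host list already contains the host of the VM being rebuilt, which is why an anti-affinity check applied during rebuild rejects the only candidate.

```python
from dataclasses import dataclass, field


@dataclass
class RebuildRequest:
    """Hypothetical stand-in for a scheduler request made during rebuild."""
    instance_host: str                              # host the VM already runs on
    group_hosts: set = field(default_factory=set)   # hosts used by the server group


def anti_affinity_passes(host: str, request: RebuildRequest) -> bool:
    """Toy anti-affinity check: reject any host already used by the group."""
    return host not in request.group_hosts


# Four compute hosts in the AZ, one group member on each of them.
hosts = ["compute-1", "compute-2", "compute-3", "compute-4"]
request = RebuildRequest(instance_host="compute-1", group_hosts=set(hosts))

# A rebuild stays on the current host, so that host is the only candidate.
candidates = [h for h in hosts if h == request.instance_host]

# Assumption: the VM's own host counts as "used by the group",
# so the anti-affinity check removes the only candidate.
survivors = [h for h in candidates if anti_affinity_passes(h, request)]
print(survivors)
```

With no surviving host, the scheduler has nothing to return and the rebuild is refused, which matches the behaviour described in step 4.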

Changed in mos:
milestone: 9.x-updates → 9.2-mu-11
status: New → Confirmed
Changed in mos:
assignee: MOS Maintenance (mos-maintenance) → Denis Meltsaykin (dmeltsaykin)

Fix proposed to branch: 9.0/mitaka
Change author: Denis V. Meltsaykin <email address hidden>
Review: https://review.fuel-infra.org/40688

Changed in mos:
status: Confirmed → In Progress

Reviewed: https://review.fuel-infra.org/40688
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: 3c4ab7746679ac543a151e9c44500171b8292b09
Author: Denis V. Meltsaykin <email address hidden>
Date: Mon Mar 4 15:00:30 2019

Refined fix for validating image on rebuild

This aims to fix the issue described in bug 1664931 where a rebuild
fails to validate the existing host with the scheduler when a new
image is provided. The previous attempt to do this could cause rebuilds
to fail unnecessarily because we ran _all_ of the filters during a
rebuild, which could cause usage/resource filters to prevent an otherwise
valid rebuild from succeeding.

This aims to classify filters as useful for rebuild or not, and only apply
the former during a rebuild scheduler check. We do this by using an internal
scheduler hint, indicating our intent. This should (a) filter out
all hosts other than the one we're running on and (b) be detectable by
the filtering infrastructure as an internally-generated scheduling request
in order to trigger the correct filtering behavior.

Closes-Bug: #1818030
Change-Id: I1a46ef1503be2febcd20f4594f44344d05525446
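
The approach in the commit message can be illustrated with a minimal, self-contained sketch. It is not the code from the review: the run_on_rebuild attribute, the "_check_type" hint key, the filter classes, and the dictionary layout of hosts and requests are all hypothetical names chosen for illustration. The sketch only shows the idea of marking each filter as relevant to rebuild or not, and of using an internal hint to (a) restrict the candidates to the current host and (b) skip filters that are not relevant.

```python
# Hypothetical internal hint marking a request as a rebuild check.
REBUILD_HINT = {"_check_type": ["rebuild"]}


def is_rebuild(spec):
    """Return True when the request carries the internal rebuild hint."""
    return spec.get("hints", {}).get("_check_type") == ["rebuild"]


class BaseFilter:
    # Filters are skipped during rebuild unless they opt in.
    run_on_rebuild = False

    def host_passes(self, host, spec):
        raise NotImplementedError


class ImageSupportFilter(BaseFilter):
    """Image validation still matters when a new image is supplied."""
    run_on_rebuild = True

    def host_passes(self, host, spec):
        return spec["image"] in host["images"]


class AntiAffinityFilter(BaseFilter):
    """Placement policy: not re-evaluated for an instance that stays put."""

    def host_passes(self, host, spec):
        return host["name"] not in spec["group_hosts"]


def select_hosts(filters, hosts, spec):
    rebuild = is_rebuild(spec)
    if rebuild:
        # (a) keep only the host the instance is already running on
        hosts = [h for h in hosts if h["name"] == spec["current_host"]]
    for flt in filters:
        # (b) skip filters that are not useful for a rebuild check
        if rebuild and not flt.run_on_rebuild:
            continue
        hosts = [h for h in hosts if flt.host_passes(h, spec)]
    return hosts


hosts = [{"name": "compute-%d" % i, "images": {"old", "new"}} for i in range(1, 5)]
filters = [ImageSupportFilter(), AntiAffinityFilter()]
base_spec = {
    "image": "new",
    "current_host": "compute-1",
    "group_hosts": {h["name"] for h in hosts},
}

with_hint = dict(base_spec, hints=dict(REBUILD_HINT))
without_hint = dict(base_spec, hints={})

selected_with_hint = [h["name"] for h in select_hosts(filters, hosts, with_hint)]
selected_without_hint = [h["name"] for h in select_hosts(filters, hosts, without_hint)]
print(selected_with_hint, selected_without_hint)
```

In this toy model the request carrying the hint still validates the new image against the current host, while the same request without the hint is rejected by the anti-affinity filter on every host.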

Changed in mos:
status: In Progress → Fix Committed
Pavel Glazov (pglazovv) wrote :

Verified:

2019-04-04 17:56:46.451 13478 DEBUG nova.scheduler.filter_scheduler [req-b817fb98-51de-4e1f-aed7-ab9ac0a563a2 5d1b7918391e41b7bbeef6c5d159e53b 5ac19a2884c0483c8637554e080792ff - - -] There are 0 hosts available but 1 instances requested to build. select_destinations /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:71
2019-04-04 17:56:46.452 13478 DEBUG oslo_messaging.rpc.dispatcher [req-b817fb98-51de-4e1f-aed7-ab9ac0a563a2 5d1b7918391e41b7bbeef6c5d159e53b 5ac19a2884c0483c8637554e080792ff - - -] Expected exception during message handling (No valid host was found. There are not enough hosts available.) _dispatch_and_reply /usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py:141

Changed in mos:
status: Fix Committed → Fix Released