OpenStack Compute (nova)

Nova won't reschedule when specific hypervisor is set and request failed

Bug #1717916 reported by Vasyl Saienko on 2017-09-18

This bug affects 1 person

Affects:	OpenStack Compute (nova)
Status:	Triaged
Importance:	Undecided
Assigned to:	Unassigned
Milestone:	(none)

Bug Description

Ironic CI is blocked due to frequent failures of
tempest.scenario.test_server_multinode.TestServerMultinode.test_schedule_to_all_nodes

The cause is that nova does not reschedule a failed instance when a specific hypervisor is requested [0].

[0] https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L375-L381
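
For illustration, the behaviour described above can be approximated by the minimal sketch below. It assumes the linked lines disable retries whenever exactly one forced host or exactly one forced node is present in the filter properties. The function name `retry_enabled` and the default `max_attempts=3` are hypothetical, and the real nova code may differ in detail.

```python
def retry_enabled(filter_properties, max_attempts=3):
    """Return True if a failed build would be handed back to the scheduler.

    Assumption: retries are switched off when only a single attempt is
    allowed, or when the request pins exactly one host or exactly one node.
    """
    force_hosts = filter_properties.get("force_hosts", [])
    force_nodes = filter_properties.get("force_nodes", [])
    if max_attempts == 1 or len(force_hosts) == 1 or len(force_nodes) == 1:
        return False
    return True


# An unconstrained request can be retried.
print(retry_enabled({}))  # True
# A request pinned to one host is never retried, even if that host
# manages many Ironic nodes.
print(retry_enabled({"force_hosts": ["ironic-compute"]}))  # False
```

Under this assumption, a build that fails on one Ironic node is not retried on the other nodes behind the same forced host, which matches the symptom reported here.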

Vasyl Saienko (vsaienko) wrote on 2017-09-18:

#1

related nova patch: https://review.openstack.org/#/c/499545/

Matt Riedemann (mriedem) wrote on 2017-09-18:

#2

To be clear, the issue here is that force_hosts is set but force_nodes is not, correct? So what you want is for the reschedule to try other nodes on the forced host. The linked code is not accounting for the 1:M relationship between host:node for Ironic.

tags:	added: ironic
Changed in nova:
status:	New → Triaged

Matt Riedemann (mriedem) wrote on 2017-09-18:

#3

The problem in this utility code is that we don't know whether there is one node or several on the host, and we don't want to look that up every time. Maybe that could be optimized to check only when force_hosts is specified and force_nodes isn't.
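
The optimization suggested in this comment could look roughly like the sketch below. This is only an illustration under stated assumptions, not a proposed patch: `retry_enabled_node_aware` and the `count_nodes_on_host` callable are hypothetical names, and the lookup stands in for whatever query would return the number of nodes managed by a compute host.

```python
def retry_enabled_node_aware(filter_properties, count_nodes_on_host,
                             max_attempts=3):
    """Decide whether to retry, looking up node counts only when needed.

    count_nodes_on_host is a hypothetical callable taking a host name and
    returning how many nodes that host manages.
    """
    force_hosts = filter_properties.get("force_hosts", [])
    force_nodes = filter_properties.get("force_nodes", [])
    if max_attempts == 1 or len(force_nodes) == 1:
        return False
    if len(force_hosts) == 1:
        if force_nodes:
            # Host and nodes are both forced: keep retries disabled.
            return False
        # Only here do we pay for the lookup: one forced host, no forced
        # node. Retry makes sense if the host has more than one node.
        return count_nodes_on_host(force_hosts[0]) > 1
    return True


# Made-up inventory: one Ironic host with three nodes, one 1:1 host.
NODES = {
    "ironic-compute": ["node-1", "node-2", "node-3"],
    "kvm-compute": ["kvm-compute"],
}


def lookup(host):
    return len(NODES.get(host, []))


print(retry_enabled_node_aware({"force_hosts": ["ironic-compute"]}, lookup))
print(retry_enabled_node_aware({"force_hosts": ["kvm-compute"]}, lookup))
```

With this inventory the first call allows a retry and the second does not, so the 1:M host-to-node case is handled without a lookup on every request.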

Matt Riedemann (mriedem) wrote on 2017-09-18:

#4

tempest.scenario.test_server_multinode.TestServerMultinode.test_schedule_to_all_nodes has existed in Tempest for almost 2 years now, so why is this only a recent issue? Or was the test always blacklisted for Ironic CI jobs before this and is only now being investigated?

Vasyl Saienko (vsaienko) wrote on 2017-09-19:

#5

@Matt thanks for looking into this.

I confirm this test was working before, but recently (I can't say for sure, but around the time the Pike release was cut) we started experiencing problems with races in the scheduler. We recently increased scheduler/host_subset_size to 9999 (https://review.openstack.org/#/q/I0874fe3b3628cb3e662ee01f24c4599247fdc82d) to stabilize concurrent tests, but now test_schedule_to_all_nodes is failing frequently, and it looks like the same race in the scheduler.
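
As background on why a large host_subset_size can help with concurrent requests, the sketch below shows the general idea of picking randomly among the N best candidates instead of always taking the single best one. It is a simplified illustration with a hypothetical `pick_host` helper and made-up host names, not the scheduler's actual code, and it assumes a non-empty candidate list.

```python
import random


def pick_host(weighed_hosts, host_subset_size=1, rng=random):
    """Pick one host from the best `host_subset_size` candidates.

    weighed_hosts is assumed to be sorted best-first and non-empty.
    """
    subset_size = max(1, min(host_subset_size, len(weighed_hosts)))
    return rng.choice(weighed_hosts[:subset_size])


candidates = ["node-1", "node-2", "node-3"]

# With a subset size of 1, every concurrent request picks the same node.
print(pick_host(candidates, host_subset_size=1))  # node-1

# With a very large subset size, the choice is spread over all candidates,
# so concurrent requests are less likely to collide on one node.
print(pick_host(candidates, host_subset_size=9999) in candidates)  # True
```

Random spreading reduces collisions but does not remove them, which is consistent with the test still failing when a collision does occur and no reschedule follows.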
