resize error on the same current host with enough vcpu resource

Bug #1609193 reported by Charlotte Han
This bug affects 6 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
Wishlist
Unassigned

Bug Description

Steps to reproduce
==================
A chronological list of steps that reproduces the issue:
* I had a single compute node, with allow_resize_to_same_host=true set.

* I booted an instance with flavor m1.tiny:
  | 1 | m1.tiny | 512 | 1 | 0 | | 1 | 1.0 | True |

* I then booted more instances, so that the compute node had 3 free vCPUs.
* I then resized the first instance to flavor 1-1, which has 4 vCPUs:
  | 1-1 | hanrong_cpu_4 | 512 | 1 | 0 | | 4 | 1.0 | True |

* The instance then went into the error state:
{"message": "Insufficient compute resources: Free vcpu 3.00 VCPU < requested 4 VCPU.", "code": 400, "created": "2016-08-02T12:12:36Z"}

Expected result
===============

I expected the instance to resize successfully, leaving 0 free vCPUs on the compute node after the resize.

Actual result
=============
The instance went into the error state, and the compute node still has 3 free vCPUs after the resize.

Environment
===========
Nova is from git:
    $ git log -1
commit 2b0557e4ee6737f44cb6fa845d7f59446bca90bf
Merge: 40913fe 15a9458
Author: Jenkins <email address hidden>
Date: Thu Jun 16 01:51:48 2016 +0000

    Merge "Added missed response to test_server_tags"

Tags: compute resize
Charlotte Han (hanrong)
tags: added: scheduler
description: updated
Changed in nova:
assignee: nobody → Charlotte Han (hanrong)
Charlotte Han (hanrong)
Changed in nova:
assignee: Charlotte Han (hanrong) → nobody
tags: added: resize
Changed in nova:
assignee: nobody → Maciej Szankin (mszankin)
status: New → In Progress
description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :

What's probably happening is it's hitting ComputeResourcesUnavailable, which triggers a reschedule, but since there is nowhere to reschedule to, it fails and the instance is set to error.

Yeah, it gets into rt.resize_claim, which does the claim test and raises the ComputeResourcesUnavailable exception. It can't reschedule because it's a resize to the same host / single node, and that all happens within an _error_out_instance_on_exception context manager, so the instance is put in the error state.

So I guess you'd have to handle ComputeResourcesUnavailable in _error_out_instance_on_exception and not set the instance to the error state.
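The failure mode Matt describes can be sketched in miniature. This is a hypothetical model, not nova's real code: the names mirror _error_out_instance_on_exception, rt.resize_claim, and ComputeResourcesUnavailable, but the bodies are simplified stand-ins.

```python
from contextlib import contextmanager


class ComputeResourcesUnavailable(Exception):
    """Stand-in for nova.exception.ComputeResourcesUnavailable."""


@contextmanager
def error_out_instance_on_exception(instance):
    # Simplified model of nova's _error_out_instance_on_exception:
    # any exception escaping the block puts the instance in ERROR.
    try:
        yield
    except Exception:
        instance["vm_state"] = "error"
        raise


def resize_claim(free_vcpus, requested_vcpus):
    # Simplified claim test: the resize claim does NOT credit the
    # vCPUs the instance already holds on this same host.
    if requested_vcpus > free_vcpus:
        raise ComputeResourcesUnavailable(
            "Free vcpu %.2f VCPU < requested %d VCPU"
            % (free_vcpus, requested_vcpus))


instance = {"vm_state": "active"}
try:
    with error_out_instance_on_exception(instance):
        resize_claim(free_vcpus=3, requested_vcpus=4)
except ComputeResourcesUnavailable:
    pass  # single host: nowhere to reschedule to

print(instance["vm_state"])  # the failed claim left the instance in "error"
```

With a single host there is no reschedule target, so the exception escapes the context manager and the instance lands in ERROR, matching the reporter's observation.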

Revision history for this message
Sarafraj Singh (sarafraj-singh) wrote :

So this is not going to work in Nova, as resources currently used by an instance are not considered free. But if the instance goes to error, that might qualify as a bug.
According to Matt: "it gets into rt.resize_claim which does the claim test and raises the ComputeResourcesUnavailable exception"

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
tags: added: compute
removed: scheduler
Revision history for this message
Sarafraj Singh (sarafraj-singh) wrote :

mriedem "b/c it's resize to same host / single node
 and that all happens within a _error_out_instance_on_exception context manager
 so the instance is put in error state
 so i guess you'd have to handle ComputeResourcesUnavailable in _error_out_instance_on_exception and not set the instance to error state"

Revision history for this message
Charlotte Han (hanrong) wrote :

Thank you.

The resize action for this instance failed because the scheduler's CoreFilter returned 0 hosts. So I think this is a scheduler problem.

One host has 3 free vCPUs, and an instance with 1 vCPU is on this host. When this instance is resized to 4 vCPUs, I think the resize should succeed, because host_free_vcpus + instance_current_used_vcpus == resized_instance_required_vcpus (3 + 1 == 4).
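The accounting difference being proposed can be made concrete. Both helpers below are hypothetical illustrations, not nova functions: one applies the reporter's suggested same-host arithmetic, the other applies the check nova effectively performs.

```python
def same_host_resize_fits(free_vcpus, instance_vcpus, new_vcpus):
    # Proposed accounting: on a same-host resize, the vCPUs the
    # instance already holds should count as available again.
    return free_vcpus + instance_vcpus >= new_vcpus


def current_resize_fits(free_vcpus, new_vcpus):
    # What the claim effectively checks today: only genuinely
    # free vCPUs count; the instance's own usage is not credited.
    return free_vcpus >= new_vcpus


# The scenario from this bug: 3 free vCPUs, instance uses 1, resize to 4.
print(same_host_resize_fits(3, 1, 4))  # True  (3 + 1 == 4)
print(current_resize_fits(3, 4))       # False (3 < 4)
```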

Revision history for this message
Maciej Szankin (mszankin) wrote :

But this would mean that resize gets a special treatment. It was designed this way, I do not think of it as a bug, rather as an inconvenience. Plus, counting a difference between flavors leaves us with more things to reconsider, not only CPU, but also RAM and so on.

Revision history for this message
Charlotte Han (hanrong) wrote :

Yes, resize was designed this way. But as a cloud user, I hope resources can be used as fully as possible, especially when I need them. So I think this is a bug. Or is there another API that can resolve this problem?

Revision history for this message
John Garbutt (johngarbutt) wrote :

So interestingly, we are going to hit this with live-resize as well, as that will always resize on the same host.

The problem here is that we are currently re-writing the scheduler. Once we are done, an instance will actually have a claim in the scheduler, so we should be able to tell the scheduler to increase the size of the claim, rather than just give us a new claim of the new size.

It feels like this is blocked until that scheduler work is complete. It has been a priority feature for the last few cycles, and we are making some progress.
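John's suggestion of growing an existing claim rather than taking out a fresh one can be sketched as follows. The Claim class and resize_to method are hypothetical, purely to show why claiming only the delta fits where a full new claim does not.

```python
class Claim:
    """Hypothetical scheduler claim held by an instance (illustration only)."""

    def __init__(self, host_free, vcpus):
        self.host_free = host_free  # free vCPUs remaining on the host
        self.vcpus = vcpus          # vCPUs this claim currently covers

    def resize_to(self, new_vcpus):
        # Grow the existing claim by the difference instead of
        # requesting a brand-new claim of the full new size.
        delta = new_vcpus - self.vcpus
        if delta > self.host_free:
            raise ValueError("not enough free vCPUs for the delta")
        self.host_free -= delta
        self.vcpus = new_vcpus


# Bug scenario: 3 vCPUs free, the instance's claim covers 1, resize to 4.
claim = Claim(host_free=3, vcpus=1)
claim.resize_to(4)      # only the delta of 3 vCPUs is claimed, so it fits
print(claim.host_free)  # 0
```

A fresh claim of 4 would have failed against the 3 free vCPUs; growing the claim by the delta of 3 succeeds and leaves the host fully used, which is the outcome the reporter expected.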

Changed in nova:
importance: Medium → Wishlist
Revision history for this message
Charlotte Han (hanrong) wrote :

Thank you

I will wait for the live-resize function, but I am worried about NUMA topology. For a NUMA-topology change, resizing a shut-down instance is easier.

Changed in nova:
assignee: Maciej Szankin (mszankin) → nobody
status: In Progress → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :

Per comment 7, the scheduler claims resources in placement since Pike, but that still doesn't resolve this issue; see bug 1790204. The resource tracking code is not smart enough to consider that the instance is being resized on the same host where it's already consuming resources and count those as available. So in this case you start with VCPU=1 and want to go to VCPU=4, and there are 4 total VCPUs on the host. nova/placement is going to try to claim 4 when there are only 3 available and fail. See my attempts at fixing bug 1790204 - this isn't going to be fixed anytime soon, and I think it is best documented as a known limitation.
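The placement-side version of the same accounting gap can be sketched like this. placement_claim is a hypothetical helper, not placement's real API: it models a claim for the full new allocation made without first releasing the instance's existing allocation on the same resource provider.

```python
def placement_claim(total, used_by_others, instance_usage, requested):
    # Hypothetical model of the placement claim during a same-host
    # resize: the full new allocation is requested while the
    # instance's old allocation still counts against the provider.
    free = total - used_by_others - instance_usage
    return requested <= free


# Host: 4 VCPU total, the instance holds 1, nothing else is used.
# Resizing to 4 VCPUs tries to claim 4 while only 3 are free -> fails,
# even though releasing the old allocation first would make it fit.
print(placement_claim(total=4, used_by_others=0, instance_usage=1,
                      requested=4))  # False
```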
