OpenStack Compute (nova)

race condition with resize causing old resources not to be freed

Bug #1590556 reported by Moshe Levi on 2016-06-08

This bug affects 1 person

Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

While I was working on fixing resize for PCI passthrough [1], I noticed the following issue with resize.

If you use a small image and run resize-confirm on it very quickly, the old resources are not freed.

After debugging this issue, I found the root cause.

A good resize run proceeds as follows:

During a resize, _update_usage_from_migration in the resource tracker is called twice.

1. The first call returns the instance type of the new flavor and enters this case:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718

2. It then stores the migration and the new instance_type in tracked_migrations:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

3. The second call returns the old instance_type and enters this case:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725

4. It then overwrites the entry in tracked_migrations with the migration and the old instance type:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

5. On resize-confirm, drop_move_claim is called with the old instance type:

https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369

6. drop_move_claim compares the instance type id stored in tracked_migrations (instance_type[id]) with instance_type.id (the old one).

7. Because they are equal, it removes the old resource usage:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328
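The good run above can be modelled with a small Python sketch. It is a simplification based only on the steps listed here, not nova's actual code: flavors are reduced to a dict holding an `id`, `tracked_migrations` is a plain dict keyed by instance UUID, and `record_migration` and `drop_move_claim_sketch` are hypothetical helpers standing in for `_update_usage_from_migration` and `drop_move_claim`.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins: a flavor (instance type) is reduced to its id.
Flavor = Dict[str, int]
Migration = Dict[str, str]
Tracked = Dict[str, Tuple[Migration, Flavor]]

OLD_FLAVOR: Flavor = {"id": 1}
NEW_FLAVOR: Flavor = {"id": 2}


def record_migration(tracked: Tracked, uuid: str,
                     migration: Migration, itype: Flavor) -> None:
    """Model of what each _update_usage_from_migration call leaves behind:
    the entry for this instance is overwritten with (migration, itype)."""
    tracked[uuid] = (migration, itype)


def drop_move_claim_sketch(tracked: Tracked, uuid: str,
                           instance_type: Flavor) -> Optional[Flavor]:
    """Model of the check in drop_move_claim: usage is released only when
    the tracked instance type id equals the one passed in. Returns the
    flavor whose usage would be released, or None if nothing is freed."""
    if uuid not in tracked:
        return None
    _migration, itype = tracked.pop(uuid)
    if itype["id"] == instance_type["id"]:
        return itype
    return None


# Good run: both _update_usage_from_migration calls happen before confirm.
tracked: Tracked = {}
migration: Migration = {"uuid": "mig-1"}
record_migration(tracked, "inst-1", migration, NEW_FLAVOR)  # first call (step 2)
record_migration(tracked, "inst-1", migration, OLD_FLAVOR)  # second call (step 4)
freed = drop_move_claim_sketch(tracked, "inst-1", OLD_FLAVOR)  # resize-confirm
print(freed)  # {'id': 1}: the old flavor's usage is released
```

Because the second call overwrites the entry, the tracked id and the id passed on confirm are both the old flavor's, so the release branch is taken.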

However, with a small image such as CirrOS and a fast resize-confirm, the second call to _update_usage_from_migration does not run before the confirm.

As a result, when drop_move_claim is entered, tracked_migrations still holds the new instance_type, so comparing it with the old instance_type that was passed in makes this expression false: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314

This means that the code block at https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326 is not executed, and therefore the old resources are never freed.
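The race can be replayed in a small, self-contained Python model. This is a sketch built only from the description in this report, not nova code: flavors are reduced to an `id`, `tracked_migrations` is a plain dict, and `drop_move_claim_sketch` and `confirm_after` are hypothetical helpers. The only thing that varies is how many `_update_usage_from_migration` calls complete before resize-confirm.

```python
from typing import Dict, Optional, Tuple

# Hypothetical simplified model: flavors are reduced to their id, and
# tracked_migrations maps instance uuid -> (migration, flavor).
OLD_FLAVOR = {"id": 1}
NEW_FLAVOR = {"id": 2}


def drop_move_claim_sketch(tracked: Dict[str, Tuple[dict, dict]],
                           uuid: str, instance_type: dict) -> Optional[dict]:
    """Return the flavor whose usage would be released, or None."""
    if uuid not in tracked:
        return None
    _migration, itype = tracked.pop(uuid)
    return itype if itype["id"] == instance_type["id"] else None


def confirm_after(calls_before_confirm: int) -> Optional[dict]:
    """Replay a resize where only the first `calls_before_confirm`
    _update_usage_from_migration calls run before resize-confirm."""
    tracked: Dict[str, Tuple[dict, dict]] = {}
    migration = {"uuid": "mig-1"}
    # The first call records the new flavor; the second overwrites it with the old.
    for itype in [NEW_FLAVOR, OLD_FLAVOR][:calls_before_confirm]:
        tracked["inst-1"] = (migration, itype)
    # resize-confirm passes the old instance type to drop_move_claim.
    return drop_move_claim_sketch(tracked, "inst-1", OLD_FLAVOR)


print(confirm_after(2))  # {'id': 1}: good run, old usage released
print(confirm_after(1))  # None: fast confirm, old usage never released
```

With two calls the tracked entry holds the old flavor and its usage is released; with one call it still holds the new flavor, the id check fails, and nothing is freed.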



OpenStack Infra (hudson-openstack) wrote on 2016-06-08: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/327356

Changed in nova:
assignee:	nobody → Moshe Levi (moshele)
status:	New → In Progress

Takashi Natsume (natsume-takashi) on 2016-06-09

tags:

added: compute resource-tracker


OpenStack Infra (hudson-openstack) wrote on 2016-12-22: Change abandoned on nova (master)

Change abandoned by Moshe Levi (<email address hidden>) on branch: master
Review: https://review.openstack.org/327356


Sean Dague (sdague) wrote on 2017-06-23:

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status:	In Progress → New
assignee:	Moshe Levi (moshele) → nobody


Sean Dague (sdague) wrote on 2017-07-25:

I saw you abandoned the patch. Is this still a known bug?

Changed in nova:
status:	New → Incomplete


Launchpad Janitor (janitor) wrote on 2017-09-24:

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status:	Incomplete → Expired
