Race condition with resize causing old resources not to be freed

Bug #1590556 reported by Moshe Levi on 2016-06-08

This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Unassigned

Bug Description

While working on a fix for resize with PCI passthrough [1], I noticed the following issue in resize.

If you use a small image and run resize-confirm very quickly, the old resources are not freed.

After debugging this issue, I found the root cause.

A good run of resize proceeds as follows:

During a resize, _update_usage_from_migration in the resource tracker is called twice.

1. On the first call, we get the instance type of the new flavor and enter this case:

    https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718

2. The migration and the new instance_type are then stored in tracked_migrations:

    https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

3. On the second call, we get the old instance_type and enter this case:

    https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725

4. The entry in tracked_migrations is then overwritten with the migration and the old instance type:

    https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

5. On resize-confirm, drop_move_claim is called with the old instance type:

    https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369

6. drop_move_claim compares instance_type[id] from tracked_migrations with instance_type.id (which is the old one).

7. Because they are equal, it removes the old resource usage:

    https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328
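
The good-run sequence above can be sketched with a toy model. This is a simplified illustration under stated assumptions, not nova's code: the `Flavor` and `ToyTracker` classes, the "inst-1" key, and the vCPU-only accounting are hypothetical and exist only for this example. Only the names tracked_migrations, _update_usage_from_migration and drop_move_claim come from the report.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flavor:
    # Hypothetical stand-in for an instance type.
    id: int
    vcpus: int


@dataclass
class ToyTracker:
    # Hypothetical, heavily simplified stand-in for the resource tracker.
    used_vcpus: int
    tracked_migrations: dict = field(default_factory=dict)

    def update_usage_from_migration(self, uuid, flavor):
        # Each call overwrites the entry recorded for this instance.
        self.tracked_migrations[uuid] = flavor

    def drop_move_claim(self, uuid, flavor):
        # Free usage only when the tracked flavor id matches the one passed in.
        tracked = self.tracked_migrations.get(uuid)
        if tracked is not None and tracked.id == flavor.id:
            self.used_vcpus -= flavor.vcpus
            del self.tracked_migrations[uuid]


old, new = Flavor(id=1, vcpus=1), Flavor(id=2, vcpus=2)
tracker = ToyTracker(used_vcpus=old.vcpus)

tracker.update_usage_from_migration("inst-1", new)  # first call: new flavor
tracker.update_usage_from_migration("inst-1", old)  # second call: old flavor
tracker.drop_move_claim("inst-1", old)              # resize-confirm

print(tracker.used_vcpus)
```

In this model the old usage is released because the second call replaced the tracked flavor with the old one before drop_move_claim ran.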

But with a small image like CirrOS and a fast resize-confirm, the second call of _update_usage_from_migration does not get executed.

As a result, when we enter drop_move_claim, it compares against the new instance_type, and this expression is false: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314

This means that the code block at https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326 is not executed, and therefore the old resources are not freed.
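
The failing case can be illustrated in a few lines with plain dictionaries. This is a hedged sketch of the logic as described in this report, not nova's implementation: the function `confirm_resize`, its `second_update_ran` parameter, the "inst-1" key, and the flavor values are all hypothetical.

```python
def confirm_resize(second_update_ran):
    """Return the vCPUs still accounted for after resize-confirm (toy model)."""
    old = {"id": 1, "vcpus": 1}  # old instance type (illustrative values)
    new = {"id": 2, "vcpus": 2}  # new instance type (illustrative values)
    used_vcpus = old["vcpus"]    # old usage still held before the confirm
    tracked_migrations = {}

    # The first _update_usage_from_migration call records the new instance type.
    tracked_migrations["inst-1"] = new
    # The second call overwrites it with the old one, unless the confirm
    # arrives before that call has run.
    if second_update_ran:
        tracked_migrations["inst-1"] = old

    # drop_move_claim is called with the old instance type and compares ids.
    if tracked_migrations["inst-1"]["id"] == old["id"]:
        used_vcpus -= old["vcpus"]
    return used_vcpus


print(confirm_resize(second_update_ran=True))   # good run
print(confirm_resize(second_update_ran=False))  # fast resize-confirm
```

With `second_update_ran=False` the id comparison fails, so the old usage stays accounted for, which matches the leak described above.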

Fix proposed to branch: master
Review: https://review.openstack.org/327356

Changed in nova:
assignee: nobody → Moshe Levi (moshele)
status: New → In Progress
tags: added: compute resource-tracker

Change abandoned by Moshe Levi (<email address hidden>) on branch: master
Review: https://review.openstack.org/327356

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: Moshe Levi (moshele) → nobody
Sean Dague (sdague) wrote :

I saw you abandoned the patch, is this still a known bug?

Changed in nova:
status: New → Incomplete