race condition with resize causing old resources not to be freed

Bug #1590556 reported by Moshe Levi
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

While working on a fix for resize with PCI passthrough [1], I noticed the following issue in resize.

If you use a small image and run resize-confirm very quickly, the old resources are not freed.

After debugging this issue, I found the root cause.

A good run of resize proceeds as follows:

During a resize, _update_usage_from_migration in the resource tracker is called twice.

1. The first call returns the instance type of the new flavor and enters this case:

   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718

2. It then stores the migration and the new instance_type in tracked_migrations:

   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

3. The second call returns the old instance_type and enters this case:

   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725

4. It then overwrites the entry in tracked_migrations with the migration and the old instance type:

   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

5. On resize-confirm, drop_move_claim is called with the old instance type:

   https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369

6. drop_move_claim compares instance_type[id] from tracked_migrations with instance_type.id (which is the old one).

7. Because they are equal, it removes the old resource usage:

   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328
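
To make the good run easier to follow, here is a minimal toy model of the sequence above. It is not nova's code: the class ToyResourceTracker, the flavor ids, and the "inst-1"/"mig-1" identifiers are hypothetical, and the real methods take different arguments and do much more. It only mirrors the overwrite of the tracked_migrations entry and the id comparison in drop_move_claim.

```python
class ToyResourceTracker:
    """Toy stand-in for the resource tracker; not nova's actual code."""

    def __init__(self):
        # instance uuid -> (migration, flavor id recorded by the last update)
        self.tracked_migrations = {}

    def update_usage_from_migration(self, instance_uuid, migration, flavor_id):
        # Each call overwrites whatever the previous call recorded.
        self.tracked_migrations[instance_uuid] = (migration, flavor_id)

    def drop_move_claim(self, instance_uuid, flavor_id):
        """Return True if the usage for flavor_id would be released."""
        if instance_uuid not in self.tracked_migrations:
            return False
        _migration, tracked_flavor_id = self.tracked_migrations.pop(instance_uuid)
        # Usage is only released when the tracked flavor matches the one passed in.
        return tracked_flavor_id == flavor_id


OLD_FLAVOR_ID, NEW_FLAVOR_ID = 1, 2  # hypothetical flavor ids

tracker = ToyResourceTracker()
tracker.update_usage_from_migration("inst-1", "mig-1", NEW_FLAVOR_ID)  # first call
tracker.update_usage_from_migration("inst-1", "mig-1", OLD_FLAVOR_ID)  # second call
good_run_freed = tracker.drop_move_claim("inst-1", OLD_FLAVOR_ID)      # resize-confirm
print(good_run_freed)  # True
```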

But with a small image like CirrOS, if resize-confirm is issued quickly, the second call of _update_usage_from_migration never gets executed.

As a result, when we enter drop_move_claim, the old instance_type is compared with the new instance_type stored in tracked_migrations, and this expression is false: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314

This means that this code block is not executed: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326 and therefore the old resources are not freed.
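
The effect of the ordering can be sketched as a small deterministic replay. This is an illustrative model only: the event names, the simulate helper, and the vCPU numbers are hypothetical, and it assumes both the old and the new flavor are accounted for while the resize is in flight.

```python
FLAVOR_VCPUS = {"old": 1, "new": 2}  # hypothetical flavor sizes


def simulate(events):
    """Replay tracker events in order and return the vCPUs still accounted for."""
    # Model assumption: both flavors are counted while the resize is pending.
    used_vcpus = FLAVOR_VCPUS["old"] + FLAVOR_VCPUS["new"]
    tracked_flavor = None
    for event in events:
        if event == "update_new":    # first _update_usage_from_migration call
            tracked_flavor = "new"
        elif event == "update_old":  # second call overwrites the entry
            tracked_flavor = "old"
        elif event == "confirm":     # resize-confirm drops the claim for the old flavor
            if tracked_flavor == "old":
                used_vcpus -= FLAVOR_VCPUS["old"]
            tracked_flavor = None
        else:
            raise ValueError(f"unknown event: {event}")
    return used_vcpus


good = simulate(["update_new", "update_old", "confirm"])
racy = simulate(["update_new", "confirm"])  # confirm arrives before the second update
print(good, racy)  # 2 3
```

In the racy ordering the old flavor's vCPU stays counted after the confirm, which corresponds to the leaked usage described in this report.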

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/327356

Changed in nova:
assignee: nobody → Moshe Levi (moshele)
status: New → In Progress
tags: added: compute resource-tracker

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Moshe Levi (<email address hidden>) on branch: master
Review: https://review.openstack.org/327356

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: Moshe Levi (moshele) → nobody

Sean Dague (sdague) wrote :

I saw you abandoned the patch, is this still a known bug?

Changed in nova:
status: New → Incomplete

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired