Migration record for resize not cleared if exception is thrown during the resize

Bug #1258275 reported by Jennifer Mulsow
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: John Warren

Bug Description

Testing on havana.

prep_resize() calls the resource tracker's resize_claim(), which creates a migration record. This record is cleared by rt.drop_resize_claim(), which is called from confirm_resize() or revert_resize(). However, if an exception is thrown before either of these is called, or after one is called but before it cleans up the migration record, the migration record remains in the database indefinitely.
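
To make the failure mode concrete, here is a minimal sketch of the leak. It is a toy model, not nova code: ToyTracker, the resize() helper, and the "pre-migrating" status string are assumptions made for illustration. Only the method names resize_claim() and drop_resize_claim() come from the description above.

```python
class ToyTracker:
    """Toy stand-in for the resource tracker; not nova's actual class."""

    def __init__(self):
        # instance uuid -> migration status (illustrative values only)
        self.migrations = {}

    def resize_claim(self, instance_uuid):
        # Claiming resources for a resize creates the migration record.
        self.migrations[instance_uuid] = "pre-migrating"

    def drop_resize_claim(self, instance_uuid):
        # Cleanup that is only reached through confirm/revert.
        self.migrations.pop(instance_uuid, None)


def resize(tracker, instance_uuid, fail=False):
    """Hypothetical resize flow: claim, do the work, then clean up."""
    tracker.resize_claim(instance_uuid)
    if fail:
        # Nothing between the claim and the cleanup handles this exception.
        raise RuntimeError("simulated failure during resize")
    tracker.drop_resize_claim(instance_uuid)  # stands in for confirm_resize()


tracker = ToyTracker()
resize(tracker, "ok-instance")
try:
    resize(tracker, "bad-instance", fail=True)
except RuntimeError:
    pass

# The failed resize leaves its migration record behind.
leaked = sorted(tracker.migrations)
print(leaked)
```

In this model the record for "bad-instance" stays in the dictionary forever, because the only code path that removes it was skipped.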

This results in a WARNING being logged every 60 seconds, as part of the update_available_resource periodic task, for every resize operation that ended with the instance in ERROR state. For example:
2013-12-04 17:49:15.247 25592 WARNING nova.compute.resource_tracker [req-75e94365-1cca-4bca-92a7-19b2c62b9551 e4857f249aec4160bfa19c12eb805a96 a42cfb9766bf41869efab25703f5ce7b] [instance: 12d2551a-6403-4100-ba57-0995594c9c93] Instance not resizing, skipping migration.

This message appears because the resource tracker's _update_usage_from_migrations() logs the warning whenever it finds a migration record for an instance whose current state is not a resize state.
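
The check can be sketched roughly as follows. This is a simplified illustration, not the real _update_usage_from_migrations(): the RESIZE_STATES set, the dictionary layout, and the function name are assumptions, and the actual state check in nova may also look at other fields.

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger("toy.resource_tracker")

# Assumed set of "resize in progress" task states for this sketch.
RESIZE_STATES = {"resize_prep", "resize_migrating",
                 "resize_migrated", "resize_finish"}


def update_usage_from_migrations(migrations, instances):
    """Return the migrations that count towards usage, warning on the rest."""
    counted = []
    for migration in migrations:
        uuid = migration["instance_uuid"]
        if instances[uuid]["task_state"] not in RESIZE_STATES:
            # A stale record hits this branch on every periodic run.
            LOG.warning("[instance: %s] Instance not resizing, "
                        "skipping migration.", uuid)
            continue
        counted.append(migration)
    return counted


instances = {
    "stale": {"task_state": None},             # resize failed earlier
    "active": {"task_state": "resize_finish"},  # resize still running
}
migrations = [{"instance_uuid": "stale"}, {"instance_uuid": "active"}]
counted = update_usage_from_migrations(migrations, instances)
```

Because the stale record is never deleted, each run of the periodic task repeats the warning for it.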

These messages keep appearing in the logs even after the affected instance's state is reset, and even after a later resize of that instance succeeds. At this point there is no way to clean up the old migration record.

It seems like there should be some handling when an exception occurs during resize, finish_resize, confirm_resize, revert_resize, etc. that will drop the resize claim, so the claim and migration record do not persist indefinitely.

Tags: compute
Matt Riedemann (mriedem)
tags: added: compute
Matt Riedemann (mriedem) wrote :
Changed in nova:
status: New → In Progress
assignee: nobody → John Warren (jswarren)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/61470
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16dfff5dedffcf4645df3c13b623d1ecd7560d8b
Submitter: Jenkins
Branch: master

commit 16dfff5dedffcf4645df3c13b623d1ecd7560d8b
Author: John Warren <email address hidden>
Date: Wed Dec 11 16:11:35 2013 +0000

    Add error as not-in-progress migration status

    When a migration status becomes "error" the migration is no longer
    in progress, i.e. it is in a terminal state. However, the
    migration_get_in_progress_by_host_and_node method returns migrations
    that have an "error" status, causing, among other things, the
    Resource Tracker to continually log messages about how an
    Instance is not resizing, creating excessive noise in the logs,
    especially since the fact that a migration is entering the "error"
    status is already logged. This change causes "error"-status
    migrations to not be included when retrieving "in progress"
    migrations.

    Closes-Bug: #1258275

    Change-Id: I67bfec9f91f89ff38422469a5d86e14c4fffa40b
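
The effect of this change can be illustrated with a small filter. This is a sketch in plain Python rather than the database query the commit modifies; the in_progress() helper and the record layout are hypothetical, and treating "confirmed" and "reverted" as the previously excluded statuses is an assumption.

```python
# Statuses treated as terminal. "error" is the addition described by the
# commit; "confirmed" and "reverted" are assumed to be the existing ones.
NOT_IN_PROGRESS = frozenset({"confirmed", "reverted", "error"})


def in_progress(migrations):
    """Keep only migrations whose status is not terminal."""
    return [m for m in migrations if m["status"] not in NOT_IN_PROGRESS]


records = [
    {"id": 1, "status": "migrating"},
    {"id": 2, "status": "error"},
    {"id": 3, "status": "confirmed"},
]
remaining = [m["id"] for m in in_progress(records)]
print(remaining)
```

With "error" in the excluded set, a failed migration no longer reaches the resource tracker's periodic check, so the repeated warning stops for records that are actually marked as errored.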

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/72774

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/73755

Changed in nova:
milestone: none → icehouse-3
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Sean Dague (sdague)
Changed in nova:
status: Fix Released → Confirmed
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → icehouse-rc1
Matt Riedemann (mriedem) wrote :
Changed in nova:
importance: Undecided → High
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/72774
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=132f13e5b095bdeaa0db72b151a3bf912ccfca36
Submitter: Jenkins
Branch: master

commit 132f13e5b095bdeaa0db72b151a3bf912ccfca36
Author: John Warren <email address hidden>
Date: Tue Feb 11 21:15:33 2014 +0000

    Error out failed migrations

    This change causes migrations to have their status set to error when
    either the resize_instance or finish_resize methods raise an exception.
    This prevents continuous logging of stalled migrations.

    Change-Id: Ie752e4833d28fd679c6d1abbc9da5f0ef57f5ec4
    Closes-bug: 1258275
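
One way to picture the behavior this commit describes is a context manager that marks the migration as errored and then re-raises. This is a hedged sketch, not the code from the review: error_out_migration(), the dictionary-based migration, and the placeholder finish_resize() are all hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def error_out_migration(migration):
    """Set the migration status to "error" if the wrapped block raises."""
    try:
        yield
    except Exception:
        migration["status"] = "error"
        raise  # the caller still sees the original exception


def finish_resize(migration):
    # Placeholder for the real work; always fails in this sketch.
    raise RuntimeError("simulated failure in finish_resize")


migration = {"instance_uuid": "bad-instance", "status": "migrating"}
try:
    with error_out_migration(migration):
        finish_resize(migration)
except RuntimeError:
    pass

print(migration["status"])
```

Combined with excluding "error" from the in-progress query, a resize that fails this way ends in a terminal status instead of lingering as an apparently active migration.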

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-rc1 → 2014.1