race condition when putting host in maintenance causes VM's state to be SHUTOFF

Bug #944145 reported by Armando Migliaccio on 2012-03-01
This bug affects 1 person
Affects: OpenStack Compute (nova)
Assigned to: Armando Migliaccio

Bug Description

Putting a host in maintenance mode means moving all of its VMs to another host. Effectively, this is a live migration that requires each VM's host field to change from the source host to the destination host.

Because of the potential concurrent execution of compute/manager->_sync_power_states, it is possible that after the migration, the state of the VM becomes SHUTOFF.
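The window can be illustrated with a deliberately simplified model. This is only a sketch, not nova's code: the `db` and `hypervisors` dictionaries, the function names, and the rule "a VM missing from the hypervisor is marked SHUTOFF" are all assumptions made for illustration.

```python
# Simplified, hypothetical model of the race; not nova's actual code.
SHUTOFF, RUNNING = "SHUTOFF", "RUNNING"

# What the database believes, and which VMs each hypervisor really runs.
db = {"vm-1": {"host": "host-a", "power_state": RUNNING}}
hypervisors = {"host-a": {"vm-1"}, "host-b": set()}


def sync_power_states(host):
    """Periodic task on `host`: reconcile DB state with the hypervisor."""
    for name, record in db.items():
        if record["host"] != host:
            continue  # only instances the DB assigns to this host
        if name not in hypervisors[host]:
            # The VM is not on this hypervisor, so it is assumed to be off.
            record["power_state"] = SHUTOFF


def migrate_vm(name, src, dest):
    """Move the VM on the hypervisor side only; the DB is untouched."""
    hypervisors[src].discard(name)
    hypervisors[dest].add(name)


# Problematic ordering: migrate first, update the host field afterwards.
migrate_vm("vm-1", "host-a", "host-b")
sync_power_states("host-a")      # periodic task fires inside the window
db["vm-1"]["host"] = "host-b"    # host field updated too late

print(db["vm-1"])
```

In this model the VM is running on host-b, yet its record ends up with the SHUTOFF state because the source host still owned it in the database when the periodic task ran.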

Looking at the implementation of _sync_power_states, this problem can be solved by ensuring that the VM's host field is updated before the actual VM migration, and by reverting it to the old value if something goes wrong.

The current code in virt/xenapi/host.py updates the VM's host field after the VM has been migrated (to save one DB call in case of failure). This ordering is what causes the race condition, and it needs to be fixed.
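The ordering described in this report (update the host field first, revert it if the migration fails) could look roughly like the sketch below. It is not the committed patch: `evacuate_instance`, `update_host`, and the `migrate` callable are hypothetical names, and a plain dictionary stands in for the database record.

```python
def evacuate_instance(instance, dest_host, update_host, migrate):
    """Point the instance at dest_host before migrating; roll back on failure."""
    old_host = instance["host"]
    update_host(instance, dest_host)      # change ownership before the move
    try:
        migrate(instance, dest_host)
    except Exception:
        update_host(instance, old_host)   # restore the original owner
        raise


def update_host(instance, host):
    """Hypothetical stand-in for the DB update of the host field."""
    instance["host"] = host


def failing_migrate(instance, dest_host):
    """Hypothetical migration that always fails, to exercise the rollback."""
    raise RuntimeError("migration failed")


# Successful migration: the host field ends up on the destination.
ok = {"name": "vm-1", "host": "host-a"}
evacuate_instance(ok, "host-b", update_host, lambda inst, host: None)

# Failed migration: the host field is reverted to the source.
failed = {"name": "vm-2", "host": "host-a"}
try:
    evacuate_instance(failed, "host-b", update_host, failing_migrate)
except RuntimeError:
    pass

print(ok["host"], failed["host"])
```

The trade-off is the one the description mentions: a failed migration now costs one extra DB call for the rollback, in exchange for the source host no longer owning the VM's record while the migration is in flight.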

Changed in nova:
status: New → Confirmed
assignee: nobody → Armando Migliaccio (armando-migliaccio)

Fix proposed to branch: master
Review: https://review.openstack.org/4761

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
milestone: none → essex-rc1
importance: Undecided → High
status: In Progress → Triaged
Changed in nova:
status: Triaged → In Progress
tags: removed: xenserver

Reviewed: https://review.openstack.org/4761
Committed: http://github.com/openstack/nova/commit/ec20076d24455860b38fd9a143910f75741ac8f6
Submitter: Jenkins
Branch: master

commit ec20076d24455860b38fd9a143910f75741ac8f6
Author: Armando Migliaccio <email address hidden>
Date: Thu Mar 1 19:10:54 2012 +0000

    bug 944145: race condition causes VM's state to be SHUTOFF

    ensure we close down the contention window between _sync_power_states
    and live migration/host evacuation.

    Change-Id: Ie6cbd9bf2eee206b4a821a4b77a6dced409f3983

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-03-20
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc1 → 2012.1