Inconsistent state when connection to conductor is lost during live migration
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Confirmed | Medium | Unassigned |
Bug Description
If during live migration the connection to nova conductor service is somehow lost (for instance, due to the rabbitmq server becoming unavailable), the migration status of the nodes never gets updated, and they end up forever in "migrating" state, with the actual guest already running on the new host, but the data in the nova database still pointing at the old host.
This happens in all versions at least up to Mitaka.
How to reproduce:
1. Create a simple setup with two hosts.
2. Create an instance and start a live migration.
3. Kill the rabbitmq server.
4. Wait for the migration to finish.
5. Bring the rabbitmq server back up.
6. Observe the instance stuck in "migrating" state, with everything migrated to the new host, but Nova thinking it's still on the old host.
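The inconsistency observed in step 6 can be described as a simple check: the database still says "migrating" and points at the old host, while the guest is actually running somewhere else. The sketch below is only an illustration of that check, not Nova code. The `InstanceRecord` shape, the `find_stranded` helper, and the host and instance names are all hypothetical, and it assumes you have already gathered the database view and the hypervisor view by some other means.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceRecord:
    """What the database says about an instance (hypothetical shape)."""
    uuid: str
    host: str        # host recorded in the database
    task_state: str  # "migrating" while a live migration is in flight


def find_stranded(records: List[InstanceRecord],
                  running_on: Dict[str, str]) -> List[str]:
    """Return UUIDs still marked "migrating" whose guest actually runs
    on a different host than the one recorded in the database."""
    stranded = []
    for rec in records:
        actual = running_on.get(rec.uuid)
        if actual is None:
            continue  # guest not seen on any hypervisor; cannot judge
        if rec.task_state == "migrating" and actual != rec.host:
            stranded.append(rec.uuid)
    return stranded


# Hypothetical data: vm-1 finished migrating to compute-b while the
# message bus was down, so the database was never updated.
records = [
    InstanceRecord("vm-1", host="compute-a", task_state="migrating"),
    InstanceRecord("vm-2", host="compute-a", task_state=""),
]
running_on = {"vm-1": "compute-b", "vm-2": "compute-a"}
print(find_stranded(records, running_on))
```

With the sample data, only `vm-1` is reported, which matches the symptom described above: the guest has moved, but the record has not.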
description: | updated |
Changed in nova: | |
assignee: | nobody → Radomir Dopieralski (thesheep) |
summary: |
- Inconsistent state if connection to conductor is lost during live migration
+ Inconsistent state when connection to conductor is lost during live migration |
tags: | added: live-migration |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Medium |
This looks very similar to https://bugs.launchpad.net/nova/+bug/1437154 and actually leads to the same issues. It doesn't matter whether you kill the live migration monitor or rabbitmq: you will end up with an instance running on the destination host without networking configured correctly, and with a mess on the source host.
I believe that https://review.openstack.org/#/c/225910/ should at least partially solve this problem. There is also a proposal to make compute stateful, which should solve the issue completely.