live migration does not coordinate VM resume with network readiness
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Confirmed | Low | Unassigned |
Bug Description
When live-migrating a VM from one host to another in a deployment that uses neutron, the VM can resume on the destination host before its network is ready (a race condition).
QEMU has a mechanism that sends a few RARPs once migration is done and before resuming.
Nova needs to coordinate with QEMU and neutron (through the nova/neutron notification mechanism) so that the VM is resumed on the destination host only after networking has been properly wired. Otherwise the RARPs are lost, and connectivity to the VM is disrupted until the VM itself sends a broadcast message.
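To illustrate the kind of coordination being asked for, here is a minimal sketch of a gate that holds back the resume until a "network ready" notification arrives. It is not nova code: `ResumeGate`, `on_vif_plugged` and `resume_when_ready` are hypothetical names, and the way the neutron notification is delivered to `on_vif_plugged` is left as an assumption.

```python
import threading


class ResumeGate:
    """Hypothetical helper: holds back a VM resume until the network is wired."""

    def __init__(self):
        self._vif_plugged = threading.Event()

    def on_vif_plugged(self):
        # Assumed to be called by whatever handles the neutron notification
        # for the destination port.
        self._vif_plugged.set()

    def resume_when_ready(self, resume_vm, timeout):
        # Block until the port is reported ready, or give up after `timeout`
        # seconds. Returning False leaves the decision to the caller
        # (abort, or resume anyway and accept the race).
        if not self._vif_plugged.wait(timeout):
            return False
        resume_vm()
        return True
```

With a gate like this, the RARPs would only be emitted after the destination port is wired, at the cost of a slightly longer pause when the notification is slow. The timeout matters: without it, a lost notification would leave the VM paused indefinitely.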
Log detail (merged from the logs and tcpdumps of both hosts)
Migration from host 29 to host 30
2015-10-29 10:54:27.592000 [VMLIFE30] 21476 INFO nova.compute.
2015-10-29 10:54:27.609000 [VMLIFE29] 29022 INFO nova.compute.
2015-10-29 10:54:27.636000 [TAP30] tcpdump DEBUG 10:54:27.632047 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
2015-10-29 10:54:27.656000 [TAP29] tcpdump DEBUG tcpdump: pcap_loop: The interface went down
2015-10-29 10:54:27.787000 [TAP30] tcpdump DEBUG 10:54:27.783353 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
2015-10-29 10:54:27.818000 [FDB30] ovs-fdb DEBUG 62 0 fa:16:3e:50:a3:46 0 # switch associated to VLAN 0, should be "1", still not tagged, also not propagated to other hosts because vlan0 is invalid in the OVS implementation
2015-10-29 10:54:28.037000 [TAP30] tcpdump DEBUG 10:54:28.033259 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
2015-10-29 10:54:28.387000 [TAP30] tcpdump DEBUG 10:54:28.383211 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
2015-10-29 10:54:28.969000 [VMLIFE29] 29022 INFO nova.compute.
2015-10-29 10:54:29.803000 [OVS30] 21310 DEBUG neutron.
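For anyone repeating this analysis, the RARP timestamps can be pulled out of a merged log with a few lines of Python. This is only a sketch: the line format is inferred from the excerpt above, and `rarp_times` is a hypothetical helper, not part of oslogmerger.

```python
import re
from datetime import datetime

# Merged-log prefix as seen in the excerpt: timestamp, then a bracketed source tag.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[(\w+)\] (.*)$")


def rarp_times(lines, source="TAP30"):
    """Return the merged-log timestamps of RARP frames captured on `source`."""
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) == source and "Reverse ARP" in m.group(3):
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times
```

Comparing the last returned timestamp with the time at which the port is actually tagged on the destination shows whether any RARP could have reached the rest of the network.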
A reproduction ansible script is provided to show how it happens:
And complete merged output with oslogmerger can be found here:
https:/
Changed in nova:
status: New → Confirmed
tags: added: live-migration
Changed in nova:
importance: Undecided → Low
Changed in nova:
assignee: nobody → Mohammed Ashraf (mohammed-asharaf)
status: Confirmed → In Progress
Changed in nova:
assignee: Mohammed Ashraf (mohammed-asharaf) → nobody
Changed in nova:
status: In Progress → Confirmed