macvtap: possible race of interfaces scan/deletion at migration
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Expired | High | Unassigned |
Bug Description
A few days ago I stumbled upon a failed migration of a VM with a macvtap interface. Nova "greeted" me with the error "Unsupported VIF type binding_failed". A short investigation led me to the real problem: the macvtap agent had crashed:
2019-02-21 18:40:38.636 54364 ERROR neutron.
2019-02-21 18:40:38.637 54364 ERROR oslo_service.
[traceback truncated in the original report; the same truncated line repeats for each frame]
2019-02-21 18:40:38.640 54364 INFO neutron.
I have dug in a bit and it looks like there's a race between interface deletion and the periodic scan in daemon_loop. Currently get_all_devices (neutron/
def get_all_
...............
for device_name in all_device_names:
    if device_
        if ip_lib.
It's quite possible that a device will be deleted while we're looping over the list (in my case this happened with roughly 10% probability per migration).
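The failure mode can be sketched in plain Python (all names here are illustrative stand-ins, not Neutron's actual API): the scan takes a point-in-time snapshot of device names and then queries each device, so a device removed between the snapshot and the query makes the lookup blow up.

```python
# Illustrative sketch of the scan/deletion race.
# "devices" and "get_mac" are hypothetical stand-ins for the kernel's
# interface list and for ip_lib attribute lookups.

devices = {"macvtap0": "52:54:00:aa:bb:cc", "macvtap1": "52:54:00:dd:ee:ff"}

def get_mac(name):
    # Stands in for reading the device's link address, which fails
    # if the underlying interface has just been deleted.
    return devices[name]

def scan():
    snapshot = list(devices)          # point-in-time list of device names
    found = set()
    for name in snapshot:
        # A migration can delete the interface right here...
        if name == "macvtap1":
            del devices["macvtap1"]   # simulate concurrent deletion
        found.add(get_mac(name))      # ...so this lookup raises
    return found

try:
    scan()
except KeyError as exc:
    print("scan crashed on deleted device:", exc)
```

In the real agent the equivalent of that KeyError propagates out of the periodic scan and kills the daemon loop, which is what the traceback above shows.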
An obvious 'solution' is to re-check the device on every loop iteration, but that doesn't really solve the issue; it only shrinks the race window, so that I was able to migrate the VM about 100 times before a crash.
I'll attach this patch anyway.
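The per-iteration existence check mentioned above narrows the window but can never close it; treating "device vanished mid-scan" as a normal condition does. A hypothetical sketch of that variant (again with illustrative names, not the actual patch):

```python
# Hypothetical sketch of the tolerant variant: a device disappearing
# mid-scan is skipped instead of crashing the agent.

devices = {"macvtap0": "52:54:00:aa:bb:cc", "macvtap1": "52:54:00:dd:ee:ff"}

def get_mac(name):
    return devices[name]  # stand-in for an ip_lib attribute lookup

def scan_tolerant():
    found = set()
    for name in list(devices):
        if name == "macvtap1":
            del devices["macvtap1"]    # simulate deletion after the snapshot
        try:
            found.add(get_mac(name))   # may still fail: the window remains
        except KeyError:
            continue                   # device vanished mid-scan: skip it
    return found

print(scan_tolerant())  # only macvtap0's MAC survives the scan
```

The design point is that the race itself cannot be eliminated from userspace, so the scan has to tolerate losing any individual device between listing and querying.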
Changed in neutron:
importance: Undecided → High
Hello:
I have some questions related to this issue.
1) Are you migrating or live-migrating the VM?
2) Is this error 100% reproducible? When you migrate the VM, does it always happen?
3) Do you happen to know whether this device "macvtap1" is created after the agent error?
4) Did you see Nova compute agent (in debug mode) creating this interface during the migration?
5) Are you aware of [1]?
"""
Instance migration requires the same values for the physical_
"""
Regards.
[1] https://docs.openstack.org/neutron/pike/admin/config-macvtap.html