linuxbridge-agent missed updated device sometimes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | Yuki Nishiwaki |
Bug Description
* Version:
master branch head as of 11 July 2018
https:/
* Summary:
When the operation that make tap interface disappeared/
linuxbridge-agent can miss updated-device events, depending on when the tap device disappeared. This eventually causes the following "VirtualInterfa
---
File "/usr/lib/
2018-07-11 01:07:49.632 56453 ERROR nova.compute.
2018-07-11 01:07:49.632 56453 ERROR nova.compute.
2018-07-11 01:07:49.632 56453 ERROR nova.compute.
---
* Reproducing:
This is very difficult to reproduce, because the preconditions depend strongly on the running state of linuxbridge-agent. Instead, I will explain the state transitions in the updated-device detection logic step by step.
---
Let's say the hypervisor has one tap device, "tapA", for one VM, tapA's interface index is 1, and a user has just requested a rebuild of this VM.
0. The previous device info is as follows
1. Get current_devices https:/
-> current_devices is "tapA"
-- tapA disappears because the VM is being rebuilt --
2. Get timestamp (the interface index in the case of linuxbridge-agent) https:/
-> The current timestamp is {"tapA": None}, because we failed to get the interface information here.
3. Check whether the device changed locally
https:/
-> "tapA" is detected as a locally changed device, because its timestamp differs from the previous one (1 != None).
4. Generate device_info as follows
5. Run the linuxbridge-agent interface plugging logic for tapA. The device existence check fails because no such device exists. Note that even when this check fails, the function does not raise an exception, so no re-sync happens https:/
-- tapA appears again because of the VM rebuild --
-- the next scan_device iteration starts --
6. Get current_devices
-> current_devices is "tapA"
7. Get timestamp
-> The current timestamp is {"tapA": 2}.
8. Check whether the device changed locally
-> No locally changed device is detected, because of this line https:/
9. Generate device_info as follows
The next iteration is expected to detect the updated device, but in this case it does not.
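The two scan iterations above can be modelled with a small, self-contained sketch. It is not the agent's actual code: `locally_modified` is a hypothetical stand-in for the detection step, and it assumes (based on step 8) that a device whose previous timestamp is missing or None is treated as new and is never reported as locally changed.

```python
def locally_modified(timestamps, previous_timestamps):
    """Return devices whose timestamp differs from a known previous one.

    Assumption: a device with no (or a None) previous timestamp is treated
    as new, so it is never reported as locally modified.
    """
    return {
        device
        for device, timestamp in timestamps.items()
        if previous_timestamps.get(device)
        and timestamp != previous_timestamps.get(device)
    }


# First scan (steps 1-5): tapA has vanished, so its interface index
# cannot be read and is recorded as None.
previous = {"tapA": 1}
current = {"tapA": None}
first_scan = locally_modified(current, previous)

# Second scan (steps 6-9): tapA is back with a new interface index, but the
# remembered timestamp is now None, so the change is skipped.
previous = current
current = {"tapA": 2}
second_scan = locally_modified(current, previous)

print(first_scan)   # {'tapA'}
print(second_scan)  # set()
```

Under that assumption, the first scan reports tapA while the device does not exist, and the second scan, where the device really could be processed, reports nothing.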
So we have to improve this locally changed device detection logic. Otherwise rebooting/
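One possible direction, sketched here purely as an illustration and not necessarily what the proposed fix does, is to keep the last known timestamp whenever the current lookup fails. Both `carry_forward` and `locally_modified` are hypothetical helpers, and the detection rule is the same assumed rule as in the scenario above.

```python
def locally_modified(timestamps, previous_timestamps):
    """Assumed detection rule: only devices with a known previous timestamp
    that differs from the current one count as locally modified."""
    return {
        device
        for device, timestamp in timestamps.items()
        if previous_timestamps.get(device)
        and timestamp != previous_timestamps.get(device)
    }


def carry_forward(timestamps, previous_timestamps):
    """Replace a failed lookup (None) with the last known timestamp."""
    return {
        device: previous_timestamps.get(device) if timestamp is None else timestamp
        for device, timestamp in timestamps.items()
    }


remembered = {"tapA": 1}  # state before the rebuild
detected = []
# Observed timestamps in two consecutive scans: device gone, then back.
for observed in ({"tapA": None}, {"tapA": 2}):
    current = carry_forward(observed, remembered)
    detected.append(locally_modified(current, remembered))
    remembered = current

print(detected)  # [set(), {'tapA'}]
```

With the old index carried over, the scan in which tapA is missing reports nothing, and the following scan sees 1 != 2 and reports tapA as updated.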
Changed in neutron:
importance: Undecided → Medium
Fix proposed to branch: master
Review: https://review.openstack.org/581648