Add FDB bridge entry fails if old entry not removed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Invalid | Undecided | Kevin Carter |
Juno | Fix Released | Undecided | Kevin Carter |
neutron | Fix Released | Undecided | Li Ma |
Bug Description
Running on Ubuntu 14.04 with the Linuxbridge agent and L2pop, using VXLAN networks.
In situations where "remove_
2015-03-16 21:10:08.520 30207 ERROR neutron.
Command: ['sudo', '/usr/local/
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: File exists\n'
In our case, instances were unable to communicate with their Neutron router because VXLAN traffic was being forwarded to the wrong VXLAN endpoint. This was corrected either by migrating the router to a new agent or by executing a "bridge fdb del" for the FDB entry corresponding to the Neutron router's MAC address. Once the entry was deleted, the LB agent added the appropriate FDB entry at the next polling event.
If anything is unclear, please let me know.
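The manual "bridge fdb del" workaround described above can be sketched in Python. This is only an illustration, not part of Neutron: the helper names `parse_fdb_line` and `build_fdb_del_commands` are hypothetical, and the parser assumes the `bridge fdb` output layout shown in this report (`MAC dev DEVICE [dst IP] ...`). It only builds the command arguments; running them (as root) is left to the operator.

```python
def parse_fdb_line(line):
    """Parse one line of `bridge fdb` output into a dict.

    Returns None for lines that do not look like `MAC dev DEVICE ...`.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[1] != "dev":
        return None
    entry = {"mac": tokens[0].lower(), "dev": tokens[2], "dst": None}
    if "dst" in tokens:
        idx = tokens.index("dst")
        if idx + 1 < len(tokens):
            entry["dst"] = tokens[idx + 1]
    return entry


def build_fdb_del_commands(fdb_output, mac):
    """Return `bridge fdb del` argument lists for each entry of `mac` that has a dst."""
    commands = []
    for line in fdb_output.splitlines():
        entry = parse_fdb_line(line)
        if entry and entry["mac"] == mac.lower() and entry["dst"]:
            commands.append(
                ["bridge", "fdb", "del", entry["mac"],
                 "dev", entry["dev"], "dst", entry["dst"]]
            )
    return commands
```

Entries without a `dst` (such as the `vlan 0` line) are skipped, so only the remote VTEP mapping for the given MAC is targeted.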
Changed in openstack-ansible:
milestone: next → 10.1.4
Changed in openstack-ansible:
milestone: 10.1.4 → 10.1.5
Changed in openstack-ansible:
milestone: none → 10.1.5
Changed in openstack-ansible:
status: Triaged → Fix Committed
assignee: nobody → Kevin Carter (kevin-carter)
status: Fix Committed → Invalid
milestone: 10.1.5 → 9.0.9
milestone: 9.0.9 → none
Some additional info...
The Neutron DB and the forwarding DB somehow get out of sync so that the FDB has one entry and Neutron has another. For example:
On a compute node:
compute003# bridge fdb | grep fa:16:3e:5d:05:4f
fa:16:3e:5d:05:4f dev vxlan-8 vlan 0
fa:16:3e:5d:05:4f dev vxlan-8 dst 172.29.243.252 self permanent
fa:16:3e:5d:05:4f is the MAC address of the router's qr interface, and 172.29.243.252 is the VTEP of infra01. Neutron, however, thinks the router is scheduled to infra04:
root@compute003:~# neutron l3-agent-list-hosting-router e29e967c-4db1-4283-b9cf-bb2625198c9f
+--------------------------------------+--------------------------------------------------+----------------+-------+
| id                                   | host                                             | admin_state_up | alive |
+--------------------------------------+--------------------------------------------------+----------------+-------+
| 18e9dbb6-2bab-4a8b-bc89-7da3dcd224a2 | infra04_neutron_agent                            | True           | :-)   |
+--------------------------------------+--------------------------------------------------+----------------+-------+
When you attempt to unschedule the router from infra04, you'll see the following fdb delete failure in the linuxbridge agent log:
2015-03-17 13:48:05.853 30207 ERROR neutron.agent.linux.utils [req-5d5b8a90-cb10-4acf-9971-a3fa6b996c74 None]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'bridge', 'fdb', 'del', 'fa:16:3e:5d:05:4f', 'dev', 'vxlan-8', 'dst', '172.29.242.66']
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: No such file or directory\n'
172.29.242.66 is the VTEP on infra04. The delete is expected to fail, because no FDB entry with that destination exists. As a result, the stale entry pointing at infra01 is still left:
compute003# bridge fdb | grep fa:16:3e:5d:05:4f
fa:16:3e:5d:05:4f dev vxlan-8 vlan 0
fa:16:3e:5d:05:4f dev vxlan-8 dst 172.29.243.252 self permanent
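The mismatch shown above (the FDB points at 172.29.243.252 while Neutron expects the router on 172.29.242.66) could be detected with a small check like the following. This is a sketch under assumptions: `find_stale_fdb_entries` and the `expected` mapping (MAC address to the VTEP IP Neutron believes is correct) are hypothetical, and how that mapping is obtained from Neutron is not covered here.

```python
import re

# Matches lines such as:
#   fa:16:3e:5d:05:4f dev vxlan-8 dst 172.29.243.252 self permanent
FDB_DST_RE = re.compile(
    r"^(?P<mac>[0-9a-f:]{17}) dev (?P<dev>\S+) dst (?P<dst>\S+)",
    re.IGNORECASE,
)


def find_stale_fdb_entries(fdb_output, expected):
    """Compare `bridge fdb` output with the expected MAC -> VTEP mapping.

    Returns a list of (mac, dev, actual_dst, expected_dst) tuples for
    entries whose dst differs from the expected VTEP. MACs that are not
    in `expected` are ignored.
    """
    stale = []
    for line in fdb_output.splitlines():
        match = FDB_DST_RE.match(line.strip())
        if not match:
            continue
        mac = match.group("mac").lower()
        want = expected.get(mac)
        if want is not None and match.group("dst") != want:
            stale.append((mac, match.group("dev"), match.group("dst"), want))
    return stale
```

With the output from compute003 and an expected VTEP of 172.29.242.66 for fa:16:3e:5d:05:4f, the function reports the entry on vxlan-8 as stale; lines without a `dst` field are not considered.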
To work around it, you can reschedule the router to infra01. That results in the following error:
2015-03-17 13:50:33.006 30207 ERROR neutron.agent.linux.utils [req-3a4ae444-40f8-4d3b-ad37-8813b963a5ec None]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'bridge', 'fdb', 'add', 'fa:16:3e:5d:05:4f', 'dev', 'vxlan-8', 'dst', '172.29.243.252']
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: File exists\n'
That is to be expected, as the entry already exists. Then, you can unschedule the router from infra01 and see the FDB entry get properly removed:
compute003# bridge fdb | grep fa:16:3e:5d:05:4f
fa:16:3e:5d:05:4f dev vxlan-8 vlan 0
Rescheduling to another agent results in the correct entry being added:
compute003# bridge fdb | grep fa:16:3e:5d:05:4f
fa:16:3e:5d:05:4f dev vxlan-8 vlan 0
fa:16:3e:5d:05:4f dev vxlan-8 dst 172.29.242.66 self permanent
We don't know exactly what prevents the FDB entry from being removed properly in the first place. The result, though, is an inconsistent Neutron DB/FDB state and eventual traffic loss.
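One way an agent could tolerate a leftover entry, rather than failing with "File exists", is to make the update idempotent. The sketch below is not the fix that was released for this bug. It assumes the installed iproute2 supports `bridge fdb replace`, and falls back to a delete followed by an add otherwise. The function name `plan_fdb_update` and its parameters are hypothetical, and it only returns the command argument lists.

```python
def plan_fdb_update(mac, device, new_vtep, current_vtep=None, have_replace=True):
    """Return the ordered `bridge fdb` commands needed to point `mac` at `new_vtep`.

    current_vtep: dst of the existing entry for `mac` on `device`, if any.
    have_replace: assumption that `bridge fdb replace` is available.
    """
    if current_vtep == new_vtep:
        # Entry already correct; an `add` here would fail with "File exists".
        return []
    if have_replace:
        # `replace` overwrites an existing entry or creates a new one.
        return [["bridge", "fdb", "replace", mac, "dev", device, "dst", new_vtep]]
    commands = []
    if current_vtep is not None:
        # Remove the stale mapping first so the following `add` can succeed.
        commands.append(["bridge", "fdb", "del", mac, "dev", device, "dst", current_vtep])
    commands.append(["bridge", "fdb", "add", mac, "dev", device, "dst", new_vtep])
    return commands
```

For the case in this report, `plan_fdb_update("fa:16:3e:5d:05:4f", "vxlan-8", "172.29.242.66", current_vtep="172.29.243.252")` yields a single `replace` command, which avoids depending on the earlier delete having succeeded.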