L3 agent fails on FIP when DVR and HA both enabled in router
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Invalid | High | Swaminathan Vasudevan |
Bug Description
I have a VLAN-based Neutron configuration. My tenant networks are VLANs, and my shared external network (br-ex) is a flat network. Neutron is configured for DVR+SNAT mode. While testing floating IPs (FIPs), I ran into issues with my Neutron router and traced them back to a single scenario: the router is both distributed AND HA. To be clear, I've tested all four possibilities:
"--distributed False --ha False" == works
"--distributed True --ha False" == works
"--distributed False --ha True" == works
"--distributed True --ha True" == fails
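The four combinations above can be enumerated programmatically when repeating the test. The following is a minimal sketch, not the reporter's actual procedure: it only builds the command strings and does not run them. It assumes the legacy `neutron router-create` CLI, which accepts the `--distributed` and `--ha` flags quoted above; the router name `test-router` and the helper names are hypothetical, and the outcomes are copied from the list in this report.

```python
from itertools import product

# Hypothetical router name, used only for illustration.
ROUTER_NAME = "test-router"

# Outcomes as stated in this bug report, keyed by (distributed, ha).
REPORTED_RESULT = {
    (False, False): "works",
    (True, False): "works",
    (False, True): "works",
    (True, True): "fails",
}


def router_create_command(distributed, ha, name=ROUTER_NAME):
    """Build (but do not run) a router-create command for one combination."""
    return f"neutron router-create --distributed {distributed} --ha {ha} {name}"


def config_matrix():
    """Return (distributed, ha, command) tuples in the order listed in the report."""
    # ha is the outer loop so the order is FF, TF, FT, TT (distributed, ha).
    return [
        (distributed, ha, router_create_command(distributed, ha))
        for ha, distributed in product([False, True], repeat=2)
    ]


if __name__ == "__main__":
    for distributed, ha, command in config_matrix():
        print(f"{command}    # reported: {REPORTED_RESULT[(distributed, ha)]}")
```

Each printed command would still need to be followed by the manual steps described in this report (attach the gateway and subnets, boot VMs, associate a FIP, and try to reach it).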
* I can reproduce this again and again, just by deleting the router I have (which implies first clearing its gateway, and removing any associated ports), then re-creating the router in any of the four configurations above. Then I boot some VMs, associate a FIP to any one of them, and attempt to reach the FIP. Results are the same whether I create the router on the CLI or from within Horizon.
* Expected result is that I should be able to associate a floating IP to a running VM and then ping that floating IP (and ultimately other kinds of activity, such as SSH access to the VM).
* Actual result is that the floating IP is completely unreachable from other valid IPs within the same L2 space. Additionally, in /var/log/
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
2016-08-17 22:33:25.512 11369 ERROR neutron.
* Version
** CentOS 7.2
** Kernel 3.10.0-
** Mitaka from RDO rpms, puppet managed
** Neutron RPMS:
openstack-
openstack-
openstack-
openstack-
openstack-
python-
python-
python-
python-
* Environment
** 1 controller (running neutron-server, but no other neutron components)
** 2 dedicated network nodes for neutron agents
** N compute nodes running neutron l3-agent because of dvr_snat mode
Changed in neutron:
status: Confirmed → Invalid
@Swami: any chance you can triage this?