Gratuitous ARPs are not sent during master transition
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | LIU Yulong |
Bug Description
* High level description:
When a router transitions to MASTER state, keepalived should send GARPs, but it fails because the qg-* interface is down (it comes up about 1 second later, so this looks like a race condition).
Keepalived should also send another round of GARPs after 60 seconds(
When I add a random port to this router to trigger keepalived's reload, all GARPs are sent properly (because the netns is already configured and the qg-* interface stays up the whole time).
* Pre-conditions:
Operating System: Ubuntu 20.04
Keepalived version: 2.0.19
Affected neutron releases:
- my AIO env: Xena (master/
- my prod env: Victoria
- (most likely all versions after this change https:/
* Step-by-step reproduction:
Simply perform a failover on an HA router.
The same result can also be achieved by removing all l3 agents from the router and then adding one back:
# openstack router create neutron-bug --ha
# openstack router set --external-gateway public neutron-bug
# neutron l3-agent-
# (for all l3 agents): neutron l3-agent-
# (for a single l3 agent): neutron l3-agent-router-add L3_AGENT_ID neutron-bug
(GARPs are not sent)
# openstack router add port neutron-bug test-port
(GARPs are sent properly)
* Expected output:
Gratuitous ARPs should be sent from router's namespace during MASTER transition.
* Actual output:
Gratuitous ARPs are not sent.
Keepalived complains about: Error 100 (Network is down) sending gratuitous ARP on qg-4a2f0239-5c for 172.29.249.194
The qg-* interface comes up about 1 second after keepalived tries to send the GARPs.
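To confirm that a given failover hit this problem, the keepalived log can be scanned for the error shown above. The following is only a minimal sketch: the regular expression is modelled on the single log line quoted in this report, and the function name `find_failed_garps` is made up for illustration. Other keepalived versions may word the message differently.

```python
import re

# Pattern modelled on the one log line quoted in this report (assumption:
# other keepalived versions may format the message differently).
GARP_ERROR_RE = re.compile(
    r"Error (?P<errno>\d+) \((?P<reason>[^)]*)\) sending gratuitous ARP "
    r"on (?P<device>\S+) for (?P<address>\S+)"
)


def find_failed_garps(log_lines):
    """Return a (device, address, reason) tuple for each failed-GARP line."""
    failures = []
    for line in log_lines:
        match = GARP_ERROR_RE.search(line)
        if match:
            failures.append(
                (match["device"], match["address"], match["reason"])
            )
    return failures
```

Feeding it the log line from this report yields one entry naming qg-4a2f0239-5c, 172.29.249.194 and "Network is down"; lines without the error are ignored.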
* Root cause
Currently neutron keeps qg- interface down for BACKUP agents: https:/
Keepalived's MASTER transition takes place before keepalived-
As a result, neutron-l3-agent brings the qg- interface up only after keepalived's MASTER transition, so keepalived can't send GARPs during the transition because the qg- interface is still down at that point.
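The ordering problem described above can be made visible by polling the interface's administrative state around the transition. This is a rough sketch, not neutron or keepalived code: `is_admin_up` and `wait_for_link_up` are hypothetical helpers, and reading `/sys/class/net/<device>/flags` is assumed to happen from inside the router's network namespace.

```python
import time
from pathlib import Path

IFF_UP = 0x1  # administrative "up" flag, as defined in <linux/if.h>


def is_admin_up(device):
    """Return True if the kernel reports the device as administratively up.

    Assumption: this runs inside the router's netns, so that sysfs shows
    the qg-* device.
    """
    raw = Path("/sys/class/net", device, "flags").read_text().strip()
    return bool(int(raw, 16) & IFF_UP)


def wait_for_link_up(device, timeout=5.0, interval=0.1,
                     is_up=is_admin_up, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll until the device is up; return False if the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_up(device):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `is_up`, `sleep` and `clock` parameters exist only so the polling loop can be exercised without a real interface.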
* Proposed solutions
1. Revert https:/
I'm not sure, but maybe we don't need the above change anymore because it was fixed in keepalived: https:/
2. Send delayed GARPs by keepalived_
Change proposal: https:/
3. Send GARPs for FIPs as well (as is done for non-HA routers by ./agent/
Change proposal: https:/
P.S. As solutions 2 and 3 only send GARPs, we may also need to fix IPv6 NDP. Besides ARPs, keepalived also fails to send unsolicited neighbor advertisements. I'm not sure about this though, as I don't know much about IPv6.
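As an illustration of what solutions 2 and 3 amount to, GARPs for the gateway address and any FIPs could be re-sent from the router's namespace once the qg- interface is up. The sketch below is not the code from the linked change proposals: the helper names and parameters are hypothetical, and it assumes the iputils `arping` binary (with its `-U`, `-I` and `-c` options) and `ip netns exec` are available on the host.

```python
import subprocess


def build_garp_command(namespace, device, address, count=3):
    """Build an unsolicited-ARP (GARP) command run inside a netns.

    Assumes iputils arping: -U sends unsolicited ARP, -I selects the
    interface, -c sets the number of packets.
    """
    return ["ip", "netns", "exec", namespace,
            "arping", "-U", "-I", device, "-c", str(count), address]


def send_garps(namespace, device, addresses, count=3, run=subprocess.run):
    """Send GARPs for each address; return {address: exit code}.

    Exit codes are returned as-is, since their meaning differs between
    arping implementations.
    """
    results = {}
    for address in addresses:
        cmd = build_garp_command(namespace, device, address, count)
        results[address] = run(cmd, check=False).returncode
    return results
```

A caller would pass the router's namespace name, the qg- device and the list of gateway and floating IPs, and run it with sufficient privileges after the interface has come up. Nothing here covers IPv6 neighbor advertisements.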
* Attachments:
Keepalived logs: https:/
Interfaces inside router's netns + tcpdump from master transition: https:/
Actually, I noticed this issue about 1.5 years ago. There is a patch [1] intended to address it, but it has not received enough attention upstream. Thank you for the bug report. Maybe you can try this fix.
[1] https://review.opendev.org/c/openstack/neutron/+/712474