HA router master instance in error state because qg-xx interface is down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Rodolfo Alonso |
Bug Description
BZ reference: https:/
Sometimes a router is created with all of its HA instances in standby mode because the qg-xx interface is down, so there is no connectivity:
(overcloud) [stack@undercloud-0 ~]$ neutron l3-agent-
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+------
| id | host | admin_state_up | alive | ha_state |
+------
| 3b93ec23-
| 41b8d1a8-
| 4533bd88-
+------
(overcloud) [stack@undercloud-0 ~]$
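To detect this state automatically (no agent reporting `active` in the `ha_state` column), a minimal sketch could parse the CLI's ASCII table. It assumes only the `| col | col |` layout with the columns shown above; the IDs and host names in `SAMPLE` are hypothetical, because the real ones are truncated in this report.

```python
from __future__ import annotations

# Hypothetical sample in the same layout as the CLI output above.
SAMPLE = """\
+------+--------------+----------------+-------+----------+
| id   | host         | admin_state_up | alive | ha_state |
+------+--------------+----------------+-------+----------+
| id-1 | controller-0 | True           | :-)   | standby  |
| id-2 | controller-1 | True           | :-)   | standby  |
| id-3 | controller-2 | True           | :-)   | standby  |
+------+--------------+----------------+-------+----------+
"""


def parse_agent_table(text: str) -> list[dict[str, str]]:
    """Return one dict per data row of a '| a | b |' style ASCII table."""
    rows: list[dict[str, str]] = []
    header: list[str] | None = None
    for line in text.splitlines():
        line = line.strip()
        # Skip border lines such as '+------+' and blank lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def has_active_instance(rows: list[dict[str, str]]) -> bool:
    """True if at least one hosting agent reports ha_state 'active'."""
    return any(row.get("ha_state") == "active" for row in rows)


if __name__ == "__main__":
    agents = parse_agent_table(SAMPLE)
    if not has_active_instance(agents):
        print("router has no active HA instance")
```

Only `ha_state` is checked here; a stricter check could also require the agent to be alive and admin-up before counting it.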
Steps to reproduce:
1. for i in $(seq 10); do ./create.sh $i; done
3. Check FIP connectivity to detect the error
4. for i in $(seq 10); do ./delete.sh $i; done
Scripts: http://
This seems to be a race condition between the L3 agent and keepalived while the qg-xxx interface is being configured:
- /var/log/messages: http://
- L3 agent logs: http://
While keepalived is setting the qg-xxx interface IP addresses, the interface disappears from udev and then reappears (I don't know why yet). The journalctl log looks the same as when a new interface is created.
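To confirm the symptom on a node, one can list the links inside the router namespace (for example with `ip netns exec qrouter-<router-id> ip -o link show`) and look for a qg- port whose flags lack `UP`. The sketch below parses that one-line-per-link output; the interface names in `SAMPLE_LINKS` are hypothetical, and the helper `down_gateway_ports` is not part of neutron.

```python
from __future__ import annotations

import re

# Matches the start of an `ip -o link show` line, e.g.
# "3: qg-bbbb2222-33: <BROADCAST,MULTICAST> mtu 1500 ..."
LINK_RE = re.compile(r"^\d+:\s+(?P<name>[^:@\s]+)(?:@[^:]+)?:\s+<(?P<flags>[^>]*)>")

# Hypothetical output; the qg- port is present but administratively down.
SAMPLE_LINKS = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
2: ha-aaaa1111-22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT
3: qg-bbbb2222-33: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
4: qr-cccc3333-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT
"""


def down_gateway_ports(ip_link_output: str, prefix: str = "qg-") -> list[str]:
    """Return names of links starting with `prefix` whose flags lack UP."""
    down: list[str] = []
    for line in ip_link_output.splitlines():
        match = LINK_RE.match(line)
        if not match or not match.group("name").startswith(prefix):
            continue
        # Compare whole flags so that LOWER_UP is not mistaken for UP.
        flags = match.group("flags").split(",")
        if "UP" not in flags:
            down.append(match.group("name"))
    return down
```

Checking the `UP` flag (administrative state) rather than the `state` field matches what the L3 agent controls: whether the link has been set up.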
Since [1], the L3 agent controls the GW interface status (up or down). If the L3 agent does not set the interface link up, the router namespace cannot send or receive any traffic.
[1]https:/
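The race can be illustrated with a toy model (not neutron code, and not necessarily what the related patch does): if the L3 agent sets the GW port up only once, a remove/re-add of the port during keepalived's address configuration leaves it down, whereas a handler that re-applies "link up" whenever the port reappears recovers. `Namespace`, `run` and the port name are hypothetical; the model assumes a re-created port starts administratively down, as a newly created device would.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Namespace:
    """Toy router namespace: interface name -> administratively up?"""
    links: dict[str, bool] = field(default_factory=dict)
    listeners: list[Callable[[str], None]] = field(default_factory=list)

    def add_link(self, name: str) -> None:
        self.links[name] = False  # new devices start down
        for callback in self.listeners:
            callback(name)  # notify "interface added" subscribers

    def remove_link(self, name: str) -> None:
        self.links.pop(name, None)

    def set_up(self, name: str) -> None:
        if name in self.links:
            self.links[name] = True


def run(reconcile: bool) -> bool:
    """Return the final admin state of the GW port after the race."""
    ns = Namespace()
    gw = "qg-hypothetical"
    ns.add_link(gw)
    if reconcile:
        # Hypothetical mitigation: re-apply link up when the GW port reappears.
        def on_added(name: str) -> None:
            if name == gw:
                ns.set_up(name)

        ns.listeners.append(on_added)
    ns.set_up(gw)      # L3 agent: one-shot "link up" after becoming master
    ns.remove_link(gw)  # race: port vanishes while addresses are configured...
    ns.add_link(gw)     # ...and reappears, down again
    return ns.links[gw]


if __name__ == "__main__":
    print("one-shot:", run(reconcile=False), "reconciling:", run(reconcile=True))
```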
Changed in neutron:
- assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
- tags: added: l3-ha
Changed in neutron:
- importance: Undecided → High
Changed in neutron:
- status: New → In Progress
- tags: added: neutron-proactive-backport-potential
Changed in neutron:
- status: In Progress → Fix Released
- tags: removed: neutron-proactive-backport-potential
Related patch: https://review.opendev.org/c/openstack/neutron/+/776427