HA router state change from "standby" to "master" should be delayed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Invalid | Undecided | Unassigned |
Queens | Fix Released | Undecided | Unassigned |
Rocky | Fix Released | Undecided | Unassigned |
Stein | Fix Released | Undecided | Unassigned |
neutron | Fix Released | Undecided | Rodolfo Alonso |
Bug Description
Currently, when an HA state change occurs, the agent executes a series of actions [1]: it updates the metadata proxy, updates the prefix delegation, executes the L3 extension "ha_state_change" methods, updates the radvd status and notifies the server of the change.
When a switch-over happens in a system with more than two routers (one in "active" mode and the others in "standby"), the "keepalived" process [2] in each "standby" server sets the virtual IP on the HA interface and advertises it. If the HA interface of another router has the same priority (by default in Neutron, the HA instances of the same router ID all have the same priority, 50) but a higher IP [3], the HA interface of this instance has its VIPs and routes deleted and becomes "standby" again. E.g.: [4]
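The tie-break described above can be sketched as follows. This is a minimal model, not keepalived code: the `VrrpInstance` class, the `elect_master` helper, the host names and the IP addresses are all hypothetical and only illustrate "highest priority wins, then highest IP".

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VrrpInstance:
    """One keepalived instance of the same HA router (hypothetical model)."""
    host: str
    priority: int
    ha_ip: str  # primary IP of the HA interface (illustrative values below)


def elect_master(instances):
    """Return the winning instance: highest priority, then highest IP."""
    return max(
        instances,
        key=lambda i: (i.priority, int(ipaddress.ip_address(i.ha_ip))),
    )


# Both instances use the default priority (50), so the IP decides.
candidates = [
    VrrpInstance("controller-1", 50, "169.254.192.5"),
    VrrpInstance("controller-2", 50, "169.254.192.9"),
]
winner = elect_master(candidates)
print(winner.host)
```

With equal priorities, the instance that loses this comparison is the one that briefly becomes "master" and then falls back to "standby".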
In some cases, we have detected that when the master controller is rebooted, the change from "standby" to "master" of the other two servers is detected, but the change from "master" to "standby" of the server with the lower IP (as described above) is not registered by the Neutron server, because the Neutron server is still not accessible (the master controller was rebooted). This status change is sometimes lost. The result is that both "standby" servers become "master", but the "master"-to-"standby" transition of one of them is lost.
1) INITIAL STATUS
(overcloud) [stack@undercloud-0 ~]$ neutron l3-agent-
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+------
| id | host | admin_state_up | alive | ha_state |
+------
| 4056cd8e-
| 527d6a6c-
| edbdfc1c-
+------
2) CONTROLLER 1 REBOOTED
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+------
| id | host | admin_state_up | alive | ha_state |
+------
| 4056cd8e-
| 527d6a6c-
| edbdfc1c-
+------
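How a single lost report leaves two "master" entries can be illustrated with a toy model. Everything here is an assumption made for illustration: `ServerView` is not a Neutron class, the host names are invented, and the real agent-to-server reporting path is more involved than a single method call.

```python
class ServerView:
    """Hypothetical stand-in for the HA state kept by the Neutron server."""

    def __init__(self, hosts):
        self.ha_state = {host: "standby" for host in hosts}
        self.reachable = True

    def report(self, host, state):
        """Record an agent report; return False when the report is lost."""
        if not self.reachable:
            return False
        self.ha_state[host] = state
        return True


# Only the two surviving agents are modeled; the old master is rebooting.
server = ServerView(["host-a", "host-b"])

# Both standbys take the VIP and their reports reach the server.
server.report("host-a", "master")
server.report("host-b", "master")

# host-a loses the election, but the server is unreachable at that moment.
server.reachable = False
delivered = server.report("host-a", "standby")
server.reachable = True

masters = sorted(h for h, s in server.ha_state.items() if s == "master")
print(delivered, masters)
```

In this model the server keeps listing both hosts as "master" because the only report that would have corrected the state was dropped.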
The aim of this bug is to make this problem public and to propose a patch that delays the transition from "standby" to "master", so that keepalived, across all the instances running in the HA servers, can decide which one of them is the "master" server.
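One possible shape for such a delay is sketched below. It is not the patch attached to this bug: the `DelayedMasterTransition` class, its `notify` method, the callback and the delay values are hypothetical. The idea is only that a "master" event is held for a short time and dropped if keepalived reports "standby" again before the timer expires.

```python
import threading
import time


class DelayedMasterTransition:
    """Hold back "master" for `delay` seconds; apply other states at once."""

    def __init__(self, on_state_change, delay=2.0, initial="standby"):
        self._on_state_change = on_state_change  # hypothetical callback
        self._delay = delay
        self._reported = initial
        self._timer = None
        self._lock = threading.Lock()

    def notify(self, state):
        """Handle a state reported by keepalived ("master" or "standby")."""
        with self._lock:
            if self._timer is not None:
                # A newer event supersedes a pending promotion.
                self._timer.cancel()
                self._timer = None
            if state == "master":
                self._timer = threading.Timer(
                    self._delay, self._apply, args=(state,))
                self._timer.daemon = True
                self._timer.start()
                return
        self._apply(state)

    def _apply(self, state):
        with self._lock:
            if state == self._reported:
                return  # nothing changed from the server's point of view
            self._reported = state
        self._on_state_change(state)


events = []
handler = DelayedMasterTransition(events.append, delay=0.1)

# Lost election: "master" is followed by "standby" within the delay.
handler.notify("master")
handler.notify("standby")
time.sleep(0.3)
flapped = list(events)

# Won election: "master" stays stable, so it is applied after the delay.
handler.notify("master")
time.sleep(0.3)
print(flapped, events)
```

The trade-off in this sketch is that a genuine fail-over is reported `delay` seconds later, in exchange for never reporting the short-lived "master" state of an instance that loses the election.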
[1] https:/
[2] https:/
[3] This method is used by keepalived to define which router is predominant and must be master.
[4] http://
Changed in neutron: | |
assignee: | nobody → Rodolfo Alonso (rodolfo-alonso-hernandez) |
tags: | added: l3-ha |
tags: | added: neutron-proactive-backport-potential |
tags: | added: sts |
In the following bug, I noticed some similar logs. They are not related to the higher IP, but they may cause the same problem as here:
https://bugs.launchpad.net/neutron/+bug/1798475/comments/14
https://bugs.launchpad.net/neutron/+bug/1798475/comments/15
https://bugs.launchpad.net/neutron/+bug/1798475/comments/16
https://bugs.launchpad.net/neutron/+bug/1798475/comments/17