Comment 1 for bug 1554942

Sonu (sonu-sudhakaran) wrote :

I have noticed this behavior.

Ping stops when you bring the NIC back up again on the original master.
When the NIC goes down on the current master, the stand-by controller takes over router ownership on its own, which is expected.
However, the original master doesn't realize that its data network is down, and keeps Mastership of the routers.
This is a split-brain situation: the original master still thinks it is the master of the router, since it is oblivious to the data path connectivity problem, whereas per the VRRP protocol the stand-by controller has gained Mastership and is serving Ping requests.

When the NIC on the master comes back up, the router on the original master does not go through a state transition (as per the router state known to it, it was always the master). Since there is no state transition, gratuitous ARPs are not sent, and as a result the FIP stops responding to ping.
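The sequence above can be sketched as a small model. This is only an illustration, not keepalived or Neutron code: the `Router` class, `set_state`, and the controller names are hypothetical, and the step where the original master wins again after the link is restored (with the stand-by falling back to backup) is an assumption about how the election resolves.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Hypothetical model of one HA router instance on a controller."""
    name: str
    state: str = "backup"
    garps_sent: int = 0

    def set_state(self, new_state: str) -> bool:
        """Apply a VRRP state; a GARP is sent only on a transition to master."""
        if new_state == self.state:
            return False  # no transition, so no GARP
        self.state = new_state
        if new_state == "master":
            self.garps_sent += 1
        return True


def simulate():
    original = Router("controller-1")
    standby = Router("controller-2")

    # Initial election: controller-1 becomes master and announces the FIP.
    original.set_state("master")

    # NIC goes down on controller-1. The stand-by stops hearing VRRP
    # advertisements and takes over; controller-1 notices nothing.
    standby.set_state("master")

    # NIC comes back. Assumption: controller-1 wins the election again
    # and the stand-by falls back to backup.
    standby.set_state("backup")
    changed = original.set_state("master")  # already "master": no transition
    return original, standby, changed
```

In this model `changed` is false for the original master after the NIC is restored, so its GARP counter stays at the value from the initial election, while the last GARP on the network came from the stand-by.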

The split brain can be prevented if we health-check the master. I think the patch https://review.openstack.org/#/c/273546/ tries to solve this.

However, a workaround is to restart the L3 agent on the master after bringing up the NIC. Ping resumes, since a router failover will happen.

Another solution is to have keepalived on the master repeat the GARPs at regular intervals.
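A minimal sketch of that idea, as plain logic rather than real keepalived behavior: the function names, the 10-second interval, and the tick loop are all made up for illustration. keepalived.conf documents a `garp_master_refresh` setting for this purpose, but check the man page for your version before relying on it.

```python
def garp_due(state: str, now: float, last_garp: float,
             refresh_interval: float) -> bool:
    """Return True when a master should re-announce its addresses."""
    return state == "master" and (now - last_garp) >= refresh_interval


def run_ticks(states, refresh_interval=10.0, tick=1.0):
    """Walk a list of per-tick states and return the times GARPs are sent."""
    sent = []
    last_garp = float("-inf")
    prev = None
    now = 0.0
    for state in states:
        # A GARP goes out on a transition to master, as before...
        became_master = state == "master" and prev != "master"
        # ...and also periodically while the node stays master.
        if became_master or garp_due(state, now, last_garp, refresh_interval):
            sent.append(now)
            last_garp = now
        prev = state
        now += tick
    return sent
```

With a periodic refresh, a master that never sees a state transition still re-announces the FIP within one interval of the NIC coming back, so the outage is bounded by the interval instead of lasting until someone restarts the agent.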