Comment 42 for bug 1731595

Corey Bryant (corey.bryant) wrote :

This bug appears to surface under load. The fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up when the L3 agent restarts, but in some circumstances the problem seems to come back later. xavpaice mentioned seeing multiple routers active at the same time with 464 routers configured on 3 neutron gateway hosts using L3HA, with each router scheduled to all 3 hosts. However, jhebden mentions that things seem stable at around the 400 L3HA router mark, and it's worth noting that this is the same deployment xavpaice was referring to.

It seems to me that something is being pushed to its limit, and that once the limit is hit, advertisements from the master router may no longer be received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of which resource is being constrained.
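
To make the hypothesis above concrete, here is a rough Python sketch of how a backup router could decide the master is gone when advertisements arrive late. It uses the master-down timer formula from the VRRPv3 specification (RFC 5798); the function names, the advertisement interval of 2 seconds, and the priority of 50 are assumptions chosen for illustration, not values taken from this deployment, and keepalived's actual timer handling may differ.

```python
def master_down_interval(priority, advert_int):
    """Seconds a backup waits without an advertisement before taking over.

    Formula per RFC 5798: 3 * advert_int + skew_time, where
    skew_time = (256 - priority) / 256 * advert_int.
    """
    skew_time = (256 - priority) / 256 * advert_int
    return 3 * advert_int + skew_time


def takeover_times(advert_arrivals, priority, advert_int):
    """Return the times at which a backup would promote itself to master.

    advert_arrivals: sorted timestamps (seconds) at which the backup
    actually received advertisements. Hypothetical input for illustration.
    """
    interval = master_down_interval(priority, advert_int)
    takeovers = []
    last_seen = advert_arrivals[0]
    for arrival in advert_arrivals[1:]:
        # A gap longer than the master-down interval means the backup's
        # timer expired before this advertisement arrived.
        if arrival - last_seen > interval:
            takeovers.append(last_seen + interval)
        last_seen = arrival
    return takeovers


if __name__ == "__main__":
    # Assumed values: 2 s advertisement interval, priority 50.
    print(master_down_interval(50, 2))
    # Advertisements stall for 10 s between t=6 and t=16.
    print(takeover_times([0, 2, 4, 6, 16, 18], 50, 2))
```

With these assumed values, a backup tolerates a gap of a little over 7.6 seconds, so two lost advertisements in a row would not trigger an election but a 10 second stall would. If the hosts are starved of some resource for that long, this is one way multiple masters could appear.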