[RFE] BGP Speaker peer sessions down when rabbitmq offline
Bug #2006145 reported by Maximilian Stinsky
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | In Progress | Undecided | Unassigned |
Bug Description
Greetings,
While testing a couple of disaster scenarios in our lab environment, we noticed that when we stop our RabbitMQ cluster completely, the neutron-dynamic-routing BGP speaker shuts down all BGP sessions to its peers.
As a result, all announced floating IPs and subnet pools go offline.
We are running Neutron Wallaby (18.5.0) with the StaticScheduler for the Neutron BGP part.
In my opinion, the BGP speaker should continue to announce its locally cached state until the RabbitMQ connection can be reestablished.
As most RabbitMQ upgrades require a full downtime, it is almost impossible to upgrade RabbitMQ without taking OpenStack offline when using neutron-dynamic-routing.
The latest versions of RabbitMQ support rolling upgrades, so I'm not sure whether that scenario will still be relevant going forward. Also, if a node loses connectivity to RabbitMQ, I'm not sure that serving stale data is better than stopping announcements and hoping other nodes take over. At the very least, this change of behavior would need to be configurable, and possibly include a timeout (maybe 5 minutes) after which announcements would still be dropped, similar to what happens in a graceful restart scenario. So I'd suggest treating this as a feature request rather than a bug.
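To make the proposal concrete, here is a minimal sketch of the decision logic such a configurable behavior might use. It is not existing neutron-dynamic-routing code. The class name `StaleAnnouncementPolicy` and the options `keep_on_disconnect` and `stale_timeout` are hypothetical names invented for illustration, and the 300-second default only mirrors the "maybe 5 minutes" suggested above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaleAnnouncementPolicy:
    """Hypothetical policy: should the speaker keep announcing cached routes
    while the message bus is unreachable? Illustration only."""

    # Assumed option: when False, announcements stop as soon as the bus is lost.
    keep_on_disconnect: bool = False
    # Assumed option: how long (seconds) cached state may be served.
    stale_timeout: float = 300.0
    # Timestamp of the first failed contact, or None while connected.
    disconnected_since: Optional[float] = None

    def on_disconnect(self, now: float) -> None:
        # Remember only the first moment of the outage.
        if self.disconnected_since is None:
            self.disconnected_since = now

    def on_reconnect(self) -> None:
        self.disconnected_since = None

    def should_announce(self, now: float) -> bool:
        if self.disconnected_since is None:
            return True  # bus reachable, state is fresh
        if not self.keep_on_disconnect:
            return False  # feature disabled: drop announcements immediately
        # Serve cached state only until the timeout expires.
        return (now - self.disconnected_since) < self.stale_timeout
```

A caller would pass a monotonic timestamp (for example from `time.monotonic()`) to `on_disconnect` and `should_announce`, and withdraw the routes once `should_announce` returns `False`. Keeping the feature off by default would make the new behavior opt-in for operators.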