Comment 70 for bug 1927868

Christian Rohmann (christian-rohmann) wrote:

@hopem, thanks for your nice reply and the complete overview of the situation.

I do understand the issue with exception handling and propagation between privsep and the reader.
Since it is impossible to catch every exception or error condition a system might run into, a major improvement would be to add ways to reconcile the router state, both in this case and in other situations:

1) If the setup of any of the various components (veth interfaces, routes, iptables, ...) fails, step down from the keepalived master role, giving another node the chance to actually take over.

2) If a node is the master but the setup failed, retry setting things up.
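A minimal sketch, in Python, of one way the two proposals could be combined into a single decision: retry while attempts remain, and step down once they are exhausted. `Action`, `decide` and the `max_attempts` limit are hypothetical names chosen for illustration, not existing agent code.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical actions an HA router node could take after a setup attempt."""
    NOTHING = "nothing"      # node is a backup, or the master is fully set up
    RETRY = "retry"          # master, but setup failed: try to set things up again
    STEP_DOWN = "step_down"  # master, setup keeps failing: let another node take over


def decide(is_master: bool, setup_ok: bool, attempts: int, max_attempts: int) -> Action:
    """Pick the reconcile action for one router (illustrative policy only)."""
    if not is_master or setup_ok:
        # A backup node or a correctly configured master needs no intervention.
        return Action.NOTHING
    if attempts < max_attempts:
        # Proposal 2: the node is master but setup failed, so retry.
        return Action.RETRY
    # Proposal 1: give up the master role so another node can serve traffic.
    return Action.STEP_DOWN
```

For example, `decide(True, False, 1, 5)` yields `Action.RETRY`, while `decide(True, False, 5, 5)` yields `Action.STEP_DOWN`.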

To avoid excessive retries, an exponential back-off should certainly be applied to them, but
a node must not remain in a state where it is the HA router master yet not ready to serve traffic.
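A retry with exponential back-off that ends in stepping down could look roughly like the following Python sketch. `setup` and `step_down` are hypothetical callables standing in for the router setup (veth interfaces, routes, iptables) and for whatever mechanism makes the node release the keepalived master role; the attempt count and delays are assumed values.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float, factor: float, cap: float) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... seconds, never exceeding cap."""
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(delay * factor, cap)


def reconcile_with_backoff(
    setup: Callable[[], None],
    step_down: Callable[[], None],
    max_attempts: int = 5,
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a failing setup with exponential back-off; step down if it never succeeds.

    Returns True if setup eventually succeeded, False if the node stepped down.
    """
    delays = backoff_delays(base, factor, cap)
    for attempt in range(1, max_attempts + 1):
        try:
            setup()
            return True
        except Exception:
            # Any failure while setting up the router counts as a failed attempt.
            if attempt == max_attempts:
                break
            sleep(next(delays))
    # Never stay master while unable to serve traffic: hand the role over.
    step_down()
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; an agent would use whatever wait primitive fits its concurrency model. The `cap` bounds the delay so a long-broken node still retries at a predictable interval before giving up.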