Comment 2 for bug 1523691

Federico Ceratto (federico-ceratto) wrote :

If we flag a zone as ERROR after an unhandled exception and then move on to the next one, we might run into a failure mode where a transient external error (e.g. a short network outage) leads periodic_sync to aggressively flag many zones.
In that case periodic_recovery would kick in and restore the zones only after a relatively long time (1h).

We can implement simple retry logic in periodic_sync that goes through the failing zones again up to N times (default 3), sleeping for an interval (default 30s) before each pass.
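
A rough sketch of the retry idea, to make the proposal concrete. This is not the actual periodic_sync code: the function name sync_with_retries and the callables sync_zone and flag_error are hypothetical stand-ins, and it assumes "N times" means N extra passes after the first attempt. Zones are flagged only once all passes are exhausted.

```python
import time
from typing import Callable, Iterable, List


def sync_with_retries(
    zones: Iterable[str],
    sync_zone: Callable[[str], None],    # hypothetical: syncs one zone, raises on failure
    flag_error: Callable[[str], None],   # hypothetical: marks one zone as ERROR
    max_retries: int = 3,                # "N times (default 3)"
    retry_interval: float = 30.0,        # "sleep interval (default 30s)"
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Sync all zones, retrying the failing ones before flagging them."""

    def attempt(candidates: Iterable[str]) -> List[str]:
        # Try each zone once and collect the ones that raised.
        failed = []
        for zone in candidates:
            try:
                sync_zone(zone)
            except Exception:
                failed.append(zone)
        return failed

    # First pass over every zone.
    failing = attempt(zones)

    # Retry only the failing zones, sleeping before each pass so that a
    # short outage has a chance to clear.
    for _ in range(max_retries):
        if not failing:
            break
        sleep(retry_interval)
        failing = attempt(failing)

    # Only zones that failed every pass are flagged.
    for zone in failing:
        flag_error(zone)
    return failing
```

With the defaults, a zone is flagged only if it keeps failing for about 90 seconds, so an outage shorter than that flags nothing, while a persistently broken zone is still flagged and left to periodic_recovery.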