Comment 6 for bug 1285380

Alex Yurchenko (ayurchen) wrote : Re: [Bug 1285380] Re: Donor in desynced state makes the joiner wait indefinitely

On 2014-02-28 15:58, Ovais Tariq wrote:
> wsrep_sst_donor only had a single hostname specified. So yes that should be
> the ONLY node used for SST, but perhaps there should be a timeout of some
> sort so that the DBA is notified about SST failure and he can possibly
> bring out the other node from desynced state.

Timeout is possible, however:

1) It is not exactly a failure. In fact, it is exactly not a failure. The
node was instructed to use only one donor for SST and hence is patiently
waiting for it to become available. What would be better, aborting?

2) When a joiner is waiting for the donor to be available, you can see
the following in the error log:
140228 18:54:41 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
so it is not exactly silent.

3) A joiner is by definition in an undefined state. If it is aborted at
that moment, it will request an SST the next time it starts (unless you
know what you're doing).

4) In that particular case, are they willing to decrease availability of
their cluster even more by allowing another node to become an SST donor?

5) Even more, since there is already a backup going on, it would be
considerably faster to use that backup (when it's finished) to bootstrap
the joiner rather than do it all over again via SST from another node.
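
If the concern is only that the DBA gets notified, that can be done outside the server by watching for the retry message from point 2. Below is a rough sketch, not anything Galera ships: the function names, the 10-minute limit and the regular expression are my assumptions, with the pattern modeled on the single log line quoted above (other versions may word it differently). It simply measures the time between the first and the last retry message in the lines it is given.

```python
import re
from datetime import datetime, timedelta

# Assumption: pattern modeled on the one log line quoted in this comment.
RETRY_RE = re.compile(
    r"^(\d{6} \d{2}:\d{2}:\d{2}) \[Note\] WSREP: "
    r"Requesting state transfer failed: -11"
)


def sst_wait_duration(log_lines):
    """Return the time between the first and last retry message.

    Returns a zero timedelta if no retry message is present.
    Simplification: it does not reset if an SST succeeded in between.
    """
    stamps = []
    for line in log_lines:
        match = RETRY_RE.match(line)
        if match:
            # Timestamps look like "140228 18:54:41" (yymmdd hh:mm:ss).
            stamps.append(datetime.strptime(match.group(1), "%y%m%d %H:%M:%S"))
    if not stamps:
        return timedelta(0)
    return stamps[-1] - stamps[0]


def should_alert(log_lines, limit=timedelta(minutes=10)):
    """True once the joiner has been retrying for at least `limit` (hypothetical threshold)."""
    return sst_wait_duration(log_lines) >= limit
```

Feed it the lines of the joiner's error log (for example from a cron job) and send mail when should_alert() returns True. The joiner itself keeps waiting, so nothing is aborted and no extra donor is taken out of the cluster.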