Comment 9 for bug 1285380

Ryan Gordon (ryan-5) wrote :

Hi Alex,

> It is not implemented at the moment, but should be possible to
> implement.
>
> However, from what I understand, in this case the only purpose of the
> timeout is to be a sort of notification for the operator that the donor
> node is currently unavailable while doing the backup, which the operator
> may already be aware of, since the backup is a scheduled operation.
> Given that one can't connect to mysqld during the joining phase (by
> mysqld design), the only source of feedback is the error log, so it is
> only natural to expect the operator to check the log for the progress of
> the operation. That makes the log another place to learn that the donor
> is unavailable.
>
> At the same time, a timeout will result in aborting the joiner operation
> (what else could it do?), which is exactly the opposite of what you want:
> bringing up the joiner as fast as possible. Ideally the joiner should
> keep working and waiting for the donor to become available, and the
> operator should simply go to the donor and make sure it is not busy with
> something else.
>
> So I'm not exactly sure a timeout is what you really want. It doesn't
> even save you from looking at the error log - you still have to do that
> after the joiner aborts. It just aborts a normally progressing operation.

You're correct that a timeout isn't exactly the problem I'm trying to solve, but with respect to the original ticket it would still be a very helpful feature: in the event that the cluster is in a compromised state and an SST is not responding, DBAs could tune the timeout or at least know what to expect from the default. Right now I have no idea how long it would be before the joiner node gives up.

> Really? Doesn't it say that it temporarily cannot access a "resource"
> (the prescribed donor) and will keep on retrying? It is kinda generic,
> but at the level where it is logged this is all the information it has
> (an error code).

I do still contend that the message is not very clear: "140228 18:54:41 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)"

"Resource temporarily unavailable" doesn't really tell me anything specific and I can't find a list of PXC error codes anywhere on the percona website. Why not "[Note]: WSREP: Requesting state transfer failed: (Error Code: -11 Donor already in DESYNCED state, must be SYNCED first)."? That would be a lot clearer to me

> Are you sure you would prefer a log message on the donor? It is the
> joiner that is having problems, not the donor, so why would you look
> into the donor log before checking the joiner? What sort of message
> should the donor produce that would not contain redundant information?

Yes. In the situations I've experienced, it is the donor that is having a problem (it is already in a desynced state and thus unable to provide a state transfer), not the joiner (the joiner may simply be attempting an IST after a scheduled restart, which is a perfectly nominal scenario). So in these situations it is useful to have debug information on the donor saying "[Note] WSREP: Joiner node XYZ requesting state transfer" and subsequently "[Note] WSREP: Unable to provide state transfer to node XYZ, already in DESYNCED mode." I don't think anyone trying to manage and understand their distributed cluster minds a little redundant information if it gets them to clarity faster.

> Besides, it is not working the way you think it is. The cluster selects
> a donor based on the information provided by the joiner. If the donor is
> unavailable, the joiner receives an error, but the donor does not
> receive anything; it is unaware that the joiner is having problems. One
> reason for this is that there can be several potential donors.

Just so I understand better: do the nodes all publish their state, and the joiner then checks which nodes are in a synced state to decide whether it can ask one of them to be the donor? In that case there wouldn't be any actual request from the joiner to the donor if all of the potential donors are in a desynced state, right? Is this how the procedure currently works, or am I mistaken?

Our setup right now has only one node (the backup node) predefined to be the donor (to prevent FTWRLs from running on a node that is serving production traffic). Maybe this isn't the best setup. What would your recommendation be if you had to set up a cluster with xtrabackup running on one of the nodes (and have it interact nicely with the xtrabackup SST donor script)?
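
(For reference, my understanding is that wsrep_sst_donor accepts a comma-separated list and that a trailing comma lets the cluster fall back to any other synced node if the preferred donor can't serve the request, so something like the following on the joiners might soften the single-donor restriction ("backup-node" here is just a stand-in for our backup host's wsrep_node_name):

    [mysqld]
    # prefer the backup host as donor, but allow any other synced node
    # to act as donor if it is unavailable (hence the trailing comma)
    wsrep_sst_donor=backup-node,

I haven't verified whether that fallback also kicks in when the preferred donor is merely desynced rather than down, though.)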

Does the node even need to be desynced when xtrabackup runs? During the FTWRL?
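
(To make that question concrete: is wrapping the backup in something like the following actually necessary, or would the FTWRL alone be enough?

    mysql> SET GLOBAL wsrep_desync = ON;    -- before xtrabackup / the FTWRL
    ... backup runs here ...
    mysql> SET GLOBAL wsrep_desync = OFF;   -- after; node catches up and re-syncs

That is roughly what I assume is putting our backup node into the DESYNCED state in the first place.)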

> But suppose we could go as far as to see that there is only one possible
> donor and then pass the state transfer request to it. What can the donor
> do? It knows that it is in a desynced state and that something serious
> is going on. But it does not know
>
> 1) what is going on,
> 2) whether it is interruptible,
> 3) how to interrupt it.
>
> In this particular case, it is an xtrabackup started externally. For
> mysqld to know anything about it and be able to interrupt it, we would
> have to integrate the backup process with mysqld - which (besides being
> a questionable idea in its own right) is far outside of our replication
> focus.

Sure, I'm certainly not saying the cluster should know which process put the node into a desynced state, but if there were a way to use the wsrep_notify_cmd setting (or a new setting similar to it) to perform the rendezvous of the SST for trickier setups like ours, that would be ideal. Such a script could be programmed to stop the backup currently in progress and put the node back into a synced state so it would be available to provide an SST to the joiner node.
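
To make it concrete, here is a rough and purely hypothetical sketch of the kind of hook I have in mind; the "donor-requested" event does not exist today (which is exactly what I'm suggesting), and the backup-stop/resync steps would obviously be site-specific:

    #!/bin/sh
    # Hypothetical hook, invoked by mysqld (via wsrep_notify_cmd or a
    # new, similar option) with "--event donor-requested" when the node
    # is asked to donate an SST while it is desynced.
    while [ $# -gt 0 ]; do
        case $1 in
            --event) EVENT=$2; shift ;;
        esac
        shift
    done

    if [ "$EVENT" = "donor-requested" ]; then
        # Site-specific: stop the backup that is currently running
        pkill -f innobackupex || true
        # Return the node to SYNCED so it can serve the state transfer
        mysql -e "SET GLOBAL wsrep_desync = OFF"
    fi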