Reduce the number of aborted connections on donor during state transfer

Bug #1134625 reported by Aleksey Sanin
This bug affects 2 people
Affects                       Status     Importance  Assigned to  Milestone
MySQL patches by Codership    New        Wishlist    Unassigned
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                         Confirmed  Wishlist    Unassigned
  5.6                         Confirmed  Wishlist    Unassigned

Bug Description

This is not a bug but a feature request, and I am not sure whether it should be reported against the Galera or the WSREP project.

At the moment (galera 23.2.2) the state transfer (SST or IST) immediately aborts all connections on the donor. This causes the client (app) to either retry the transaction/query or just fail. It would be great to add a (configurable) grace period to allow existing connections to finish:
1) Donor receives SST or IST request.
2) Donor immediately stops accepting new connections but does not kill existing queries/connections.
3) After a grace period (5-10 secs by default), all current connections are killed and state transfer proceeds.

This would allow a typical web app to gracefully finish processing the current request and seamlessly move its connections to another server in the cluster, using in-app failover, HAProxy, or something else.
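
The three steps proposed above can be sketched as a toy model. This is only an illustration of the requested behaviour, not Galera or wsrep code: the class `DonorSketch`, its methods, and the `grace_seconds` parameter are all hypothetical names, and connections are represented by plain ids.

```python
import time


class DonorSketch:
    """Toy model of a donor node with a grace period (hypothetical, not Galera code)."""

    def __init__(self, grace_seconds=5.0):
        self.grace_seconds = grace_seconds
        self.accepting = True
        self.active = set()  # ids of connections with work in flight

    def connect(self, conn_id):
        # Step 2: once draining has started, new connections are refused.
        if not self.accepting:
            return False
        self.active.add(conn_id)
        return True

    def finish(self, conn_id):
        # A client completed its request and disconnected on its own.
        self.active.discard(conn_id)

    def begin_state_transfer(self, poll=lambda: None,
                             clock=time.monotonic, sleep=time.sleep):
        # Step 1: an SST or IST request arrived; stop accepting right away.
        self.accepting = False
        deadline = clock() + self.grace_seconds
        # Step 3: let existing connections drain until the grace period ends.
        while self.active and clock() < deadline:
            poll()       # hook where in-flight work gets a chance to complete
            sleep(0.01)
        # Whatever is still open after the deadline is killed.
        killed = sorted(self.active)
        self.active.clear()
        return killed
```

In this sketch `begin_state_transfer` returns the ids that had to be killed, so an empty list means every client finished within the grace period.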

Hope this makes sense :)

no longer affects: galera
Changed in codership-mysql:
importance: Undecided → Wishlist
Alex Yurchenko (ayurchen) wrote :

Sounds nice, but requires substantial research. Not a 1-day hack.

Alex Yurchenko (ayurchen) wrote :

I could confirm connection closing only for mysqldump SST. Other SST methods and IST don't close client connections at all. This removes much of the basis for this feature request.

Aleksey Sanin (aleksey-l) wrote :

Apologies, I should have been clearer on the terminology: the connections are not "aborted"; rather, the next query on the connection gets an "UNKNOWN COMMAND" error, for both SST (rsync) and IST.
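
The in-app failover mentioned in the bug description could react to this error roughly as follows. This is a minimal sketch under stated assumptions: that the "UNKNOWN COMMAND" text corresponds to MySQL error code 1047 (`ER_UNKNOWN_COM_ERROR`), that the rejected statement was not executed on the donor, and that the application has some way to run a query against a named node. `QueryError` and `run_with_failover` are hypothetical names, not part of any MySQL driver.

```python
ER_UNKNOWN_COM_ERROR = 1047  # MySQL error code whose message is "Unknown command"


class QueryError(Exception):
    """Hypothetical driver error carrying a MySQL error code."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_failover(nodes, run_query):
    """Call run_query(node) on each node in turn, skipping nodes that answer 1047."""
    last_error = None
    for node in nodes:
        try:
            return run_query(node)
        except QueryError as exc:
            if exc.errno != ER_UNKNOWN_COM_ERROR:
                raise  # unrelated failure: do not mask it
            last_error = exc  # node is not serving queries; try the next one
    if last_error is None:
        raise ValueError("no nodes given")
    raise last_error  # every node rejected the query
```

Retrying on another node is only safe here because of the assumption that the donor rejected the statement without applying it; a multi-statement transaction would have to be replayed from its start.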

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Aleksey,

Before sending another query on the same connection, can't you check the HAProxy status for that node (which will not be 200), so that you can skip the query and end the connection gracefully?
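
A check of this kind could look like the sketch below, assuming the node exposes an HTTP health endpoint that answers 200 only while it is serving queries. The function name `node_is_healthy` and the URL in the usage note are hypothetical; the thread does not say how the status is published.

```python
import urllib.error
import urllib.request


def node_is_healthy(check_url, timeout=1.0):
    """Return True only if the health endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(check_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx answers such as 503 raise HTTPError (a URLError subclass);
        # refused connections and timeouts also end up here.
        return False
```

An application might call `node_is_healthy("http://db1.example:9200/")` before reusing a pooled connection and close the connection instead of sending the query when the result is False.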

Aleksey Sanin (aleksey-l) wrote :

@Raghavendra

There is no HAProxy in the picture :)

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1160
