Primary node crashes while adding a node while primary is still getting writes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Incomplete | Undecided | Kenn Takara |
Bug Description
We have set up a 3-node Percona XtraDB Cluster.
Version: 5.6.28
Each node holds around 3.5 TB of data.
node1: 1.1.1.1 : pxc1 : Primary node (bootstrap - this is the only node where writes happen)
node2: 1.1.1.2 : pxc2 : Slave node
node3: 1.1.1.3 : pxc3 : Slave node
Initially, while setting up the cluster:
I created node1 by streaming an innobackup from another remote server (regular Percona 5.6.19).
Bootstrapped node1 and created the appropriate users.
Joined node2, which did an SST (no writes were arriving on node1 at this point).
Joined node3, which did an SST (no writes were arriving on node1 at this point).
Everything worked fine up to this point.
We then started a stress test on this cluster, with all reads and writes pointed at node1. It was working fine.
If I remove a node from the cluster and keep it out long enough, it has to do a full SST when it rejoins.
To remove node2 (pxc2) from the cluster:
service mysql stop
To add it back:
service mysql start --wsrep_
One of two things happens in the above case:
1. If the stress test has been stopped and no writes are arriving on the primary node (node1:pxc1), it works fine and node2 does a full SST.
2. If the stress test is running and node1 is receiving writes, the SST on node2:pxc2 fails and crashes the primary node (node1:pxc1).
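To tell the two outcomes apart, it may help to record the donor's Galera state while pxc2 rejoins. The following is only a minimal sketch: it assumes the `mysql` client can log in without prompting (for example via `~/.my.cnf`), and the function name `node_state` and the `MYSQL` variable are hypothetical helpers, not part of Percona XtraDB Cluster.

```shell
#!/bin/sh
# Hypothetical helper: print a few wsrep status variables for the local node.
# Assumption: the mysql client can connect without an interactive password prompt.
MYSQL="${MYSQL:-mysql}"   # overridable, e.g. to point at a different client binary

node_state() {
    # -N suppresses the header row, -B prints tab-separated values
    "$MYSQL" -N -B -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_state_comment','wsrep_cluster_size','wsrep_cluster_status','wsrep_flow_control_paused')"
}
```

Running something like `while sleep 5; do date; node_state; done` on node1 while pxc2 rejoins would show whether node1 reaches the Donor/Desynced state before it goes down.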
Changed in percona-xtradb-cluster:
assignee: nobody → Kenn Takara (kenn-takara)
Thanks for reporting. Can you share more data?
1. Your configuration (my.cnf).
2. What crash do you hit on node1?
3. Do you see this consistently, or is it intermittent?
4. What kind of stress test is being run? (If you have a small reproducible scenario on generic data that you can share with us, that would help accelerate this.)
5. Do you see any errors on any of the nodes before or during the SST? These should be available in the log files.
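One possible way to gather the log excerpts asked for in item 5 is sketched below. It is an illustration only: the function name `sst_log_excerpt` is hypothetical, and the log location must be taken from the `log_error` setting on each node, since the path is not given in this report.

```shell
#!/bin/sh
# Hypothetical helper: pull error and SST-related lines out of a MySQL error log.
# Assumption: the caller passes the real log path (see log_error in my.cnf).
sst_log_excerpt() {
    logfile="$1"
    # Keep lines flagged as errors, plus anything mentioning SST or the backup tool.
    grep -E '\[ERROR\]|SST|innobackupex' "$logfile"
}
```

Calling `sst_log_excerpt /path/to/error.log` on each of the three nodes (the path here is a placeholder) gives a compact excerpt that can be attached to the report alongside my.cnf.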