Primary node crashes when adding a node while the primary is still receiving writes

Bug #1564617 reported by amit arora
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Incomplete
Importance: Undecided
Assigned to: Kenn Takara
Milestone: (none)

Bug Description

We have set up a 3-node Percona XtraDB Cluster.

Version 5.6.28

Each node contains around 3.5 TB of data.

node1: 1.1.1.1 : pxc1 : Primary node (bootstrap - this is the only node where writes happen)
node2: 1.1.1.2 : pxc2 : Slave node
node3: 1.1.1.3 : pxc3 : Slave node

Initially, while setting up the cluster:
I created node1 by streaming an innobackup from another remote server (regular Percona Server 5.6.19).
Bootstrapped node1 and created the appropriate users.
Joined node2, and it did an SST (no writes were coming in on node1 at this point).
Joined node3, and it did an SST (no writes were coming in on node1 at this point).
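
For reference, the wsrep-related part of my.cnf on a node in this cluster looks roughly like the following (an illustrative sketch only; the cluster name and SST credentials below are placeholders, and the real attached my.cnf may differ):

[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_name=pxc_cluster                         # placeholder name
wsrep_cluster_address=gcomm://1.1.1.1,1.1.1.2,1.1.1.3
wsrep_node_address=1.1.1.1                             # per node
wsrep_node_name=pxc1                                   # per node
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpassword                     # placeholder SST credentials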

Everything worked fine up to this point.

We then started a stress test on this cluster, with all writes and reads pointed at node1; that was working fine.
The problem appears if I remove a node from the cluster and keep it out long enough that, when I join it back, it has to do a full SST.

To remove node2 (pxc2) from the cluster:
service mysql stop

To add it back:
service mysql start --wsrep_sst_donor=pxc3
Two things can happen in the above case:
1. It works fine and does a full SST if the stress test has been stopped and no writes are coming in on the primary node (node1:pxc1).
2. The SST on node2:pxc2 fails and crashes the primary node (node1:pxc1) if the stress test is ON and the primary is still receiving writes.
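
For context, whether a rejoining node can use an incremental IST instead of a full SST depends on whether the donor still holds the needed write-sets in its gcache ring buffer; the gcache size is set through wsrep_provider_options in my.cnf, for example (illustrative value only):

wsrep_provider_options="gcache.size=2G"

Keeping a node out long enough (or under enough write load) for the donor's gcache to wrap around is what forces the full SST described above.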

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Thanks for reporting. Can you share more data?

1. Your configuration (my.cnf).

2. What crash do you hit on node-1?

3. Do you see this consistently, or is it intermittent?

4. What kind of stress test is being done? (If you have a small reproducible scenario on generic data that you can share with us, that would help accelerate this.)

5. Do you see any errors on any of the nodes before/during SST? These should be available in the log files.

Changed in percona-xtradb-cluster:
status: New → Incomplete
Revision history for this message
amit arora (amit-0923) wrote :

Hi Krunal,

1. Your configuration (my.cnf).
File attached: mycnf

2. What crash do you hit on node-1?
Signal 11 (I have attached the mysqld.log for that time frame).

3. Do you see this consistently, or is it intermittent?
Intermittent

4. What kind of stress test is being done? (If you have a small reproducible scenario on generic data that you can share with us, that would help accelerate this.)
Apache JMeter: we try to simulate a pattern similar to our production load.

5. Do you see any errors on any of the nodes before/during SST? These should be available in the log files.
Log files attached.

Revision history for this message
amit arora (amit-0923) wrote :

Please let me know if anything else is needed.

Changed in percona-xtradb-cluster:
assignee: nobody → Kenn Takara (kenn-takara)
Revision history for this message
Kenn Takara (kenn-takara) wrote :

Hi Amit,

I'm trying to repro this and I have a couple of questions:

(1) Can you let me know what OS you are running this on? (The output of 'uname -vrm' should be good.)
(2) What is the basic machine config (CPUs, RAM)?
(3) Is there anything special about your production load? Are the table sizes skewed, like one big 3TB table, or are they more evenly spread out? Similarly for your queries, are reads and writes about equal, or skewed towards reads or writes?

Thanks,
Kenn

Revision history for this message
amit arora (amit-0923) wrote :

Hi Kenn,

(1) Can you let me know what OS you are running this on? (The output of 'uname -vrm' should be good.)
CentOS 6.6
(2) What is the basic machine config (CPUs, RAM)?
32-core machine with hyper-threading and 512 GB of RAM
(3) Is there anything special about your production load? Are the table sizes skewed, like one big 3TB table, or are they more evenly spread out? Similarly for your queries, are reads and writes about equal, or skewed towards reads or writes?
There are a couple of tables that are a few hundred GB each, but the rest are fairly uniform in size.
Our load is 80/20 read/write.
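
As a rough, generic stand-in for this kind of read-heavy load (purely illustrative, not our actual JMeter test; the host, credentials, and table sizes below are placeholders), something like sysbench's OLTP read/write workload pointed at node1 could be used:

# create generic test tables on node1 (placeholder host and credentials)
sysbench oltp_read_write --mysql-host=1.1.1.1 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --tables=16 --table-size=10000000 prepare
# run a mixed read/write load with 32 client threads for 10 minutes
sysbench oltp_read_write --mysql-host=1.1.1.1 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --tables=16 --table-size=10000000 --threads=32 --time=600 run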

Please let me know if you need any other details.

Thanks,
Amit

Revision history for this message
Kenn Takara (kenn-takara) wrote :

Hi Amit,

Can you supply the configuration files for nodes 2 and 3? I'm trying to replicate your setup as closely as I can (the configuration anyway, not necessarily the scale).

Thanks,
Kenn

Revision history for this message
amit arora (amit-0923) wrote :

Hi Kenn,

The my.cnf files for nodes 2 and 3 are the same except for:
wsrep_node_address, which is 1.1.1.2 and 1.1.1.3 for node2 and node3 respectively, and
wsrep_node_name, which is pxc2 and pxc3 for node2 and node3 respectively (see the exact lines below).
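
Based on those values, the per-node overrides are:

# node2 (pxc2)
wsrep_node_address=1.1.1.2
wsrep_node_name=pxc2

# node3 (pxc3)
wsrep_node_address=1.1.1.3
wsrep_node_name=pxc3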

Thanks,
Amit

Revision history for this message
Kenn Takara (kenn-takara) wrote :

Hi Amit,

Still unable to repro. From your log files, it looks like

- node-1,2,3 startup
- node-2 is stopped and started

- node-3 is chosen as the sst-donor
2016-03-24 12:58:10 109240 [Note] WSREP: Member 1.0 (pxc2) requested state transfer from '*any*'. Selected 2.0 (pxc3)(SYNCED) as donor.

This is where it gets weird. It appears that node-3 just drops out for a minute (if the timestamps are correct); I don't see anything in the logs for about a minute.

At this point, the remaining nodes (node-1 and node-2) forget node-3:
2016-03-24 12:58:16 109240 [Note] WSREP: forgetting dae8e800 (tcp://1.1.1.3:4567)
2016-03-24 12:58:16 109240 [Note] WSREP: deleting entry tcp://1.1.1.3:4567

node-2 terminates since it can't sync
2016-03-24 12:58:16 106285 [Warning] WSREP: Donor dae8e800-f091-11e5-a25f-9e7670babd3a is no longer in the group. State transfer cannot be completed, need to abort. Aborting...

since node-1 is the only one left and is non-quorum, it quits
2016-03-24 12:58:23 109240 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-24 12:58:23 109240 [Note] WSREP: Flow-control interval: [16, 16]
2016-03-24 12:58:23 109240 [Note] WSREP: Received NON-PRIMARY.
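
(As a general sanity check, not something taken from these logs: when a node drops to non-Primary like this, it should also be visible from a client connected to that node via the wsrep status variables, e.g.:)

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';   -- would report 'non-Primary' on node-1 at this point
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';     -- membership count as seen by this node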

And then we get problem #1, a segfault in node-1.

And then, sometime later, node-3 comes back at 12:59:15 and can't connect to anyone else.

So it looks like there are two main issues:
(1) the segfault on node-1
(2) the (apparent) connectivity failure on node-3

I haven't been able to repro this; one suggestion is to get more info by running the nodes with WSREP_DEBUG=ON. (I would also like to verify the config files, since the my.cnf had a different wsrep_cluster_name than the log files.)
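
For reference, a sketch of how that can be enabled (the extra wsrep output goes to the node's error log): either add it to my.cnf, or set it dynamically to avoid a restart:

# in my.cnf, [mysqld] section
wsrep_debug=ON

-- or at runtime on each node
mysql> SET GLOBAL wsrep_debug=ON;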

Thanks,
Kenn

Revision history for this message
amit arora (amit-0923) wrote :

Hi Kenn,

Thanks for the explanation.

Let's say node3 (pxc3) drops out and node2 (pxc2), unable to connect to node3, aborts the SST; but why does node1 (pxc1) segfault and crash? What we don't understand is why this process takes down node1 (pxc1), which is where all our writes go.

We are also no longer able to reproduce the issue.

Regarding wsrep_cluster_name: the name in the logs is the real one. I changed the name in the my.cnf copy before sending it to you; the name used in the logs is what my.cnf actually has.

Revision history for this message
Kenn Takara (kenn-takara) wrote :

Hi Amit,

I still haven't been able to reproduce the segfault (I even ran node1 under valgrind with memcheck to check the memory accesses). I'm looking into a similar bug: a 3-node cluster under stress where node-1 shuts down (but does not crash) when node-2 is shut down. I will update this bug if I find anything that may be the cause.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-530
