mysqld process terminated unexpectedly on joiner node while performing SST

Bug #1669323 reported by weishan
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona XtraBackup moved to https://jira.percona.com/projects/PXB
New
Undecided
Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Invalid
Undecided
Unassigned

Bug Description

Hi!

Our joiner node is unable to join the PXC cluster due to mysqld terminating while joining the cluster.

1. I'm using the most current Percona XtraDB Version.
57-5.7.16-27.19.1.el7.x86_64

2. The database size is about 100GB

3. 2 node PXC cluster with 1 x garbd node

What I noticed so far is, if I kept Restart=always in the mysql systemd script, it will keep on doing SST sync until it finishes, then it starts the SST process again.

Tags: sst
Revision history for this message
weishan (weishan.ang) wrote :
Revision history for this message
weishan (weishan.ang) wrote :
Revision history for this message
weishan (weishan.ang) wrote :
tags: added: sst
description: updated
no longer affects: percona-server
summary: - mysqld process terminated unexpectedly on joiner node
+ mysqld process terminated unexpectedly on joiner node while performing
+ SST
description: updated
Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

Please attach my.cnf from the 2 nodes (non-garbd). Can you confirm that there is no network issues between the nodes?

Changed in percona-xtradb-cluster:
status: New → Incomplete
Revision history for this message
weishan (weishan.ang) wrote :
Revision history for this message
weishan (weishan.ang) wrote :

Hi there,

The nodes are in the same DC without network issue. Firewalld is disabled for troubleshooting.

Previously, when there was no data in the PXC cluster, the SST ran smoothly without issues.

Changed in percona-xtradb-cluster:
status: Incomplete → New
Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

It really looks like connection timed out though:

2017/03/02 09:42:22 socat[73325] E write(3, 0x104f1b0, 8192): Broken pipe
2017-03-02T09:42:22.434796Z 692394 [Note] Aborted connection 692394 to db: 'unconnected' user: 'sstuser' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] xtrabackup finished with error: 1. Check /data/mysql//innobackup.backup.log (20170302 09:42:22.436)

And also:

xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

And then:

WSREP_SST: [ERROR] Parent mysqld process (PID:88236) terminated unexpectedly. (20170302 09:41:54.098)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20170302 09:41:54.100)

Can you retry starting node2 again for SST, fwiw I've used your mysql wsrep config and it worked for me without editing the systemd file.

I suggest you open a forum post in percona.com/forums and we can continue there. If we really determine that it's a bug we can then confirm this one.

Changed in percona-xtradb-cluster:
status: New → Invalid
Revision history for this message
weishan (weishan.ang) wrote :

Hello,

I went to retry the test and has attached the logs. Please look at the latest logs.

From the logs, you can see the following:

1. node2 start mysqld at 2017-03-02T17:22:11
2. node1 received request from node2 at 2017-03-02T17:22:16
3. node1 error => "socat[137985] E connect(3, AF=2 node2IP:4444, 16): Connection refused" at 2017/03/02 17:22:27
4. innodbbackup log shows the error at 170302 17:22:27
5. mysqld terminated at (20170302 17:22:17.601)
WSREP_SST: [ERROR] Parent mysqld process (PID:128979) terminated unexpectedly.

My apologies for the error log previously as I found out the time difference between node1 and node2 was about 50s. I had resolved it but the issue remains

Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

node1:

2017-03-02T17:22:17.608943Z 0 [Note] WSREP: forgetting c854375a (tcp://node2IP:4567)
2017-03-02T17:22:17.609433Z 0 [Note] WSREP: Node 507b3843 state prim
2017-03-02T17:22:17.609737Z 0 [Note] WSREP: view(view_id(PRIM,507b3843,322) memb {
 507b3843,0
 656c6320,0
} joined {
} left {
} partitioned {
 c854375a,0
})
2017-03-02T17:22:17.609748Z 0 [Note] WSREP: save pc into disk
2017-03-02T17:22:17.609931Z 0 [Note] WSREP: forgetting c854375a (tcp://node2IP:4567)

node2:
WSREP_SST: [ERROR] Parent mysqld process (PID:128979) terminated unexpectedly. (20170302 17:22:17.601)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20170302 17:22:17.603)

please move to forums as I don't think there is any bug here but rather some configuration that has to be fixed.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1958

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXB-1430

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.