SST donor node has status Joined after successful sync

Bug #1608680 reported by Dan Urist
This bug affects 3 people

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

The SST donor node has status "Joined" rather than "Synced" following successful SST.

To replicate:

1) The existing cluster is down. Bootstrap the first node by adding "wsrep_cluster_address = gcomm://" to my.cnf and starting the mysql server with "/etc/init.d/mysql bootstrap-pxc" (a consolidated shell sketch of these steps follows step 5)

2) Check status of bootstrapped node:

mysql> SHOW STATUS LIKE 'wsrep_local_state%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | 7cb7d33a-8d28-11e3-a157-fb52b065702a |
| wsrep_local_state         | 4                                    |
| wsrep_local_state_comment | Synced                               |
+---------------------------+--------------------------------------+

3) Start second node (force SST by deleting grastate.dat if it exists)

/etc/init.d/mysql start

4) Check status on second node

mysql> SHOW STATUS LIKE 'wsrep_local_state%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | 7cb7d33a-8d28-11e3-a157-fb52b065702a |
| wsrep_local_state         | 4                                    |
| wsrep_local_state_comment | Synced                               |
+---------------------------+--------------------------------------+

5) Check status again on first node:

mysql> SHOW STATUS LIKE 'wsrep_local_state%';
+---------------------------+--------------------------------------+
| Variable_name             | Value                                |
+---------------------------+--------------------------------------+
| wsrep_local_state_uuid    | 7cb7d33a-8d28-11e3-a157-fb52b065702a |
| wsrep_local_state         | 3                                    |
| wsrep_local_state_comment | Joined                               |
+---------------------------+--------------------------------------+
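
For convenience, here are steps 1-3 condensed into a shell sketch. The datadir path, init scripts, and node layout are assumptions based on a default Debian install (this cluster's datadir may differ); adjust to your environment.

# On node 1 -- bootstrap the cluster
#   (my.cnf contains: wsrep_cluster_address = gcomm://)
/etc/init.d/mysql bootstrap-pxc

# On node 2 -- force a full SST by removing the saved state, then start
rm -f /var/lib/mysql/grastate.dat
/etc/init.d/mysql start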

Status on first node is stuck on "Joined" and never becomes "Synced"; thus it's unavailable as a donor. If the second node crashes, no new nodes can join the cluster.

If I restart the first node, it shows status "Synced".

If I then stop the first node and force a sync from the second node, the donor node ends up with status "Joined".
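
A quick way to watch both nodes while reproducing this (a sketch; the hostnames and credentials are placeholders, and the wsrep_local_state numbers map to 1 = Joining, 2 = Donor/Desynced, 3 = Joined, 4 = Synced):

# Poll the wsrep state on each node every few seconds
while true; do
    for h in node1.example.com node2.example.com; do
        echo "== $h =="
        mysql -h "$h" -u monitor -pXXXXX \
              -e "SHOW STATUS LIKE 'wsrep_local_state%'"
    done
    sleep 5
done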

Environment:

OS: Debian 8 (jessie)
percona-nagios-plugins 1.1.6-1
percona-toolkit 2.2.18-1
percona-xtrabackup 2.3.5-1.jessie
percona-xtradb-cluster-56 5.6.30-25.16-1.jessie
percona-xtradb-cluster-client-5.6 5.6.30-25.16-1.jessie
percona-xtradb-cluster-common-5.6 5.6.30-25.16-1.jessie
percona-xtradb-cluster-galera-3 3.16-1.jessie
percona-xtradb-cluster-galera-3.x 3.16-1.jessie
percona-xtradb-cluster-server-5.6 5.6.30-25.16-1.jessie

Revision history for this message
Dan Urist (durist-ucar) wrote :

Here are the log entries from the donor, from the end of the SST:

2016-08-01 14:50:23 19837 [Note] WSREP: 0.0 (dev-drupalsql2.ucar.edu): State transfer to 1.0 (dev-drupalsqlmaster.ucar.edu) complete.
2016-08-01 14:50:23 19837 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3208942)
2016-08-01 14:50:23 19837 [Note] WSREP: SYNC message ignored as node 0.0 (dev-drupalsql2.ucar.edu) was re-transitioned to DONOR mode before it synced.
2016-08-01 14:50:23 19837 [ERROR] WSREP: sst sent called when not SST donor, state JOINED
WSREP_SST: [INFO] Total time on donor: 0 seconds (20160801 14:50:23.764)
WSREP_SST: [INFO] Cleaning up temporary directories (20160801 14:50:23.776)
2016-08-01 14:50:37 19837 [Note] WSREP: 1.0 (dev-drupalsqlmaster.ucar.edu): State transfer from 0.0 (dev-drupalsql2.ucar.edu) complete.
2016-08-01 14:50:37 19837 [Note] WSREP: Member 1.0 (dev-drupalsqlmaster.ucar.edu) synced with group.
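
To pull just these state-transition lines out of a longer donor log, a grep along these lines works (the log path is a placeholder; use whatever error log the node is configured to write):

grep -E 'Shifting|SYNC message ignored|sst sent called' /var/log/mysql/error.log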

Revision history for this message
Dan Urist (durist-ucar) wrote :

Configuration:

[mysqld]
basedir = /usr
binlog_format = ROW
default-storage-engine = INNODB
expire_logs_days = 10
innodb_additional_mem_pool_size = 16M
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_size = 512M
innodb_log_file_size = 256M
key_buffer_size = 512M
max_allowed_packet = 64M
max_binlog_size = 100M
max_connections = 200
max_user_connections = 20
open_files_limit = 50000
query_cache_limit = 1M
query_cache_size = 0
query_cache_type = 0
read_buffer_size = 1M
skip-external-locking
socket = /var/run/mysqld/mysqld.sock
sort_buffer_size = 1M
table_open_cache = 64
thread_cache_size = 9000
thread_stack = 512K
tmpdir = /tmp
user = mysql
wsrep_cluster_name = dev_drupalsql
wsrep_notify_cmd = /usr/local/bin/galera_notify
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = 'gcache.size=1G'
wsrep_slave_threads = 8
wsrep_sst_auth = XXXXX:XXXXXXXX
wsrep_sst_method = xtrabackup-v2

[sst]
encrypt = 2
inno-apply-opts = '--use-memory=1G'
inno-backup-opts = '--no-backup-locks'
sockopt = ',verify=0'
tca = /etc/mysql/server.crt
tcert = /etc/mysql/server.pem

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Dan,

Thanks for the report and log file snippet.

The log-file snippet gives us some clue of what may be going on, and we have a few further questions:

a. While the node was acting as donor, was there any active workload running against it?

b. What is interesting is that the node transitions from DONOR -> JOINED, and the SYNC message after that is ignored because the node was re-transitioned to DONOR. We are wondering whether some action is forcing it back into the DONOR state (an RSU or FLUSH TABLES action?).

(If possible, can you share the complete log file, or at least up to the point where the incident took place?)

Also, if this is easily repeatable at your end, perhaps you can share the steps.
Of course, we don't see it in the normal sequence.
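
For context on question (b), these are the kinds of actions that move a node to DONOR/DESYNCED outside of an SST (illustrative only, not taken from this report; to my understanding all of them have this effect in PXC 5.6):

mysql> SET GLOBAL wsrep_desync = ON;          # explicit desync
mysql> FLUSH TABLES WITH READ LOCK;           # FTWRL, e.g. taken by a backup tool
mysql> SET GLOBAL wsrep_OSU_method = 'RSU';   # DDL run afterwards desyncs the node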

Revision history for this message
Dan Urist (durist-ucar) wrote : Re: [Bug 1608680] Re: SST donor node has status Joined after successful sync

On Tue, Aug 2, 2016 at 7:56 PM, Krunal Bauskar <email address hidden>
wrote:

> Dan,
>
> Thanks for the report and log file snippet.
>
> The log-file snippet gives us some clue of what may be going on, and we
> have a few further questions:
>
> a. While the node was acting as donor, was there any active workload
> running against it?
>

Very little, if any. This database just backs two development Drupal sites
that are regularly hit only by monitoring software.

>
> b. What is interesting is that the node transitions from DONOR -> JOINED,
> and the SYNC message after that is ignored because the node was
> re-transitioned to DONOR. We are wondering whether some action is forcing
> it back into the DONOR state (an RSU or FLUSH TABLES action?).
>
> (If possible, can you share the complete log file, or at least up to the
> point where the incident took place?)
>

Oddly, when I got up this morning, both nodes were in Donor/Desynced status.
I've attached log files from each.

>
> Also, if this is easily repeatable at your end, perhaps you can share the
> steps.
> Of course, we don't see it in the normal sequence.
>

Yes, I've been able to replicate the behavior on the development cluster
with the steps detailed in the bug report.

Also, in addition to the development cluster, I have a test cluster and a
production cluster. The problem first appeared in the production cluster,
which right now is running in a degraded state with only two nodes up (so
this is fairly urgent for me). The test cluster hasn't (yet) shown this
behavior, but it hasn't done an SST since the upgrade from the previous
version of XtraDB. If it's useful, I can send along logs from the
production cluster as well.

As far as I can tell, the issue started following the last XtraDB upgrade;
we've been running XtraDB clusters for a couple of years without issue.

Thanks very much for your help.

Revision history for this message
Dan Urist (durist-ucar) wrote :

Here are the wsrep status variables on each server after a sync; the one
stuck in "Joined" state shows wsrep_desync_count = 1

mysql> SHOW STATUS LIKE 'wsrep_%';
+------------------------------+-------------------------------------------+
| Variable_name | Value |
+------------------------------+-------------------------------------------+
| wsrep_local_state_uuid | 7cb7d33a-8d28-11e3-a157-fb52b065702a |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 3208942 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 265 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 0 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 ...
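
To home in on the desync specifically, a quick check on each node would be (a sketch; wsrep_desync_count is the status counter mentioned above, and wsrep_desync is the corresponding global variable):

mysql> SHOW STATUS LIKE 'wsrep_desync_count';
mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_desync';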

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Hi Dan,

From the log we discovered that the use of backup locks is suppressed, causing XtraBackup to use FLUSH TABLES WITH READ LOCK. We have found an issue wherein SST and FLUSH TABLES WITH READ LOCK conflict due to the updated FTWRL semantics. We are working on this issue, but in parallel you can simply enable the use of backup locks and things will work as expected (without the error).

------------
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --defaults-group=mysqld --no-version-check --no-backup-locks $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | pv -f -i 10 -N donor -F '%N => Rate:%r Avg:%a Elapsed:%t %e Bytes: %b %p' -s 927236096 2>>/mysql-data/dev_drupalsql/log/sst.log | socat -u stdio openssl-connect:128.117.224.166:4444,cert=/etc/mysql/server.pem,cafile=/etc/mysql/server.crt,verify=0; RC=( ${PIPESTATUS[@]} ) (20160803 06:32:36.371)
------------

The --no-backup-locks limitation was added to PXC a few years back when backup locks had issues, but they are stable now and are also the default way of using XtraBackup.

You would have one of these two settings (FORCE_FTWRL, or inno-backup-opts in the [sst] section) suppressing the use of backup locks. You can comment it out for now.
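
For example, based on the [sst] section shown earlier in this report, the workaround amounts to commenting out the inno-backup-opts line (a sketch; leave the rest of the section as it is):

[sst]
encrypt = 2
inno-apply-opts = '--use-memory=1G'
# inno-backup-opts = '--no-backup-locks'   # commented out so SST uses backup locks again
sockopt = ',verify=0'
tca = /etc/mysql/server.crt
tcert = /etc/mysql/server.pem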

<snippet of limitation which is no more valid with newer version of PXC-5.6>
Backup locks used during SST or with Xtrabackup can crash, on donor, either use inno-backup-opts='--no-backup-locks' under [sst] in my.cnf or set FORCE_FTWRL=1 in /etc/sysconfig/mysql (or /etc/sysconfig/mysql.%i for corresponding unit/service) for CentOS/RHEL or /etc/default/mysql in debian/ubuntu. You can also use rsync as the alternate SST method if those don't work.
</snippet>

Regards,
Krunal

Revision history for this message
Joel Daves (jdaves-ucar) wrote :

Disabling --no-backup-locks has returned both the development and production clusters to normal operations.

Thank you.

Revision history for this message
Dan Urist (durist-ucar) wrote :

Yay!!! Thanks for doing that.

Changed in percona-xtradb-cluster:
status: New → Confirmed
status: Confirmed → Fix Committed
Revision history for this message
Evan (eskinner) wrote :

Hello,

In comment #7 you mentioned that you were working on the actual problem this bug was about in parallel:

-----
We have found an issue wherein SST and FLUSH TABLES WITH READ LOCK conflict due to the updated FTWRL semantics. We are working on this issue, but in parallel you can simply enable the use of backup locks and things will work as expected (without the error)
-----

Can you please link that bug here? This bug is also affecting me, and I cannot use the workaround because I sometimes end up with a deadlock on the donor when I use backup locks (I will raise a different bug about that).

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Hi Evan,

Sorry, I didn't get you. Are you looking for the bug that we are trying to resolve?
We are using this bug to track the problem.

Regards,
Krunal

Revision history for this message
Evan (eskinner) wrote :

Aaaah. Sorry, my confusion. So this bug will still track the actual fix, then.

I was just a bit confused because the state of this bug is "Fix Committed" but I couldn't find a link to the actual code fix from here. I'm considering rebuilding from source if there is a fix for this bug already in the code but not yet released.

Revision history for this message
Evan (eskinner) wrote :

I think I found the actual fix code in the GitHub repos, committed just a few days ago.

https://github.com/percona/galera/pull/80/commits/c51109c44cb39c94cfa9644334a810d3aeee66d2
https://github.com/percona/percona-xtradb-cluster/pull/259/commits

I'm all good now. Thank you.

Changed in percona-xtradb-cluster:
milestone: none → 5.6.32-25.17
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-666

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-665
