rsync SST fails when innodb_data_home is different between the nodes

Bug #1251342 reported by Alex Yurchenko on 2013-11-14
This bug affects 1 person
Affects                       Status     Importance  Assigned to  Milestone
MySQL patches by Codership               Wishlist    Unassigned
  5.5                                    Wishlist    Unassigned
Percona XtraDB Cluster        (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                         Confirmed  Medium      Unassigned
  5.6                         Confirmed  Medium      Unassigned

Bug Description

> If you use two nodes with different locations for ibdata, then rsync will
> get you a broken mysql node.
>
> First node:
>
> datadir = "/var/lib/mysql"
> innodb_data_home_dir = "/srv/mysql/"
>
> Second node:
>
> datadir = "/var/lib/mysql"
> innodb_data_home_dir = "/var/lib/mysql"
>
> If you use "wsrep_sst_method = rsync_wan" only DataDir is copied correctly.
> The ibdata1 file is not copied at all.
> So mysql crashes and creates an empty 10MB ibdata1 file.

Alex Yurchenko (ayurchen) wrote :
tags: added: rsync sst
Alex Yurchenko (ayurchen) wrote :

The resolution of lp:1098566 is xtrabackup-dependent. For now, the workaround for this issue is to use xtrabackup.

Przemek (pmalkowski) wrote :

IMHO it is, in general, a very bad idea to use different data or log paths on different nodes. A cluster should be formed from nodes that are homogeneous in all aspects.

With the rsync SST method, the SST fails even if both nodes have the same datadir and innodb_data_home_dir paths. So the problem here is not a mismatch between nodes, but datadir != innodb_data_home_dir on a node.
With wsrep_sst_method=xtrabackup-v2 this is not a problem, though.

This is the configuration I used on the nodes.
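
As a rough illustration of the condition described above, the following Python sketch flags a my.cnf whose [mysqld] section combines an rsync-based wsrep_sst_method with an innodb_data_home_dir that differs from datadir. The function name rsync_sst_at_risk is hypothetical and is not part of PXC or Galera. The parsing is a simplification: it reads only the [mysqld] group and does not follow !include or !includedir directives.

```python
import configparser
import posixpath


def rsync_sst_at_risk(cnf_text):
    """Return True if [mysqld] uses an rsync SST method while
    innodb_data_home_dir points somewhere other than datadir.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # bare flags such as log_slave_updates
        delimiters=("=",),     # keep ':' inside values such as "root:"
        interpolation=None,    # do not treat '%' specially
        strict=False,          # tolerate repeated options
    )
    parser.read_string(cnf_text)
    mysqld = parser["mysqld"]

    def clean_path(value):
        # Drop optional quotes and a trailing slash so that
        # "/var/lib/mysql/" and /var/lib/mysql compare equal.
        if not value:
            return None
        return posixpath.normpath(value.strip().strip("\"'"))

    datadir = clean_path(mysqld.get("datadir"))
    # When innodb_data_home_dir is unset, InnoDB uses datadir.
    home = clean_path(mysqld.get("innodb_data_home_dir")) or datadir
    method = (mysqld.get("wsrep_sst_method") or "").strip().strip("\"'")

    # Matches both "rsync" and "rsync_wan".
    return method.startswith("rsync") and home != datadir
```

Passing the contents of /etc/my.cnf from either node to this function would return True for the configuration reported here, and False once wsrep_sst_method is switched to xtrabackup-v2.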
Donor:

[root@percona1 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
innodb_data_home_dir = /srv/mysql
user=mysql
log_error=percona1_error.log
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.3.2,192.168.3.3,192.168.3.4
wsrep_node_address=192.168.3.2
wsrep_slave_threads=2
wsrep_cluster_name=L1
wsrep_sst_method=rsync
#wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=root:
#wsrep_provider_options = "pc.ignore_sb=true;gcache.size=64M"
wsrep_node_name=percona1
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
innodb_log_file_size=64M
bind-address=192.168.3.2
innodb_file_per_table=1
log_slave_updates
server-id=1
#support GTID
enforce_gtid_consistency=1
gtid_mode=on
log-bin=percona1-bin

Joiner:

[root@percona2 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
innodb_data_home_dir = /srv/mysql

user=mysql
log_error=percona2_error.log
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.3.2,192.168.3.3,192.168.3.4
wsrep_node_address=192.168.3.3
wsrep_slave_threads=2
wsrep_cluster_name=L1
wsrep_sst_method=rsync
#wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=root:
wsrep_node_name=percona2
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
innodb_log_file_size=64M
bind-address=192.168.3.3
innodb_file_per_table=1
log_slave_updates
server-id=2
#support GTID
enforce_gtid_consistency=1
gtid_mode=on
log-bin=percona2-bin

After SST on joiner:

[root@percona2 ~]# ls -lh /var/lib/mysql/
total 321M
-rw-------. 1 mysql mysql 129M Jun 17 12:04 galera.cache
-rw-rw----. 1 mysql mysql 104 Jun 17 12:04 grastate.dat
-rw-rw----. 1 mysql mysql 64M Jun 17 12:04 ib_logfile0
-rw-rw----. 1 mysql mysql 64M Jun 17 12:04 ib_logfile1
-rw-rw----. 1 mysql mysql 64M Jun 17 12:04 ib_logfile101
drwx------. 2 mysql mysql 4.0K Jun 17 12:04 mysql
-rw-rw----. 1 mysql mysql 191 Jun 17 12:04 percona1-bin.000008
-rw-rw----. 1 mysql mysql 0 Jun 17 12:04 percona2-bin.index
-rw-r-----. 1 mysql root 15K Jun 17 12:04 percona2_error.log
drwx------. 2 mysql mysql 4.0K Jun 17 12:04 performance_schema
drwx------. 2 mysql mysql 4.0K Jun 17 12:04 test
[root@percona2 ~]# ls -lh /srv/mysql/
total 12M
-rw-rw----. 1 mysql mysql 12M Jun 17 12:04 ibdata1

Error logs are in the attachments. Clearly, ibdata1 was not copied from the donor.

Przemek (pmalkowski) wrote :

rsync SST fails when both nodes use the settings:
datadir=/var/lib/mysql
innodb_data_home_dir = /srv/mysql
This happens on both PXC 5.6.15 and 5.6.19 (testing repo).

Moreover, when the directory specified in innodb_data_home_dir is missing on the joiner node, not only does the joiner fail to join the cluster, but the donor node also gets stuck in a non-primary state (the donor being the only working node).
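
Since InnoDB does not create the data home directory itself, a pre-flight check on the joiner could catch the missing-directory case before mysqld is started. The following is a minimal Python sketch under that assumption. The function name data_home_problems is hypothetical, and the check should be run as the user that mysqld runs as for the permission test to be meaningful.

```python
import os


def data_home_problems(path):
    """Return a list of problems found with an innodb_data_home_dir path.

    Hypothetical pre-flight helper; an empty list means the directory
    exists and the current user can create files in it.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems
```

For the configuration in this report the call would be data_home_problems("/srv/mysql"); a non-empty result means the joiner should not be started yet.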

Error log snippet on the joiner:
"2014-06-17 14:43:49 15154 [Note] WSREP: Member 0.0 (percona20) requested state transfer from '*any*'. Selected 1.0 (percona10)(SYNCED) as donor.
2014-06-17 14:43:49 15154 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1)
2014-06-17 14:43:49 15154 [Note] WSREP: Requesting state transfer: success, donor: 1
2014-06-17 14:43:49 15154 [Note] WSREP: (0589678f, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-06-17 14:44:00 15154 [Note] WSREP: 1.0 (percona10): State transfer to 0.0 (percona20) complete.
2014-06-17 14:44:00 15154 [Note] WSREP: Member 1.0 (percona10) synced with group.
WSREP_SST: [INFO] Extracting binlog files: (20140617 14:44:00.801)
percona10-bin.000014
ls: cannot access percona20-bin.*: No such file or directory
WSREP_SST: [INFO] Joiner cleanup. (20140617 14:44:00.822)
WSREP_SST: [INFO] Joiner cleanup done. (20140617 14:44:01.331)
2014-06-17 14:44:01 15154 [Note] WSREP: SST complete, seqno: 1
2014-06-17 14:44:01 15154 [Note] Plugin 'FEDERATED' is disabled.
2014-06-17 14:44:01 7fa1f935d7e0 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.
2014-06-17 14:44:01 15154 [Note] InnoDB: Using atomics to ref count buffer pool pages
2014-06-17 14:44:01 15154 [Note] InnoDB: The InnoDB memory heap is disabled
2014-06-17 14:44:01 15154 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2014-06-17 14:44:01 15154 [Note] InnoDB: Compressed tables use zlib 1.2.3
2014-06-17 14:44:01 15154 [Note] InnoDB: Using Linux native AIO
2014-06-17 14:44:01 15154 [Note] InnoDB: Not using CPU crc32 instructions
2014-06-17 14:44:01 15154 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2014-06-17 14:44:01 15154 [Note] InnoDB: Completed initialization of buffer pool
2014-06-17 14:44:01 7fa1f935d7e0 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2014-06-17 14:44:01 15154 [ERROR] InnoDB: File /srv/mysql/ibdata1: 'create' returned OS error 71. Cannot continue operation
140617 14:44:01 mysqld_safe mysqld from pid file /var/lib/mysql/percona20.pid ended"

Error log snippet on the donor:
"2014-06-17 14:43:50 3498 [Note] WSREP: Member 0.0 (percona20) requested state transfer from '*any*'. Selected 1.0 (percona10)(SYNCED) as donor.
2014-06-17 14:43:50 3498 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1)
2014-06-17 14:43:50 3498 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-06-17 14...

Przemek (pmalkowski) wrote :

Also confirmed on server version 5.5.37-35.0-55 Percona XtraDB Cluster.

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1087
