Galera Replication from 5.6 node to 5.5 node fails

Bug #1251137 reported by Raghavendra D Prabhu
This bug affects 1 person

Affects                                  Status          Importance   Assigned to     Milestone
MySQL patches by Codership               In Progress     Medium       Seppo Jaakola
Percona XtraDB Cluster                   (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                                    Invalid         High         Unassigned
  5.6                                    Fix Committed   Undecided    Unassigned

Bug Description

Galera replication stream from a 5.6 host to a 5.5 node fails

5.5 node:
==============================================

131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 3th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 4th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 5th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 6th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 7th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 8th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 9th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [Warning] WSREP: Failed to apply app buffer: seqno: 213, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 10th time
131114 12:28:57 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 36, event_type: 30
131114 12:28:57 [ERROR] WSREP: applier could not read binlog event, seqno: 213, len: 0
131114 12:28:57 [ERROR] WSREP: Failed to apply trx: source: e87c1c41-4cf9-11e3-bea1-52ac20638b33 version: 2 local: 0 state: APPLYING flags: 129 conn_id: 4 trx_id: 16908 seqnos (l: 4, g: 213, s: 212, d: 212, ts: 1384412336898091623)
131114 12:28:57 [ERROR] WSREP: Failed to apply trx 213 10 times
131114 12:28:57 [ERROR] WSREP: Node consistency compromized, aborting...
131114 12:28:57 [Note] WSREP: Closing send monitor...
131114 12:28:57 [Note] WSREP: Closed send monitor.
131114 12:28:57 [Note] WSREP: gcomm: terminating thread
131114 12:28:57 [Note] WSREP: gcomm: joining thread
131114 12:28:57 [Note] WSREP: gcomm: closing backend
131114 12:28:58 [Note] WSREP: view(view_id(NON_PRIM,e87c1c41-4cf9-11e3-bea1-52ac20638b33,2) memb {
        faef1ff9-4cf9-11e3-934c-b651d62adf92,
} joined {
} left {
} partitioned {
        e87c1c41-4cf9-11e3-bea1-52ac20638b33,
}

5.6 node:
=========================================================================

2013-11-14 12:28:57 7110 [Note] WSREP: New cluster view: global state: 11264cec-06e6-11e2-0800-61616b1fc754:213, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
2013-11-14 12:28:57 7110 [Warning] WSREP: Unsupported protocol downgrade: incremental data collection disabled. Expect abort.
2013-11-14 12:28:57 7110 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2013-11-14 12:28:57 7110 [Note] WSREP: REPL Protocols: 5 (3, 1)
2013-11-14 12:28:57 7110 [Note] WSREP: Assign initial position for certification: 213, protocol version: 3
2013-11-14 12:28:57 7110 [Note] WSREP: Service thread queue flushed.
2013-11-14 12:28:57 7110 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 12:29:03 7110 [Note] WSREP: cleaning up faef1ff9-4cf9-11e3-934c-b651d62adf92 (tcp://10.0.2.154:4567)

==================================================================================================

The 5.5 node has Galera 25.2.8; the 5.6 node has Galera 25.3.1.

Note that this happens even with socket.checksum=1 set on the 5.6 host.
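
A quick way to confirm the provider option is in effect (a minimal check, not from the original report, assuming the standard wsrep_provider_options variable exposed by the wsrep patches):

-- Run on the 5.6 node; the expanded provider options string should
-- contain "socket.checksum = 1" once the Galera library has been loaded.
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';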

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

5.6 node is on:

rpm -qa | grep -i percona
Percona-XtraDB-Cluster-client-56-5.6.14-rel25.1.1.rhel6.x86_64
Percona-XtraDB-Cluster-galera-56-3.1-1.169.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.14-rel25.1.1.rhel6.x86_64
Percona-Server-shared-51-5.1.72-rel14.10.597.rhel6.x86_64
percona-xtrabackup-2.1.5-680.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.14-rel25.1.1.rhel6.x86_64
percona-testing-0.0-1.noarch

(Server version: 5.6.14-56 Percona XtraDB Cluster (GPL), Release 25.1, Revision 557, wsrep_25.1.r4019)

5.5 node is on:

rpm -qa | grep -i percona
percona-xtrabackup-test-2.0.2-461.rhel6.x86_64
Percona-XtraDB-Cluster-client-5.5.34-25.9.575.rhel6.x86_64
Percona-Server-shared-compat-5.5.34-rel32.0.591.rhel6.x86_64
percona-toolkit-2.1.3-2.noarch
Percona-XtraDB-Cluster-galera-2.8-1.165.rhel6.x86_64
Percona-XtraDB-Cluster-shared-5.5.34-25.9.575.rhel6.x86_64
Percona-XtraDB-Cluster-server-5.5.34-25.9.575.rhel6.x86_64
Percona-XtraDB-Cluster-debuginfo-5.5.34-25.9.575.rhel6.x86_64
Percona-XtraDB-Cluster-galera-debuginfo-2.8-1.162.rhel6.x86_64
percona-testing-0.0-1.noarch
percona-xtrabackup-2.1.5-680.rhel6.x86_64
Percona-XtraDB-Cluster-test-5.5.34-25.9.575.rhel6.x86_64
Percona-XtraDB-Cluster-devel-5.5.34-25.9.575.rhel6.x86_64

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

On Galera 3.1 (the 5.6 node):

2013-11-14 12:26:38 7110 [Note] WSREP: wsrep_load(): Galera 3.1(r169) by Codership Oy <email address hidden> loaded successfully.

On Galera 2.8 (the 5.5 node):

131114 11:05:56 [Note] WSREP: wsrep_load(): Galera 2.8(r165) by Codership Oy <email address hidden> loaded successfully.

summary: - Galera Replication from 5.5 to 5.6 fails
+ Galera Replication between 5.6 and 5.5 fails
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote : Re: Galera Replication between 5.6 and 5.5 fails

5.6 config:
==========================================================================
[mysqld]
datadir=/var/lib/mysql

#log_slave_updates

server-id=341
#log_bin = /var/lib/mysql/mysql-bin.log

binlog_format = ROW
innodb_buffer_pool_size = 100M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 512M
innodb_file_per_table = 1
wsrep-node-address=10.0.2.153

wsrep_cluster_address='gcomm://Pxc1,Pxc2'
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options = "socket.checksum = 1"

wsrep_slave_threads=2
wsrep_cluster_name=PXC
wsrep_sst_method=xtrabackup-v2
wsrep_node_name=Pxc1

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[client]
user=root
password=test

5.5 config:
============================================================
[mysqld]
datadir=/var/lib/mysql

server-id=248

binlog_format = ROW
thread_stack = 256K
thread_cache_size = 512
tmp_table_size = 32M
max_heap_table_size = 32M
max_connections = 10000
open-files-limit = 65535
table_open_cache = 8192
table_definition_cache = 8192
key_buffer_size = 64M
innodb_buffer_pool_size = 500M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 512M
innodb_file_per_table = 1
wsrep-node-address=10.0.2.154

loose-query_response_time_stats

wsrep_cluster_address='gcomm://Pxc1,Pxc2'
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_slave_threads=2
wsrep_cluster_name=PXC
wsrep_sst_method=xtrabackup-v2
wsrep_node_name=Pxc2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[client]
user=root
password=test

========================================================================================

As you can see, I had binlogging enabled earlier on the nodes but disabled it; it fails even then.

Changed in percona-xtradb-cluster:
importance: Undecided → High
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Quick manual testing suggests that replication from a MySQL 5.5 node to a MySQL 5.6 node works, but trying to replicate from 5.6 -> 5.5 causes an immediate crash.

So it looks like migrating to a 5.6 cluster would be possible by allowing writes to the 5.5 nodes only, until all nodes have been upgraded to the 5.6 level.

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

MySQL 5.6 -> 5.5 replication breaks if the 5.6 node uses GTIDs, binlog checksums, or the new ROW event formats. These can be prevented by configuring the 5.6 node with:

log_bin_use_v1_row_events=1
gtid_mode=0
binlog_checksum=NONE

With this configuration, at least basic 5.6 -> 5.5 replication seems to work. But more testing is needed...
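
A minimal sanity check (not part of the original comment) to verify the three settings actually took effect on the 5.6 node before re-testing:

-- gtid_mode=0 is displayed as OFF; binlog_checksum should report NONE,
-- and log_bin_use_v1_row_events should report ON.
SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('log_bin_use_v1_row_events', 'gtid_mode', 'binlog_checksum');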

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

While the configuration worked for simple statements like these:

=================
mysql> use sbtest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table xyz (x int(11) auto_increment primary key);
Query OK, 0 rows affected (0.09 sec)

mysql> insert into xyz values (NULL);
Query OK, 1 row affected (0.01 sec)

mysql> insert into xyz values (NULL);
Query OK, 1 row affected (0.01 sec)

mysql> insert into xyz values (NULL);
Query OK, 1 row affected (0.01 sec)

mysql> insert into xyz values (NULL);
Query OK, 1 row affected (0.00 sec)

mysql> insert into xyz values (NULL);
Query OK, 1 row affected (0.02 sec)
========================================================

However, a sysbench workload did not go well:

sysbench --test=./oltp.lua --db-driver=mysql --mysql-engine-trx=yes --mysql-table-engine=innodb --mysql-user=root --mysql-password=test --oltp-table-size=30000 --num-threads=16 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=30000 --max-requests=300000 run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Random number generator seed is 0 and will be ignored

Threads started!

ALERT: failed to execute MySQL query: `INSERT INTO sbtest1 (id, k, c, pad) VALUES (14851, 14946, '24726585045-91236881311-60534147758-46321582953-08975433339-12295930364-91635364131-39593067613-88729288733-07642591607', '15735895994-49834668127-10360676632-98841189449-62687644138')`:
ALERT: Error 1062 Duplicate entry '14851' for key 'PRIMARY'
FATAL: failed to execute function `event': (null)
^C

5.6 node
==============
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21:20:32 6370 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-14 21...


Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

#3 is for 5.6 to 5.5 replication.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

a) s/#3/#6/ in previous comment.

b) Tested 5.5 -> 5.6: works even with multiple sysbench threads.

c) 5.6 -> 5.5, on the other hand, fails even with a single sysbench thread (so multi-threading is not the issue here):

^[[A131114 21:57:46 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '15073' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131114 21:57:46 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 9970
131114 21:57:46 [Warning] WSREP: Failed to apply app buffer: seqno: 9970, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
131114 21:57:46 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '15073' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131114 21:57:46 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 9970
131114 21:57:46 [Warning] WSREP: Failed to apply app buffer: seqno: 9970, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 3th time
131114 21:57:46 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '15073' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131114 21:57:46 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 9970
131114 21:57:46 [Warning] WSREP: Failed to apply app buffer: seqno: 9970, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 4th time
131114 21:57:46 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '15073' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131114 21:57:46 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 9970
131114 21:57:46 [Warning] WSREP: Failed to apply app buffer: seqno: 9970, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 5th time
131114 21:57:46 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '15073' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131114 21:57:46 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 9970
131114 21:57:46 [Warning] WSREP: Failed to apply app buffer: seqno: 9970, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 6th time
131114 21:57:46 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '15073' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the ...


Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

sysbench --test=./oltp.lua --db-driver=mysql --mysql-engine-trx=yes --mysql-table-engine=innodb --mysql-user=root --mysql-password=test --oltp-table-size=30000 --num-threads=1 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=30000 --max-requests=30 run

was the command used for the 5.6 -> 5.5 replication test in #8.

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Replication in the 5.5 -> 5.6 direction can also crash if parallel applying is enabled; the following error appears (on the 5.6 node):

2013-11-15 12:29:34 30316 [ERROR] WSREP: Trx 27782 tries to abort slave trx 27783. This could be caused by:
        1) unsupported configuration options combination, please check documentation.
        2) a bug in the code.
        3) a database corruption.

The affected table has both a primary key and a unique key. Dependency calculation evidently goes wrong on the 5.6 node (the same load between 5.5 nodes runs fine).
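
As a rough illustration (the actual schema used by the load is not shown in this report), the table shape described above would look something like the hypothetical sketch below: both a PRIMARY KEY and a secondary UNIQUE KEY, so each write set carries two key values for certification and dependency tracking.

-- Hypothetical table shape only; the names are made up for illustration.
CREATE TABLE test.pa_repro (
  id       INT NOT NULL PRIMARY KEY,
  uniq_col INT NOT NULL,
  payload  VARCHAR(64),
  UNIQUE KEY uk_uniq (uniq_col)
) ENGINE=InnoDB;

With wsrep_slave_threads > 1, updates to such a table are applied in parallel, which is where the dependency calculation described above appears to go wrong.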

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Even with PA (parallel applying) off, I get this on the 5.5 node:

131120 21:49:33 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '14991' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131120 21:49:33 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 3051
131120 21:49:33 [Warning] WSREP: Failed to apply app buffer: seqno: 3051, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
131120 21:49:33 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '14991' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131120 21:49:33 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 3051
131120 21:49:33 [Warning] WSREP: Failed to apply app buffer: seqno: 3051, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 3th time
131120 21:49:33 [ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Column 'k' cannot be null, Error_code: 1048; Duplicate entry '14991' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 174, Error_code: 1062
131120 21:49:33 [Warning] WSREP: RBR event 3 Update_rows apply warning: 121, 3051
131120 21:49:33 [Warning] WSREP: Failed to apply app buffer: seqno: 3051, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 4th time

=================================================================================

5.6 node cnf:

[mysqld]
datadir=/var/lib/mysql
binlog_format = ROW
innodb_buffer_pool_size = 100M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 512M
innodb_file_per_table = 1

wsrep_cluster_address='gcomm://Pxc1,Pxc2'
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options = "socket.checksum=1"

wsrep_slave_threads=1
wsrep_cluster_name=PXC
wsrep_sst_method=xtrabackup-v2
wsrep_node_name=Pxc1

log_bin_use_v1_row_events=1
gtid_mode=0
binlog_checksum=NONE

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[client]
user=root
password=test

5.5 node cnf:
[mysqld]
datadir=/var/lib/mysql
binlog_format = ROW
innodb_buffer_pool_size = 100M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 512M
innodb_file_per_table = 1

wsrep_cluster_address='gcomm://Pxc1,Pxc2'
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_slave_threads=1
wsrep_cluster_name=PXC
wsrep_sst_method=xtrabackup-v2
wsrep_node_name=Pxc2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[client]
user=root
password=test

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

To add to #11:

On the 5.6 node, even with the compatibility config options and PA off, I see

2013-11-21 10:08:29 3140 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3
2013-11-21 10:08:29 3140 [Warning] WSREP: trx protocol version: 2 does not match certification protocol version: 3

So there may be some protocol-level violation at play here.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

For #12,

from galera::Certification::do_test in certification.cpp:

    if (trx->version() != version_)
    {
        log_warn << "trx protocol version: "
                 << trx->version()
                 << " does not match certification protocol version: "
                 << version_;
        return TEST_FAILED;
    }

Does this test need to be relaxed/modified for cross-version
replication?

From

"2013-11-21 10:08:29 3140 [Warning] WSREP: trx protocol version:
2 does not match certification protocol version: 3"

it looks like trx protocol 2 (of the 5.5 node) is not compatible with certification protocol 3 (of Galera 3 on the PXC 5.6 node). Is this right?
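
One way to confirm what each node has negotiated (a minimal check using standard Galera status variables, not taken from this report):

-- Run on both the 5.5 and the 5.6 node. wsrep_protocol_version is the
-- replication/certification protocol negotiated for the cluster;
-- wsrep_provider_version is the Galera library build (2.8 vs 3.1 here).
SHOW GLOBAL STATUS LIKE 'wsrep_protocol_version';
SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';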

summary: - Galera Replication between 5.6 and 5.5 fails
+ Galera Replication from 5.6 node to 5.5 node fails
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Changed the title to:
"Galera Replication from 5.6 node to 5.5 node fails"

This bug is used to track issues with replication from a 5.6 node to a 5.5 node. There is a separate bug to track issues with replication in the opposite direction: https://bugs.launchpad.net/codership-mysql/+bug/1267494

Note that 5.6 -> 5.5 replication is not critical for migration to a 5.6 cluster. The migration can work by using one 5.5 master while upgrading all slaves to the 5.6 level, and for this process only 5.5 -> 5.6 replication is needed.

5.6 -> 5.5 replication will be needed only if the migration has to happen in multi-master mode, or if there is a need to maintain a hybrid 5.5-5.6 cluster for the long term.
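
For a migration that relies only on the 5.5 -> 5.6 direction, one possible way to keep writes on the remaining 5.5 master (a sketch, not something suggested in this report) is to make the upgraded 5.6 nodes read-only for ordinary clients:

-- On each upgraded 5.6 node: client writes are rejected, while the wsrep
-- applier threads should still be able to apply replicated write sets.
SET GLOBAL read_only = 1;

-- Once every node is on 5.6, re-enable writes:
-- SET GLOBAL read_only = 0;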

Changed in codership-mysql:
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Tested 5.6 -> 5.5 replication (Galera 3.1), with the required compatibility configuration:

log_bin_use_v1_row_events=1
gtid_mode=0
binlog_checksum=NONE

I can see that the 5.5 node crashes during slave applying with:

140109 15:42:46 [ERROR] Slave SQL: Column 5 of table 'test.comm00' cannot be converted from type '<unknown type>' to type 'timestamp', Error_code: 1677

This happens with a sqlgen load, which updates a table with a timestamp column.

I tried the same load using native MySQL replication from a 5.6 node to a 5.5 node, and the same error happens there as well, so we probably have a MySQL bug to deal with. However, this may not be worth fixing if migration is the only target here.
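
A hedged minimal reproduction of the scenario (the real test.comm00 definition is not shown in the report): any table created on the 5.6 node with a TIMESTAMP column should do, because 5.6 stores such columns in its new temporal format (row-event column type MYSQL_TYPE_TIMESTAMP2), which a 5.5 applier cannot convert.

-- Hypothetical table, names made up; created and updated on the 5.6 node.
CREATE TABLE test.comm00_repro (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

INSERT INTO test.comm00_repro (ts) VALUES (NOW());

-- The Update_rows event for this statement is what the 5.5 applier
-- rejects with error 1677 ('<unknown type>' to 'timestamp').
UPDATE test.comm00_repro SET ts = NOW() WHERE id = 1;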

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Yes, http://bugs.mysql.com/bug.php?id=70085 is the only remaining issue preventing replication in the 5.6 -> 5.5 direction.
Fixing it is not seen as a priority, as there is a working migration path to a 5.6-based cluster regardless of this bug.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-983
