Inconsistencies crash PXC without being verbose about why

Bug #1281124 reported by Kenny Gryp
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC); status tracked in 5.6

Series   Status         Importance   Assigned to
5.5      Won't Fix      Medium       Unassigned
5.6      Fix Committed  Medium       Unassigned

Bug Description

node3 mysql> create table a (a int) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

node3 mysql> set wsrep_on=0;
Query OK, 0 rows affected (0.00 sec)

node3 mysql> insert into a values (2);
Query OK, 1 row affected (0.00 sec)

node3 mysql> set wsrep_on=1;
Query OK, 0 rows affected (0.00 sec)

node3 mysql> select * from a;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

node3 mysql> delete from a;
Query OK, 1 row affected (0.00 sec)

So I caused an inconsistency on purpose and then deleted the row that was missing on the other nodes.
I know this is not normal and I should not be doing it.
The reason I wanted to do it was to see what error message you get. Does it tell you anything about what was inconsistent? Can we find out which table and which row? Maybe output the row event/writeset in a readable format?
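Not something the current error log gives you, but one rough sketch for answering that question by hand (the test.a schema/table name is the one from this test case, as seen in the later log excerpt) is to run the same ordered query, or a table checksum, on every node and compare the results:

-- Run on each node and diff the output; any row present on one node
-- but missing on another is the inconsistent row.
SELECT * FROM test.a ORDER BY a;

-- Or compare a per-table checksum instead of the full result set.
CHECKSUM TABLE test.a;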

[root@node1 ~]# rpm -qa | grep Percona-Xt
Percona-XtraDB-Cluster-client-56-5.6.15-25.3.706.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-3.3-1.203.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.15-25.3.706.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.15-25.3.706.rhel6.x86_64

Config:

[mysqld]
datadir = /var/lib/mysql
log_error = error.log

query_cache_size=0
query_cache_type=0

binlog_format = ROW

innodb_buffer_pool_size = 128M
innodb_log_file_size = 64M
innodb_flush_method = O_DIRECT
innodb_file_per_table
innodb_flush_log_at_trx_commit = 0

wsrep_cluster_name = mycluster
wsrep_node_name = node1

wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = ""

wsrep_slave_threads = 4
wsrep_auto_increment_control = ON

wsrep_sst_method = xtrabackup
wsrep_sst_auth = sst:secret

wsrep-debug=1
wsrep-provider-options="debug=1"

innodb_autoinc_lock_mode = 2
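For completeness, a quick way to double-check at runtime that the debug options above were actually picked up (standard SHOW VARIABLES; nothing beyond the wsrep variable names is assumed):

-- Confirm the wsrep debug settings are active on the node.
SHOW GLOBAL VARIABLES LIKE 'wsrep_debug';
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';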

Revision history for this message
Kenny Gryp (gryp) wrote :

Here's what I got on the node where the statements were executed:

2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():45: failed handler from read_completion_condition:474 socket 0x7f041400d2d8 13 error 1 1 state 2
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():52: local endpoint tcp://192.168.70.2:4567 remote endpoint tcp://192.168.70.3:43628
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_up():1367: socket in state 3
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast_proto.cpp:set_state():13: State change: OK -> FAILED
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_failed():637: handle failed: v=0,lu=9d24bc26-97d8-11e3-860f-fab1d79f1e19,ru=749ac05c-97d9-11e3-9ae6-0f4dd301e31f,ls=0,rs=0,la=tcp://0.0.0.0:4567,ra=tcp://192.168.70.3:4567,mc=,gn=mycluster,ch=0,st=FAILED,pr=1,tp=0x7f041400d230,ts=4406011659568
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_failed():666: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') setting next reconnect time to 4407096155576 for tcp://192.168.70.3:4567
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:close():240: closing 0x7f041400d2d8 state 3 send_q size 0
2014-02-17 14:15:25 5046 [Note] WSREP: gcomm/src/gmcast.cpp:update_addresses():805: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') address 'tcp://192.168.70.2:4567' pointing to uuid 9d24bc26-97d8-11e3-860f-fab1d79f1e19 is blacklisted, skipping
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():846: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') --- mcast tree begin ---
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():863: Proto: v=0,lu=9d24bc26-97d8-11e3-860f-fab1d79f1e19,ru=3ef17bf4-97da-11e3-8b92-57d14582fd80,ls=0,rs=0,la=tcp://0.0.0.0:4567,ra=tcp://192.168.70.4:4567,mc=,gn=mycluster,ch=0,st=OK,pr=1,tp=0x7f0414003160,ts=4406011676004
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():887: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') self index: 1
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():888: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') --- mcast tree end ---
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():45: failed handler from read_handler:373 socket 0x7f041400d2d8 -1 error 1 0 state 0
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():52: local endpoint tcp://192.168.70.2:4567 remote endpoint tcp://192.168.70.3:43628
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:~AsioTcpSocket():33: dtor for 0x7f041400d2d8
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():45: failed handler from read_completion_condition:474 socket 0x7f0414003208 14 error 1 1 state 2
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():52: local endpoint tcp://192.168.70.2:4567 remote endpoint tcp://192.168.70.4:58637
2...

Revision history for this message
Muhammad Irfan (muhammad-irfan) wrote :

I tried to reproduce with the latest PXC version, 5.6.22-72.0-56 Percona XtraDB Cluster (GPL), Release rel72.0, Revision 978, WSREP version 25.8, wsrep_25.8.r4150. After deleting the record on the writer node, I got the error below on the other nodes of the cluster:

2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [Warning] WSREP: Failed to apply app buffer: seqno: 48483717, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [Warning] WSREP: Failed to apply app buffer: seqno: 48483717, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [Warning] WSREP: Failed to apply app buffer: seqno: 48483717, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [ERROR] WSREP: Failed to apply trx: source: 6697faad-0e8b-11e5-b4d9-3f7722481adf version: 3 local: 0 state: APPLYING flags: 1 conn_id: 13 trx_id: 141946374 seqnos (l: 5, g: 48483717, s: 48483716, d: 48483716, ts: 5046077371079801)
2015-06-09 05:49:25 21598 [ERROR] WSREP: Failed to apply trx 48483717 4 times
2015-06-09 05:49:25 21598 [ERROR] WSREP: Node consistency compromized, aborting...
2015-06-09 05:49:25 21598 [Note] WSREP: Closing send monitor...
2015-06-09 05:49:25 21598 [Note] WSREP: Closed send monitor.
2015-06-09 05:49:25 21598 [Note] WSREP: gcomm: terminating thread
2015-06-09 05:49:25 21598 [Note] WSREP: gcomm: joining thread
2015-06-09 05:49:25 21598 [Note] WSREP: gcomm: closing backend
2015-06-09 05:49:25 21598 [Note] WSREP: view(view_id(NON_PRIM,6697faad,3) memb {
        be72198a,0
} joined {
} left {
} partitioned {
        6697faad,102
        9a8c7599,0
})
2015-06-09 05:49:25 21598 [Note] WSREP: view((empty))
2015-06-09 05:49:25 21598 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2015-06-09 05:49:25 2159...
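Once a node has aborted with "Node consistency compromized", the usual recovery is to restart mysqld on that node and let it rejoin the cluster; because its local data diverged, it will typically resync via a full SST. A rough sketch of the post-restart checks (standard wsrep status variables; the expected values assume a healthy three-node cluster):

-- Run on the restarted node once it is back up.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';  -- expect 'Synced'
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';         -- expect 3
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';       -- expect 'Primary'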


Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Don't see this anymore with the latest PXC 5.6:

mysql>
mysql> create table a (a int) engine=innodb;
Query OK, 0 rows affected (0.05 sec)

mysql> set wsrep_on=0;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into a values (2);
Query OK, 1 row affected (0.00 sec)

mysql> set wsrep_on=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from a;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

mysql> delete from a;
Query OK, 1 row affected (0.01 sec)

mysql> select * from a;
Empty set (0.00 sec)

mysql> set wsrep_on=0;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into a values (2);
Query OK, 1 row affected (0.01 sec)

mysql> set wsrep_on=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from a;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

mysql> delete from a where a = 2;
Query OK, 1 row affected (0.02 sec)

mysql>
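A hedged follow-up check on the other nodes (schema name assumed to be test, as in the earlier log excerpt) to confirm they applied the replicated DELETE and stayed synced rather than aborting:

-- Run on the other cluster nodes.
SELECT * FROM test.a;                                 -- expect Empty set
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';  -- expect 'Synced'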

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1099
