Inconsistencies crash PXC without being verbose why

Bug #1281124 reported by Kenny Gryp on 2014-02-17
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC), status tracked in 5.6

Series  Status         Importance  Assigned to
5.5     Won't Fix      Medium      Unassigned
5.6     Fix Committed  Medium      Unassigned

Bug Description

node3 mysql> create table a (a int) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

node3 mysql> set wsrep_on=0;
Query OK, 0 rows affected (0.00 sec)

node3 mysql> insert into a values (2);
Query OK, 1 row affected (0.00 sec)

node3 mysql> set wsrep_on=1;
Query OK, 0 rows affected (0.00 sec)

node3 mysql> select * from a;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

node3 mysql> delete from a;
Query OK, 1 row affected (0.00 sec)

So I caused an inconsistency on purpose, then deleted the row that was missing on the other nodes.
I know this is not normal and I should not be doing it.
I did it to see what error message you get. I wondered: does it say anything about what was inconsistent? Can we find out which table and which row? Maybe output the row event/writeset in a readable format?
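For reference, one commonly described way to get a readable view of a failed writeset is to decode the GRA_*.log file that PXC drops in the datadir when an event fails to apply. These files are raw binlog events without the standard header, so a format description header has to be prepended first. This is a sketch, not an official procedure; the filenames, the header size (120 bytes is commonly cited for 5.6 binlogs with checksums), and the paths are all illustrative assumptions:

```shell
# Assumed/illustrative filenames. GRA_*.log files are raw replication
# events without the binlog magic + format description header, so borrow
# one from a real binlog on the same server before decoding.
head -c 120 /var/lib/mysql/mysql-bin.000001 > /tmp/binlog-header
cat /tmp/binlog-header /var/lib/mysql/GRA_2_48483717.log > /tmp/GRA.binlog
# -vvv reconstructs the row event as pseudo-SQL with column values.
mysqlbinlog -vvv /tmp/GRA.binlog
```

This at least shows which table and which row image the failed event carried, which is the information the report asks for.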

[root@node1 ~]# rpm -qa | grep Percona-Xt
Percona-XtraDB-Cluster-client-56-5.6.15-25.3.706.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-3.3-1.203.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.15-25.3.706.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.15-25.3.706.rhel6.x86_64

Config:

[mysqld]
datadir = /var/lib/mysql
log_error = error.log

query_cache_size=0
query_cache_type=0

binlog_format = ROW

innodb_buffer_pool_size = 128M
innodb_log_file_size = 64M
innodb_flush_method = O_DIRECT
innodb_file_per_table
innodb_flush_log_at_trx_commit = 0

wsrep_cluster_name = mycluster
wsrep_node_name = node1

wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_provider_options = ""

wsrep_slave_threads = 4
wsrep_auto_increment_control = ON

wsrep_sst_method = xtrabackup
wsrep_sst_auth = sst:secret

wsrep-debug=1
wsrep-provider-options="debug=1"

innodb_autoinc_lock_mode = 2

Kenny Gryp (gryp) wrote:

Here's what I got on the node where the statement was executed:

2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():45: failed handler from read_completion_condition:474 socket 0x7f041400d2d8 13 error 1 1 state 2
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():52: local endpoint tcp://192.168.70.2:4567 remote endpoint tcp://192.168.70.3:43628
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_up():1367: socket in state 3
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast_proto.cpp:set_state():13: State change: OK -> FAILED
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_failed():637: handle failed: v=0,lu=9d24bc26-97d8-11e3-860f-fab1d79f1e19,ru=749ac05c-97d9-11e3-9ae6-0f4dd301e31f,ls=0,rs=0,la=tcp://0.0.0.0:4567,ra=tcp://192.168.70.3:4567,mc=,gn=mycluster,ch=0,st=FAILED,pr=1,tp=0x7f041400d230,ts=4406011659568
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_failed():666: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') setting next reconnect time to 4407096155576 for tcp://192.168.70.3:4567
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:close():240: closing 0x7f041400d2d8 state 3 send_q size 0
2014-02-17 14:15:25 5046 [Note] WSREP: gcomm/src/gmcast.cpp:update_addresses():805: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') address 'tcp://192.168.70.2:4567' pointing to uuid 9d24bc26-97d8-11e3-860f-fab1d79f1e19 is blacklisted, skipping
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():846: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') --- mcast tree begin ---
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():863: Proto: v=0,lu=9d24bc26-97d8-11e3-860f-fab1d79f1e19,ru=3ef17bf4-97da-11e3-8b92-57d14582fd80,ls=0,rs=0,la=tcp://0.0.0.0:4567,ra=tcp://192.168.70.4:4567,mc=,gn=mycluster,ch=0,st=OK,pr=1,tp=0x7f0414003160,ts=4406011676004
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():887: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') self index: 1
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:update_addresses():888: (9d24bc26-97d8-11e3-860f-fab1d79f1e19, 'tcp://0.0.0.0:4567') --- mcast tree end ---
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():45: failed handler from read_handler:373 socket 0x7f041400d2d8 -1 error 1 0 state 0
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():52: local endpoint tcp://192.168.70.2:4567 remote endpoint tcp://192.168.70.3:43628
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:~AsioTcpSocket():33: dtor for 0x7f041400d2d8
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():45: failed handler from read_completion_condition:474 socket 0x7f0414003208 14 error 1 1 state 2
2014-02-17 14:15:25 5046 [Note] [Debug] WSREP: gcomm/src/asio_tcp.cpp:failed_handler():52: local endpoint tcp://192.168.70.2:4567 remote endpoint tcp://192.168.70.4:58637
2...

Muhammad Irfan (muhammad-irfan) wrote:

I tried to reproduce with the latest PXC version 5.6.22-72.0-56, Percona XtraDB Cluster (GPL), Release rel72.0, Revision 978, WSREP version 25.8, wsrep_25.8.r4150. After deleting the record on the writer node, I got the error below on the other nodes of the cluster:

2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [Warning] WSREP: Failed to apply app buffer: seqno: 48483717, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [Warning] WSREP: Failed to apply app buffer: seqno: 48483717, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [Warning] WSREP: Failed to apply app buffer: seqno: 48483717, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2015-06-09 05:49:25 21598 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.a; Can't find record in 'a', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log FIRST, end_log_pos 149, Error_code: 1032
2015-06-09 05:49:25 21598 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 137, 48483717
2015-06-09 05:49:25 21598 [ERROR] WSREP: Failed to apply trx: source: 6697faad-0e8b-11e5-b4d9-3f7722481adf version: 3 local: 0 state: APPLYING flags: 1 conn_id: 13 trx_id: 141946374 seqnos (l: 5, g: 48483717, s: 48483716, d: 48483716, ts: 5046077371079801)
2015-06-09 05:49:25 21598 [ERROR] WSREP: Failed to apply trx 48483717 4 times
2015-06-09 05:49:25 21598 [ERROR] WSREP: Node consistency compromized, aborting...
2015-06-09 05:49:25 21598 [Note] WSREP: Closing send monitor...
2015-06-09 05:49:25 21598 [Note] WSREP: Closed send monitor.
2015-06-09 05:49:25 21598 [Note] WSREP: gcomm: terminating thread
2015-06-09 05:49:25 21598 [Note] WSREP: gcomm: joining thread
2015-06-09 05:49:25 21598 [Note] WSREP: gcomm: closing backend
2015-06-09 05:49:25 21598 [Note] WSREP: view(view_id(NON_PRIM,6697faad,3) memb {
        be72198a,0
} joined {
} left {
} partitioned {
        6697faad,102
        9a8c7599,0
})
2015-06-09 05:49:25 21598 [Note] WSREP: view((empty))
2015-06-09 05:49:25 21598 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2015-06-09 05:49:25 2159...
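As the "Node consistency compromized, aborting" line shows, the node deliberately shuts itself down rather than continue with divergent data. A minimal recovery sketch, assuming the config shown earlier (wsrep_sst_method = xtrabackup) and an init-system-dependent service name, is simply to restart the node so it rejoins via a full SST from a donor:

```shell
# After the abort, the node's local state is considered inconsistent.
# Restarting it makes it rejoin the cluster; with wsrep_sst_method set to
# xtrabackup (as in the config above), the donor streams a full data copy.
service mysql start    # or: systemctl start mysql, depending on init system
# Then check the error log for lines like "SST received" and
# "Shifting JOINED -> SYNCED" to confirm the node resynced.
grep -E 'SST received|SYNCED' /var/lib/mysql/error.log
```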


Krunal Bauskar (krunal-bauskar) wrote:

Don't see this anymore with the latest PXC 5.6:

mysql>
mysql> create table a (a int) engine=innodb;
Query OK, 0 rows affected (0.05 sec)

mysql> set wsrep_on=0;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into a values (2);
Query OK, 1 row affected (0.00 sec)

mysql> set wsrep_on=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from a;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

mysql> delete from a;
Query OK, 1 row affected (0.01 sec)

mysql> select * from a;
Empty set (0.00 sec)

mysql> set wsrep_on=0;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into a values (2);
Query OK, 1 row affected (0.01 sec)

mysql> set wsrep_on=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from a;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

mysql> delete from a where a = 2;
Query OK, 1 row affected (0.02 sec)

mysql>

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1099
