Percona XtraDB Cluster - HA scalable solution for MySQL

Error "no such a transition EXECUTING -> COMMITTED" on the master node, table with external key

Reported by Dmitry Gribov on 2013-02-12
44
This bug affects 6 people
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Undecided
Seppo Jaakola
Percona XtraDB Cluster
Status tracked in Trunk
5.6
Undecided
Unassigned
Trunk
Undecided
Unassigned

Bug Description

We have 3-nodes setup, huge tables, heavy load. Experienced frequent "no such a transition EXECUTING -> COMMITTED" shutdown. After some experiments We've found quite simple setup: 3 nodes, only first node is used for writing, two other nodes are just "hot-swap" to handle request upon first node failure. Error happened more and more frequently until cluster could not live for 5 minutes without destruction. Writing to "user_best_times" table causes master node (first node) to fail. Repeating the same write to the second node brings second node down as well. Third node then hangs in "Unknown comand" state (desync or whatever) or (sometimes) survives.

We intensively write to this table:
CREATE TABLE `user_best_times` (
  `user` int(10) unsigned NOT NULL,
  `week_day` tinyint(3) unsigned NOT NULL,
  `hour` tinyint(3) unsigned NOT NULL,
  `hits` smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`user`,`week_day`,`hour`),
  KEY `fk_user_best_times_1` (`user`),
  CONSTRAINT `fk_user_best_times_1` FOREIGN KEY (`user`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Writes are like this:
INSERT INTO user_best_times (hits, hour, week_day, user)
                                        VALUES ('0.75','19','1','14240291')
                                ON DUPLICATE KEY UPDATE hits=hits+VALUES(hits);

Table is quite huge (87000000+ rows) and under heavy concurrency (4-50 requests) Percona goes down in a minute. Linked "users" table is updated not that frequently, it has 1868675 rows.

Finally it looks like I've found a workaround: I've removed FOREIGN KEY fk_user_best_times_1, doing it's job on the application side. I see no crushes since then.
my.cnf in the attachment.
I can send logs (wsrep_debug=1) for the crush time via e-mail (don't want to publish it for everyone).

We use Percona cluster 5.5.29, Galera 23.2.2rc2(r136):

mysql> \s
--------------
/usr/local/mysql/bin/mysql Ver 14.14 Distrib 5.5.29, for Linux (x86_64) using EditLine wrapper

Connection id: 400
Current database: schema_lock
Current user: fbhub@192.168.100.62
SSL: Not in use
Current pager: less
Using outfile: ''
Using delimiter: ;
Server version: 5.5.29 Source distribution, wsrep_23.7.1.rXXXX
Protocol version: 10
Connection: dbp01 via TCP/IP
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
TCP port: 8661
Uptime: 40 min 22 sec

Threads: 415 Questions: 257794 Slow queries: 2 Opens: 991 Flush tables: 3 Open tables: 182 Queries per second avg: 106.438
--------------

mysql> mysql> show status like 'wsrep%';
+----------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+----------------------------+-------------------------------------------------------------+
| wsrep_local_state_uuid | f0d4d641-7475-11e2-0800-d4e89001eedb |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 1822295 |
| wsrep_replicated | 66873 |
| wsrep_replicated_bytes | 15768877 |
| wsrep_received | 7362 |
| wsrep_received_bytes | 1640124 |
| wsrep_local_commits | 66208 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.003955 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 642.792723 |
| wsrep_apply_oooe | 0.007636 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.007636 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 764 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 192.168.100.68:8661,192.168.100.70:8661,192.168.100.69:8661 |
| wsrep_cluster_conf_id | 19 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | f0d4d641-7475-11e2-0800-d4e89001eedb |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 23.2.2rc2(r136) |
| wsrep_ready | ON |
+----------------------------+-------------------------------------------------------------+

Dmitry Gribov (grib-d) wrote :

@Dmitry,

Can you provide your error log (containing the backtrace etc.) to
check if this is a known issue - lp:1110561

Dmitry Gribov (grib-d) wrote :

No, nothing simmilar. Sent to your e-mail with my logs.

Also noticed that with the very same table on the 5.2 MySQL (developer PC) query sometimes just hangs forever in "Sending data" on some requiest. Changing datatype `hits` smallint(5) unsigned NOT NULL, to `hits` int(10) unsigned NOT NULL, helps.
This huge tables make lots of trouble, somehow.

And I almost always see errors like "mysqldump: Error 1213: Deadlock found when trying to get lock; try restarting transaction when dumping table `spam_hits` at row: 5696359" while dumping unloaded node with options like this:

--single-transaction --allow-keyword --extended-insert --max-allowed-packet=20M\
 --default-character-set=utf8 --quick

I guess this is something really abnormal since --single-transaction dump is not acquiring any read-locks or whatever. And yet here is the write-error on read-request. Do not understand it.

And yet the databese without the index is stable (exept I can't dump it unless made inconsistent with --skip-lock-tables).

This is same as lp:1149735 - from the backtrace and the statement which causes it.

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
Seppo Jaakola (seppo-jaakola) wrote :

Dmitry, can you share your error log with me as well? (<email address hidden>) I'm troubleshooting the lp:1149735 , which indeed looks very similar case.

Seppo Jaakola (seppo-jaakola) wrote :

Dmitry, thanks for the logs. They contain lots of useful information, but still I'm not able to reproduce the crash in my lab :(. My test scenarios must miss some important element. I'm testing with various insert/delete sessions for 'user_best_times' and 'users' tables.
I'm also feeding in frequent "CREATE TABLE IF NOT EXISTS 'users' ..." queries, which seem to be part f your application logic.

Can you show the full definition of 'users table', it is only partial in the logs. The logs show a number of cases where INSERT to 'users' table was done with NULL key, I wonder why application would try smth like that.

Can you find a sample query for delete operations for these two tables? Can you think of some other write activity for these tables, which I should add to my test scenario?

Dmitry Gribov (grib-d) wrote :
Download full text (4.7 KiB)

While crushes took place no delete for table "users" was done. At least it wasn't the trigger. I guess the problem is the foreign key check with the (probably) key failure and then kind of internal rollback/ignore for the "not matching the key" query. Execution of "on delete cascade" is NOT a reason, I'm 99.9% sure.
And at the same time there is a bunch of other tables with foreign keys, no problems (except it seems like integrity constraints like "on delete cascade" sometimes do not work and produce "zimbie" records, but I still can't understand how exactly that happens).
And this problem arised with time, all was fine until the table grow to 80M of rows. Do you have a really LARGE table in the lab?

Here comes "users":

| users | CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `login` varchar(100) DEFAULT NULL,
  `pwd` varchar(100) DEFAULT NULL,
  `s_mail` varchar(50) DEFAULT NULL,
  `s_www` varchar(255) DEFAULT NULL,
  `s_inn` varchar(50) DEFAULT NULL,
  `s_descr` text,
  `s_phone` varchar(100) DEFAULT NULL,
  `offert_accepted` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `s_full_name` varchar(255) DEFAULT NULL,
  `s_first_name` varchar(100) DEFAULT NULL,
  `s_middle_name` varchar(100) DEFAULT NULL,
  `s_last_name` varchar(100) DEFAULT NULL,
  `s_city` varchar(100) DEFAULT NULL,
  `s_address` varchar(255) DEFAULT NULL,
  `last_used` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `last_host_id` smallint(5) unsigned DEFAULT NULL,
  `mail_confirmed` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `msisdn_confirmed` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `msisdn_confirm_code` varchar(4) DEFAULT NULL,
  `msisdn_req_time` datetime DEFAULT NULL,
  `java_phone_model` varchar(30) DEFAULT NULL,
  `java_font` varchar(30) DEFAULT NULL,
  `java_safe_mode` tinyint(4) NOT NULL DEFAULT '0',
  `user_pic` varchar(16) DEFAULT NULL,
  `denied_libs` varchar(255) DEFAULT NULL,
  `recenser_type` int(11) DEFAULT NULL,
  `partner_id` int(11) DEFAULT NULL,
  `creat_date` datetime DEFAULT NULL,
  `partner` int(10) unsigned DEFAULT NULL,
  `partner_valid_till` date DEFAULT NULL,
  `partner_pin` int(10) unsigned DEFAULT NULL,
  `account` decimal(9,2) NOT NULL DEFAULT '0.00',
  `abonement_start` datetime DEFAULT NULL,
  `abonement_expires` date NOT NULL DEFAULT '2006-01-01',
  `abonement_period` smallint(6) DEFAULT NULL,
  `abonement_delay` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `abonement_max_price` decimal(6,2) DEFAULT NULL,
  `abonement_downloads` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `abonement_left_clicks` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `abonement_left_summ` decimal(6,2) DEFAULT NULL,
  `user_pic_height` tinyint(3) unsigned DEFAULT NULL,
  `user_pic_width` tinyint(3) unsigned DEFAULT NULL,
  `show_pay_btn` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `s_puid` varchar(255) DEFAULT NULL,
  `discount` decimal(4,4) NOT NULL DEFAULT '0.0000',
  `money_bonus` decimal(9,2) NOT NULL DEFAULT '0.00',
  `subscr_last_reminded` datetime DEFAULT NULL,
  `subscr_free_arts_given` datetime DEFAULT NULL,
  `subscr_type` tinyint(3) unsigned NOT NULL DEFAULT '2',
  `subscr_period` tinyint(3) unsigned DEFAULT '1',
  `subscr_content` ti...

Read more...

Seppo Jaakola (seppo-jaakola) wrote :

during testing for reproducing this issue, another issue surfaced: lp:1155183, these issues can be related

Changed in codership-mysql:
status: New → In Progress
Seppo Jaakola (seppo-jaakola) wrote :

From the logs and core file, it is very clear what is the immediate reason for the crash. There is a transaction which has not logged any modified row key values for the replication provider, and from provider perspective this transaction does not exists. However, at commit time, the transaction cache contains some replication events and node decides to do emergency shutdown to avoid potential inconsistency.

Attempts to reproduce this issue have not been successful so far, and it is still a mystery why transaction processing ended into this troubled state.

Pushed new revision: http://bazaar.launchpad.net/~codership/codership-mysql/5.5-23/revision/3857, which logs a warning in error log and prints the excessive RBR buffer in GRA_* file for later inspection.

Seppo Jaakola (seppo-jaakola) wrote :

These NULL key warnings don't make sense:

130210 13:21:06 [Note] WSREP: NULL key skipped: INSERT IGNORE INTO users VALUES ('14214700','iph9027E4342B35','<skipped for security reasons>','','',NULL,NULL,'','0',' ','','','','',NULL,'2013-02-10 13:21:06','1965','0','0',NULL,NULL,NULL,NULL,'0',NULL,NULL,NULL,NULL,'2013-02-10 13:21:06',NULL,NULL,NULL,'0.00',NULL,'2006-01-01',NULL,'0',NULL,'0','0',NULL,NULL,NULL,'0',NULL,'0.0000','0.00','2012-08-10 13:21:06',NULL,'2','1','2',NULL,NULL,'0000-00-00 00:00:00',NULL,NULL,NULL,NULL,NULL,NULL,'0000-01-01 00:00:00',NULL,NULL,'1')

users table has two unique keys defined ('id' and 'login' columns, two first columns in value list) and they both have value passed in the value list.

Has the definition of users table changed? I wonder if I'm using wrong schema definition in test cluster...

btw. I'm now growing the size of user_best_times table in the testing to see if problem is triggered by row count somehow. I'm now at 30M rows - no issues so far.

Dmitry Gribov (grib-d) wrote :

> Has the definition of users table changed?
No

Seppo Jaakola (seppo-jaakola) wrote :

okay, this is getting spooky... I cannot get such warnings from my test cluster. I have to create test scenarios for making these warnings surface, they may relate to the actual crash issue.

@Dmitry,

Are you still seeing this issue with latest PXC release?

tags: added: foreign-key
Dmitry Gribov (grib-d) wrote :

I must alter a table to see this. If you believe the issue to be fixed, we can spend a couple of hours on this.

@Dmitry,

If you can provide a test case to verify this - in a vanilla environment (hence starting from table creation)
- we can test this.

tags: added: foreign-keys
removed: foreign-key
Dmitry Gribov (grib-d) wrote :

Unfortunately, the bug shows up only when the table is huge and the load is hi. It will take several work days to reproduce the whole situation. May be there is a simple trigger, but we had the bug under the heavy load on a massive table :(

Dmitry Gribov (grib-d) wrote :

ps. We can install some patched version of the server with full logging or whatever may help trapping the bug.

Pasting the full backtrace here: (since the other bug with trace has been marked duplicate)

130228 19:37:16 [ERROR] WSREP: FSM: no such a transition EXECUTING -> COMMITTED
00:37:16 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1874
max_threads=5000
thread_count=1469
connection_count=1469
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 10951434 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xa859e260
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f6f68fc0e48 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7c83f5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6a1e04]
/lib64/libpthread.so.0[0x370440f500]
/lib64/libc.so.6(gsignal+0x35)[0x37040328a5]
/lib64/libc.so.6(abort+0x175)[0x3704034085]
/usr/lib64/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x2d9)[0x7f8cff6f6529]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM11post_commitEPNS_9TrxHandleE+0xd2)[0x7f8cff715c22]
/usr/lib64/libgalera_smm.so(galera_post_commit+0x5e)[0x7f8cff72e1be]
/usr/sbin/mysqld(_Z25wsrep_cleanup_transactionP3THD+0x97)[0x6599e7]
/usr/sbin/mysqld[0x6a7b06]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x4b9)[0x6a9069]
/usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0x27)[0x644ff7]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x50c)[0x593c6c]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x333)[0x598de3]
/usr/sbin/mysqld[0x5997a0]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1a82)[0x59c072]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x59c499]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x15f)[0x63710f]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x6372d1]
/lib64/libpthread.so.0[0x3704407851]
/lib64/libc.so.6(clone+0x6d)[0x37040e811d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f6ee507d620): is an invalid pointer
Connection ID (thread ID): 1710299
Status: NOT_KILLED

Download full text (145.0 KiB)

I was able to reproduce this on jenkins environment successfully (with debug build). However, since one of the other issues lp:1210509 was masked, further investigation is required.

Full backtrace:

======================================================================================
Started by upstream project "PXC-5.5-mrandgen" build number 20
originally caused by:
 Started by user Raghavendra D Prabhu
[EnvInject] - Loading node environment variables.
Building remotely on jhc-new-2-centos6-64 in workspace /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64
Copied 1 artifact from "build-xtradb-cluster-binaries ? debug,centos6-64" build number 74
[centos6-64] $ /bin/sh -xe /tmp/hudson68892146020080466.sh
++ pwd
+ ROOT_FS=/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64
+ '[' -d percona-qa ']'
+ cd percona-qa
+ bzr pull
Warning: Permanently added 'bazaar.launchpad.net,91.189.95.84' (RSA) to the list of known hosts.

Using saved parent location: bzr+ssh://bazaar.launchpad.net/%2Bbranch/percona-qa/
No revisions to pull.
+ ./pxc-rqg.sh /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64
/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/pxcgen /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64
Warning: Permanently added 'bazaar.launchpad.net,91.189.95.84' (RSA) to the list of known hosts.

Using saved parent location: bzr+ssh://bazaar.launchpad.net/~raghavendra-prabhu/randgen/pxc/
No revisions to pull.
/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64
Workdir: /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/20
Basedir: /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.32-23.7.5.460.Linux.x86_64
/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/pxcgen /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64
Running with galera grammar
# 2013-08-11T01:26:53 [24767] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-08-11T01:26:53 [24767] Please see http://forge.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework.
# 2013-08-11T01:26:53 [24767] Starting
# 2013-08-11T01:26:53 [24767] # runall-new.pl \
# 2013-08-11T01:26:53 [24767] # --basedir=/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.32-23.7.5.460.Linux.x86_64 \
# 2013-08-11T01:26:53 [24767] # --galera=ms \
# 2013-08-11T01:26:53 [24767] # --grammar=conf/galera/galera_stress.yy \
# 2013-08-11T01:26:53 [24767] # --gendata=conf/galera/galera_stress.zz \
# 2013-08-11T01:26:53 [24767] # --threads=16 \
# 2013-08-11T01:26:53 [24767] # --queries=10000 \
# 2013-08-11T01:26:53 [24767] # --mysqld=--wsrep-provider=/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.32-23.7.5.460.Linux.x86_64/lib/libgalera_smm.so \
# 2013-08-11T01:26:53 [24767] # --seed=time
# 2013-08-11T01:26:53 [24767] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-08-11T01:26:53 [24767] /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/pxcgen Revno: 879
# 2013-08-11T01:26:53 [24767] /mnt...

Changed in percona-xtradb-cluster:
status: New → Confirmed
Download full text (3.3 KiB)

To gather more info, tried running it with general_log enabled.

However, in that case, node0 doesn't crash and node1 dies with:

========================================
InnoDB: There was a problem in converting'table100_innodb_key_pk_parts_2_int#P#p0' in charset filename to charset utf8_general_ciInnoDB: There was a problem in converting'table100_innodb_key_pk_parts_2_int#P#p1' in charset filename to charset utf8_general_ci130812 4:36:04 [ERROR] Slave SQL: Error 'Table 'test.table100_innodb_int_autoinc' doesn't exist' on query. Default database: 'test'. Query: 'CREATE TABLE r LIKE `table100_innodb_int_autoinc`', Error_code: 1146
130812 4:36:04 [Warning] WSREP: RBR event 1 Query apply warning: 1, 496
130812 4:36:04 [Warning] WSREP: Ignoring error for TO isolated action: source: 9c997c31-0308-11e3-b4e2-1f1ac96f6f57 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 507, g: 496, s: 493, d: 495, ts: 1376282164810854701)
InnoDB: DEBUG: update_statistics for test/#sql-1a20_2#P#p0.
InnoDB: DEBUG: update_statistics for test/#sql-1a20_2#P#p1.
InnoDB: DEBUG: update_statistics for test/#sql-1a20_2#P#p0.
InnoDB: DEBUG: update_statistics for test/#sql-1a20_2#P#p1.
InnoDB: There was a problem in converting'table0_innodb_key_pk_parts_2_int_autoinc#P#p0' in charset filename to charset utf8_general_ciInnoDB: There was a problem in converting'table0_innodb_key_pk_parts_2_int_autoinc#P#p1' in charset filename to charset utf8_general_ci130812 4:36:04 [ERROR] Slave SQL: Error 'Table 'test.table1_innodb_int_autoinc' doesn't exist' on query. Default database: 'test'. Query: 'ALTER TABLE `table1_innodb_int_autoinc` MODIFY `col_int` BIT FIRST', Error_code: 1146
130812 4:36:04 [Warning] WSREP: RBR event 1 Query apply warning: 1, 498
130812 4:36:04 [Warning] WSREP: Ignoring error for TO isolated action: source: 9c997c31-0308-11e3-b4e2-1f1ac96f6f57 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 11 trx_id: -1 seqnos (l: 509, g: 498, s: 495, d: 497, ts: 1376282164863088550)
130812 4:36:04 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 501)
130812 4:36:04 [ERROR] Slave SQL: Error executing row event: 'Table 'test.r' doesn't exist', Error_code: 1146
130812 4:36:04 [Warning] WSREP: RBR event 2 Write_rows apply warning: 1146, 501
130812 4:36:04 [ERROR] WSREP: Failed to apply trx: source: 9c997c31-0308-11e3-b4e2-1f1ac96f6f57 version: 2 local: 0 state: APPLYING flags: 129 conn_id: 18 trx_id: 2630 seqnos (l: 512, g: 501, s: 496, d: 500, ts: 1376282164870997033)
130812 4:36:04 [ERROR] WSREP: Failed to apply app buffer: seqno: 501, status: WSREP_FATAL
  at galera/src/replicator_smm.cpp:apply_wscoll():52
  at galera/src/replicator_smm.cpp:apply_trx_ws():118
130812 4:36:04 [ERROR] WSREP: Node consistency compromized, aborting...
130812 4:36:04 [Note] WSREP: Closing send monitor...
130812 4:36:04 [Note] WSREP: Closed send monitor.
130812 4:36:04 [Note] WSREP: gcomm: terminating thread
130812 4:36:04 [Note] WSREP: gcomm: joining thread
130812 4:36:04 [Note] WSREP: gcomm: closing backend
130812 4:36:05 [Note] WSREP: view(view_id(NON_PRIM,9c997c31-0308-1...

Read more...

Log of node0

Log of node1

Attached node-logs with wsrep-debug enabled.

MySQL general log

Attached logs - error and general - of node0 and node1.

Seppo Jaakola (seppo-jaakola) wrote :

In the Raghu's test, the problem is due to: CREATE TABLE LIKE t, where t is a temporary table.
Here is a simple test to reliably crash the slave:

mysql> create table pp (i int);
mysql> insert into pp values (1);
mysql> create temporary table pptmp (i int, j int);
mysql> insert into pptmp values (1,2);
mysql> drop table pp;
mysql> create table pp like pptmp;
mysql> insert into pp values (1,2);

tags: added: i34695
Seppo Jaakola (seppo-jaakola) wrote :

This issue could be reproduced with Raghu's randgen grammar (galera_stress.yy). However, this problem incarnation could be different, as the error log has always before the crash a message:

[Warning] WSREP: SQL statement was ineffective: ...

A fix for this specific case was pushed in revision: http://bazaar.launchpad.net/~codership/codership-mysql/5.5-23/revision/3909

Note that this fix does not help if "SQL statement was ineffective.." message was not logged in the error log.

Download full text (28.8 KiB)

Crashing with following:

perl runall-new.pl --basedir=/pxc --galera=ms --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock --grammar=$HOME/galera_stress.yy --gendata=$HOME/galera_stress.zz --threads=16 --queries=50000 --mysqld=--wsrep-provider=/pxc/lib/libgalera_smm.so --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--lock_wait_timeout=50 --mysqld=--wsrep-debug=1 --mysqld=--wsrep_provider_options='debug=1' --mysqld=--wsrep_log_conflicts=ON
# 2013-09-03T20:51:15 [37861] Using Log::Log4perl
# 2013-09-03T20:51:15 [37861] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-09-03T20:51:15 [37861] Please see http://forge.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework.
# 2013-09-03T20:51:15 [37861] Starting
# 2013-09-03T20:51:15 [37861] # runall-new.pl \
# 2013-09-03T20:51:15 [37861] # --basedir=/pxc \
# 2013-09-03T20:51:15 [37861] # --galera=ms \
# 2013-09-03T20:51:15 [37861] # --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock \
# 2013-09-03T20:51:15 [37861] # --grammar=/home/raghavendra/galera_stress.yy \
# 2013-09-03T20:51:15 [37861] # --gendata=/home/raghavendra/galera_stress.zz \
# 2013-09-03T20:51:15 [37861] # --threads=16 \
# 2013-09-03T20:51:15 [37861] # --queries=50000 \
# 2013-09-03T20:51:15 [37861] # --mysqld=--wsrep-provider=/pxc/lib/libgalera_smm.so \
# 2013-09-03T20:51:15 [37861] # --mysqld=--innodb_flush_method=O_DIRECT \
# 2013-09-03T20:51:15 [37861] # --mysqld=--lock_wait_timeout=50 \
# 2013-09-03T20:51:15 [37861] # --mysqld=--wsrep-debug=1 \
# 2013-09-03T20:51:15 [37861] # --mysqld=--wsrep_provider_options=debug=1 \
# 2013-09-03T20:51:15 [37861] # --mysqld=--wsrep_log_conflicts=ON
# 2013-09-03T20:51:15 [37861] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-09-03T20:51:15 [37861] /media/Tintin/Work/code/randgen/pxc Revno: 906
# 2013-09-03T20:51:15 [37861] /media/Tintin/Work/code/randgen/pxc Revision-Id: <email address hidden>
# 2013-09-03T20:51:15 [37861] /pxc does not look like a bzr branch, cannot get revision info.
# 2013-09-03T20:51:16 [37861] Creating MySQL 5.5.33 database at /pxc/mysql-test/var/node0/data
# 2013-09-03T20:51:22 [37861] Creating MySQL 5.5.33 database at /pxc/mysql-test/var/node1/data
# 2013-09-03T20:51:31 [37861] MySQL Version:5.5.33
# 2013-09-03T20:51:31 [37861] Binary: /pxc/bin/mysqld
# 2013-09-03T20:51:31 [37861] Type: Debug
# 2013-09-03T20:51:31 [37861] Datadir: /pxc/mysql-test/var/node0/data
# 2013-09-03T20:51:31 [37861] Tmpdir: /pxc/mysql-test/var/node0/tmp
# 2013-09-03T20:51:31 [37861] Corefile: /pxc/mysql-test/var/node0/data/core.
# 2013-09-03T20:51:31 [37861] Starting MySQL 5.5.33: "/pxc/bin/mysqld" "--no-defaults" "--basedir=/pxc" "--datadir=/pxc/mysql-test/var/node0/data" "--lc-messages-dir=/pxc/share/mysql" "--character-sets-dir=/pxc/share/mysql/charsets" "--tmpdir=/pxc/mysql-test/var/node0/tmp" "--core-file" "--max-allowed-packet=128Mb" "--port=19300" "--socket=/pxc/mysql-test/var/node0/mysql.sock" "--pid-file=/pxc/mysql-test/var/node0/mysql.pid" "--general-log" "--general-log-file=/pxc/m...

Seppo Jaakola (seppo-jaakola) wrote :

Debugging further shows that replaying transactions can also cause this same symptom. The scenario is like:

1. transaction was BF aborted
2. the aborted transaction had to replay
3. next statement in the connection issued implicit commit immediately after the replaying

A fix for this scenario is pushed in revision: http://bazaar.launchpad.net/~codership/codership-mysql/5.5-23/revision/3911

Download full text (29.0 KiB)

Crash with:

perl runall-new.pl --basedir=/pxc --galera=ms --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock --grammar=$HOME/galera_stress.yy --gendata=$HOME/galera_stress.zz --threads=16 --queries=100000 --mysqld=--wsrep-provider=/pxc/lib/libgalera_smm.so --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--wsrep-debug=1 --mysqld=--wsrep_provider_options='debug=1' --mysqld=--wsrep_log_conflicts=ON
# 2013-09-04T23:17:28 [37691] Using Log::Log4perl
# 2013-09-04T23:17:28 [37691] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-09-04T23:17:28 [37691] Please see http://forge.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework.
# 2013-09-04T23:17:28 [37691] Starting
# 2013-09-04T23:17:28 [37691] # runall-new.pl \
# 2013-09-04T23:17:28 [37691] # --basedir=/pxc \
# 2013-09-04T23:17:28 [37691] # --galera=ms \
# 2013-09-04T23:17:28 [37691] # --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock \
# 2013-09-04T23:17:28 [37691] # --grammar=/home/raghavendra/galera_stress.yy \
# 2013-09-04T23:17:28 [37691] # --gendata=/home/raghavendra/galera_stress.zz \
# 2013-09-04T23:17:28 [37691] # --threads=16 \
# 2013-09-04T23:17:28 [37691] # --queries=100000 \
# 2013-09-04T23:17:28 [37691] # --mysqld=--wsrep-provider=/pxc/lib/libgalera_smm.so \
# 2013-09-04T23:17:28 [37691] # --mysqld=--innodb_flush_method=O_DIRECT \
# 2013-09-04T23:17:28 [37691] # --mysqld=--wsrep-debug=1 \
# 2013-09-04T23:17:28 [37691] # --mysqld=--wsrep_provider_options=debug=1 \
# 2013-09-04T23:17:28 [37691] # --mysqld=--wsrep_log_conflicts=ON
# 2013-09-04T23:17:28 [37691] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-09-04T23:17:28 [37691] /media/Tintin/Work/code/randgen/pxc Revno: 907
# 2013-09-04T23:17:28 [37691] /media/Tintin/Work/code/randgen/pxc Revision-Id: <email address hidden>
# 2013-09-04T23:17:28 [37691] /pxc does not look like a bzr branch, cannot get revision info.
# 2013-09-04T23:17:28 [37691] Creating MySQL 5.5.33 database at /pxc/mysql-test/var/node0/data
# 2013-09-04T23:17:35 [37691] Creating MySQL 5.5.33 database at /pxc/mysql-test/var/node1/data
# 2013-09-04T23:17:51 [37691] MySQL Version:5.5.33
# 2013-09-04T23:17:51 [37691] Binary: /pxc/bin/mysqld
# 2013-09-04T23:17:51 [37691] Type: Debug
# 2013-09-04T23:17:51 [37691] Datadir: /pxc/mysql-test/var/node0/data
# 2013-09-04T23:17:51 [37691] Tmpdir: /pxc/mysql-test/var/node0/tmp
# 2013-09-04T23:17:51 [37691] Corefile: /pxc/mysql-test/var/node0/data/core.
# 2013-09-04T23:17:51 [37691] Starting MySQL 5.5.33: "/pxc/bin/mysqld" "--no-defaults" "--basedir=/pxc" "--datadir=/pxc/mysql-test/var/node0/data" "--lc-messages-dir=/pxc/share/mysql" "--character-sets-dir=/pxc/share/mysql/charsets" "--tmpdir=/pxc/mysql-test/var/node0/tmp" "--core-file" "--max-allowed-packet=128Mb" "--port=19300" "--socket=/pxc/mysql-test/var/node0/mysql.sock" "--pid-file=/pxc/mysql-test/var/node0/mysql.pid" "--general-log" "--general-log-file=/pxc/mysql-test/var/node0/mysql.log" --wsrep_sst_method=rsync --innodb_autoinc_lock_mode=2 --default-storage-engi...

Aside the crash reported above, when using lock_wait_timeout of 50s (it was not used in crash of #33):

a)

Compressed - node compromise

$tables = {
        partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
        engines => [ 'InnoDB' ],
        pk => [ 'int', 'int auto_increment' ],
        rows => [ 1, 100, 1000 ],
        row_formats => ['COMPRESSED','DEFAULT']
};

$fields = {
        types => [ 'int', 'char(12)'],
        indexes => [undef, 'key' ]
};

$data = {
        numbers => [ 'digit', 'null', undef ],
        strings => [ 'letter', 'english', 'char(12)' ]
}

b)
Combined - node compromise

$tables = {
        partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
        engines => [ 'InnoDB' ],
        pk => [ 'int', 'int auto_increment' ],
        rows => [ 1, 100, 1000 ],
        row_formats => ['COMPRESSED','DEFAULT']
};

$fields = {
        types => [ 'int', 'char(12)', 'varchar', 'date', 'bigint', 'enum' ],
        indexes => [undef, 'key' ],
        charsets => ['utf8','latin1']
};

$data = {
        numbers => [ 'digit', 'null', undef ],
 temporals => ['date', 'year', 'null', undef ],
        enum => [ 'letter' ],
        strings => [ 'letter', 'english', 'char(12)', 'varchar(246)', 'varchar(100)' ]
}

=========================================================================

By node compromise I mean when two nodes are used (as with --galera=ms), node2 shuts down during the run with following:

130902 17:25:00 [ERROR] Slave SQL: Could not execute Write_rows event on table sbtest.b; Cannot add or update a child row: a foreign key constraint fails (`sbtest`.`b`, CONSTRAINT `aid_fk` FOREIGN KEY (`aid`) REFERENCES `a` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 94, Error_code: 1452
130902 17:25:00 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 914
130902 17:25:00 [ERROR] WSREP: Failed to apply trx: source: 8ce7edb7-13c5-11e3-b1ae-36191a17aa6e version: 2 local: 0 state: APPLYING flags: 1 conn_id: 469 trx_id: 2191 seqnos (l: 917, g: 914, s: 913, d: 683, ts: 1378122900523716090)
130902 17:25:00 [ERROR] WSREP: Failed to apply app buffer: seqno: 914, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130902 17:25:00 [ERROR] WSREP: Node consistency compromized, aborting...

Alex Yurchenko (ayurchen) wrote :

Raghu, this looks awfully like https://bugs.launchpad.net/codership-mysql/+bug/1204361
Have you merged in the fix?

@Alex,

If you mean the crash of #33 (expanded in https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1123233/comments/33/+download) then I don't see resemblance to lp:1204361 since there is no " [ERROR] WSREP: galera/src/fsm.hpp:shift_to():77: FSM: no such a transition EXECUTING -> COMMITTED" in that crash. The stacktrace also doesn't match.

Also, was https://bazaar.launchpad.net/~codership/codership-mysql/5.5-23/revision/3896 the only fix done for lp:1204361? (the other one seems to be a mysql merge). Also, I do merge 5.5-23 tree, so it should be there.

Moritz Heiber (moritz-heiber) wrote :

We're affected by this bug as well. The backtrace is at

http://paste.ubuntu.com/6082595/

Let me know if you need anything else.

Changed in percona-xtradb-cluster:
status: Confirmed → In Progress
Download full text (24.8 KiB)

Crash with:

 perl runall-new.pl --basedir=/pxc --galera=ms --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock --grammar=$HOME/galera_stress.yy --gendata=$HOME/galera_stress.zz --threads=16 --queries=50000 --mysqld=--wsrep-provider=/pxc/lib/libgalera_smm.so --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--wsrep-debug=1 --mysqld=--wsrep_provider_options='debug=1' --mysqld=--wsrep_log_conflicts=ON# 2013-09-09T16:14:48 [13025] Using Log::Log4perl
# 2013-09-09T16:14:48 [13025] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-09-09T16:14:48 [13025] Please see http://forge.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework.
# 2013-09-09T16:14:48 [13025] Starting
# 2013-09-09T16:14:48 [13025] # runall-new.pl \
# 2013-09-09T16:14:48 [13025] # --basedir=/pxc \
# 2013-09-09T16:14:48 [13025] # --galera=ms \
# 2013-09-09T16:14:48 [13025] # --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock \
# 2013-09-09T16:14:48 [13025] # --grammar=/home/raghavendra/galera_stress.yy \
# 2013-09-09T16:14:48 [13025] # --gendata=/home/raghavendra/galera_stress.zz \
# 2013-09-09T16:14:48 [13025] # --threads=16 \
# 2013-09-09T16:14:48 [13025] # --queries=50000 \
# 2013-09-09T16:14:48 [13025] # --mysqld=--wsrep-provider=/pxc/lib/libgalera_smm.so \
# 2013-09-09T16:14:48 [13025] # --mysqld=--innodb_flush_method=O_DIRECT \
# 2013-09-09T16:14:48 [13025] # --mysqld=--wsrep-debug=1 \
# 2013-09-09T16:14:48 [13025] # --mysqld=--wsrep_provider_options=debug=1 \
# 2013-09-09T16:14:48 [13025] # --mysqld=--wsrep_log_conflicts=ON
# 2013-09-09T16:14:48 [13025] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-09-09T16:14:48 [13025] /media/Tintin/Work/code/randgen/pxc Revno: 910
# 2013-09-09T16:14:48 [13025] /media/Tintin/Work/code/randgen/pxc Revision-Id: <email address hidden>
# 2013-09-09T16:14:48 [13025] /pxc does not look like a bzr branch, cannot get revision info.
# 2013-09-09T16:14:49 [13025] Creating MySQL 5.5.33 database at /pxc/mysql-test/var/node0/data
# 2013-09-09T16:14:55 [13025] Creating MySQL 5.5.33 database at /pxc/mysql-test/var/node1/data
# 2013-09-09T16:15:13 [13025] MySQL Version:5.5.33
# 2013-09-09T16:15:13 [13025] Binary: /pxc/bin/mysqld
# 2013-09-09T16:15:13 [13025] Type: Debug
# 2013-09-09T16:15:13 [13025] Datadir: /pxc/mysql-test/var/node0/data
# 2013-09-09T16:15:13 [13025] Tmpdir: /pxc/mysql-test/var/node0/tmp
# 2013-09-09T16:15:13 [13025] Corefile: /pxc/mysql-test/var/node0/data/core.
# 2013-09-09T16:15:13 [13025] Starting MySQL 5.5.33: "/pxc/bin/mysqld" "--no-defaults" "--basedir=/pxc" "--datadir=/pxc/mysql-test/var/node0/data" "--lc-messages-dir=/pxc/share/mysql" "--character-sets-dir=/pxc/share/mysql/charsets" "--tmpdir=/pxc/mysql-test/var/node0/tmp" "--core-file" "--max-allowed-packet=128Mb" "--port=19300" "--socket=/pxc/mysql-test/var/node0/mysql.sock" "--pid-file=/pxc/mysql-test/var/node0/mysql.pid" "--general-log" "--general-log-file=/pxc/mysql-test/var/node0/mysql.log" --wsrep_sst_method=rsync --innodb_autoinc_lock_mode=2 --default-storage-engine=...

For easier reference:

The above is with (without timeout)

$tables = {
        partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
        engines => [ 'InnoDB' ],
        pk => [ 'int', 'int auto_increment' ],
        rows => [ 1, 100, 1000 ],
        row_formats => [''DEFAULT']
};

$fields = {
        types => [ 'int', 'char(12)', 'varchar', 'date', 'bigint', 'enum' ],
        indexes => [undef, 'key' ],
        charsets => ['utf8','latin1']
};

$data = {
        numbers => [ 'digit', 'null', undef ],
 temporals => ['date', 'year', 'null', undef ],
        enum => [ 'letter' ],
        strings => [ 'letter', 'english', 'char(12)', 'varchar(246)', 'varchar(100)' ]
}

No node compromise or hangs after last fix.

There was a crash which has been mentioned above.

This is how it looks now:

Timeout:

Compressed - OK
Others - OK
Combined - OK

without:

combined - OK
Others - crash
Compressed: OK

Download full text (23.8 KiB)

Running with

$tables = {
        partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
        engines => [ 'InnoDB' ],
        pk => [ 'int', 'int auto_increment' ],
        rows => [ 1, 100, 1000 ]
};

$fields = {
        types => [ 'int', 'char(12)'],
        indexes => [undef, 'key' ]
};

$data = {
        numbers => [ 'digit', 'null', undef ],
        strings => [ 'letter', 'english', 'char(12)' ]
}

alone also caused the crash

11:47:33 Running with galera grammar
11:47:36 # 2013-09-09T11:36:39 [22019] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
11:47:36 # 2013-09-09T11:36:39 [22019] Please see http://forge.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework.
11:47:36 # 2013-09-09T11:36:39 [22019] Starting
11:47:36 # 2013-09-09T11:36:39 [22019] # runall-new.pl \
11:47:36 # 2013-09-09T11:36:39 [22019] # --basedir=/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.33-23.7.6.483.Linux.x86_64 \
11:47:36 # 2013-09-09T11:36:39 [22019] # --galera=ms \
11:47:36 # 2013-09-09T11:36:39 [22019] # --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock,QueryTimeout \
11:47:36 # 2013-09-09T11:36:39 [22019] # --grammar=conf/galera/galera_stress.yy \
11:47:36 # 2013-09-09T11:36:39 [22019] # --gendata=conf/galera/galera_stress.zz \
11:47:36 # 2013-09-09T11:36:39 [22019] # --threads=16 \
11:47:36 # 2013-09-09T11:36:39 [22019] # --queries=50000 \
11:47:36 # 2013-09-09T11:36:39 [22019] # --mysqld=--wsrep-provider=/mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.33-23.7.6.483.Linux.x86_64/lib/libgalera_smm.so \
11:47:36 # 2013-09-09T11:36:39 [22019] # --seed=time \
11:47:36 # 2013-09-09T11:36:39 [22019] # --mysqld=--innodb_flush_method=O_DIRECT
11:47:36 # 2013-09-09T11:36:39 [22019] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
11:47:37 # 2013-09-09T11:36:39 [22019] /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/pxcgen Revno: 911
11:47:37 # 2013-09-09T11:36:39 [22019] /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/pxcgen Revision-Id: <email address hidden>
11:47:38 # 2013-09-09T11:36:40 [22019] /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.33-23.7.6.483.Linux.x86_64 does not look like a bzr branch, cannot get revision info.
11:47:38 # 2013-09-09T11:36:40 [22019] Creating MySQL 5.5.33 database at /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.33-23.7.6.483.Linux.x86_64/mysql-test/var/node0/data
11:47:44 # 2013-09-09T11:36:46 [22019] Creating MySQL 5.5.33 database at /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.33-23.7.6.483.Linux.x86_64/mysql-test/var/node1/data
11:47:53 # 2013-09-09T11:36:55 [22019] MySQL Version:5.5.33
11:47:53 # 2013-09-09T11:36:55 [22019] Binary: /mnt/workspace/PXC-5.5-mrandgen/BTYPE/debug/Host/centos6-64/Percona-XtraDB-Cluster-5.5.33-23.7.6.483.Linux.x86_64/bin/mysqld
11:47:53 # 2013-09-09T11:36:55 [22019] Type: Debug
11:47:...

Another thing to note, these crashes have only been in debug builds (== UNIV_DEBUG) so far, the release builds are good (as far as jenkins is concerned).

Release build crashed as well.

perl runall-new.pl --basedir=${MYSQL_BASEDIR} --galera=ms --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock --grammar=conf/galera/galera_stress.yy --gendata=conf/galera/galera_stress.zz --threads=16 --queries=50000 --mysqld=--wsrep-provider=$MYSQL_BASEDIR/lib/libgalera_smm.so --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--lock_wait_timeout=50 --seed=time

http://jenkins.percona.com/job/PXC-5.5-mrandgen/78/BTYPE=release,Host=centos6-64/consoleText

where galera_stress.zz

is

 $tables = {
         partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
         engines => [ 'InnoDB' ],
         pk => [ 'int', 'int auto_increment' ],
         rows => [ 1, 100, 1000 ]
 };

 $fields = {
         types => [ 'int', 'char(12)'],
         indexes => [undef, 'key' ]
 };

 $data = {
         numbers => [ 'digit', 'null', undef ],
         strings => [ 'letter', 'english', 'char(12)' ]
 }

Also crashes with normal grammar + lock_wait_timeout and with complex grammar.

normal grammar being:

 $tables = {
         partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
         engines => [ 'InnoDB' ],
         pk => [ 'int', 'int auto_increment' ],
         rows => [ 1, 100, 1000 ]
 };

 $fields = {
         types => [ 'int', 'char(12)'],
         indexes => [undef, 'key' ]
 };

 $data = {
         numbers => [ 'digit', 'null', undef ],
         strings => [ 'letter', 'english', 'char(12)' ]
 }

Complex one:

 $tables = {
         partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
         engines => [ 'InnoDB' ],
         pk => [ 'int', 'int auto_increment' ],
         rows => [ 1, 100, 1000 ],
         row_formats => ['COMPRESSED','DEFAULT']
 };

 $fields = {
         types => [ 'int', 'char(12)', 'varchar', 'date', 'bigint', 'enum' ],
         indexes => [undef, 'key' ],
         charsets => ['utf8','latin1']
 };

 $data = {
         numbers => [ 'digit', 'null', undef ],
         temporals => ['date', 'year', 'null', undef ],
         enum => [ 'letter' ],
         strings => [ 'letter', 'english', 'char(12)', 'varchar(246)', 'varchar(100)' ]
 }

Changed in percona-xtradb-cluster:
milestone: 5.5.33-23.7.6 → future-5.5

I see this issue with 5.6 tree as well, so the fix will need to ported there.

Getting node consistency issues (in two node case, on second node) with 5.6 with grammar same as in 5.5.

Not seen this issue for a while now - in both 5.5 and 5.6.

Dmitry Gribov (grib-d) wrote :

Issue is still not reproduced.
For me this looks like it should be considered "closed".

Alex Yurchenko (ayurchen) wrote :

Must have been fixed in one of the recent releases.

Thanks, Dmitry!

Changed in codership-mysql:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers