Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Bug #1189430
Comment #22

Comment 22 for bug 1189430

Revision history for this message

Teemu Ollakka (teemu-ollakka) wrote on 2013-10-27:

#22

Reproduced this with ROLLBACK heavy pure DML load against single node.

From multimaster pure DML run, helgrind errors indicate that there might be possibility or race:

==7868== Possible data race during read of size 1 at 0x1483C0A8 by thread #32
==7868== Locks held: none
==7868== at 0x85132F: rec_get_deleted_flag (mach0data.ic:54)
==7868== by 0x85AF05: row_search_for_mysql (row0sel.c:4360)
==7868== by 0x81ED5C: ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int) (ha_innodb.cc:6812)
==7868== by 0x81EEAC: ha_innobase::rnd_next(unsigned char*) (ha_innodb.cc:7006)
==7868== by 0x7A2286: rr_sequential(READ_RECORD*) (records.cc:461)
==7868== by 0x5D0E0E: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:11801)
==7868== by 0x5D9BEC: do_select(JOIN*, List<Item>*, TABLE*, Procedure*) (sql_select.cc:11562)
==7868== by 0x5EDB08: JOIN::exec() (sql_select.cc:2385)
==7868== by 0x5E7FE6: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:2604)
==7868== by 0x5EE219: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:297)
==7868== by 0x59AF7C: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:5040)
==7868== by 0x5A36DE: mysql_execute_command(THD*) (sql_parse.cc:2475)
==7868== by 0x5AB83A: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6266)
==7868== by 0x5ABA90: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6099)
==7868== by 0x5AD0F8: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1214)
==7868== by 0x5AF918: do_command(THD*) (sql_parse.cc:869)
==7868== by 0x658320: do_handle_one_connection(THD*) (sql_connect.cc:887)
==7868== by 0x6584DC: handle_one_connection (sql_connect.cc:799)
==7868== by 0x4C2DC3D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7868== by 0x5899E99: start_thread (pthread_create.c:308)
==7868== by 0x67D7CCC: clone (clone.S:112)
==7868==
==7868== This conflicts with a previous write of size 1 by thread #29
==7868== Locks held: none
==7868== at 0x86160B: row_upd_rec_in_place (mach0data.ic:41)
==7868== by 0x8BF796: btr_cur_update_in_place (btr0cur.c:1919)
==7868== by 0x8BFDBA: btr_cur_optimistic_update (btr0cur.c:2023)
==7868== by 0x9C0D38: row_undo_mod_clust_low (row0umod.c:103)
==7868== by 0x9C23C9: row_undo_mod_clust (row0umod.c:247)
==7868== by 0x9C3694: row_undo_mod (row0umod.c:891)
==7868== by 0x9B8B6A: row_undo_step (row0undo.c:306)
==7868== by 0x9A6905: que_run_threads (que0que.c:1245)

Thread #29 stack trace was not complete, but there is a high possibility that it was still doing the same rollback as reported in previous error:

==7868== Possible data race during read of size 1 at 0x14118C98 by thread #29
==7868== Locks held: none
==7868== at 0x8DD996: buf_page_get_gen (buf0buf.ic:1000)
==7868== by 0x89698C: trx_sys_update_mysql_binlog_offset (trx0sys.ic:119)
==7868== by 0x89A7B2: trx_write_serialisation_history (trx0trx.c:893)
==7868== by 0x89C056: trx_commit_off_kernel (trx0trx.c:945)
==7868== by 0x891944: trx_finish_rollback_off_kernel (trx0roll.c:1286)
==7868== by 0x9A58AE: que_thr_dec_refer_count (que0que.c:869)
==7868== by 0x9A6A1B: que_run_threads (que0que.c:1317)
==7868== by 0x892D46: trx_general_rollback_for_mysql (trx0roll.c:100)
==7868== by 0x89332A: trx_rollback_for_mysql (trx0roll.c:149)
==7868== by 0x8180B9: innobase_rollback(handlerton*, THD*, bool) (ha_innodb.cc:3009)
==7868== by 0x6CDC15: ha_rollback_trans(THD*, bool) (handler.cc:1518)
==7868== by 0x6CE337: ha_commit_trans(THD*, bool) (handler.cc:1353)
==7868== by 0x66663B: trans_commit(THD*) (transaction.cc:199)
==7868== by 0x5A7593: mysql_execute_command(THD*) (sql_parse.cc:4143)
==7868== by 0x5AB83A: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6266)
==7868== by 0x5ABA90: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6099)
==7868== by 0x5AD0F8: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1214)
==7868== by 0x5AF918: do_command(THD*) (sql_parse.cc:869)
==7868== by 0x658320: do_handle_one_connection(THD*) (sql_connect.cc:887)
==7868== by 0x6584DC: handle_one_connection (sql_connect.cc:799)
==7868== by 0x4C2DC3D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7868== by 0x5899E99: start_thread (pthread_create.c:308)
==7868== by 0x67D7CCC: clone (clone.S:112)

As both threads #29 and #32 were running in local mode, it is quite likely that locally executed rollback is the culprit here.

Reproduced this with ROLLBACK heavy pure DML load against single node.

From multimaster pure DML run, helgrind errors indicate that there might be possibility or race:

==7868== Possible data race during read of size 1 at 0x1483C0A8 by thread #32
==7868== Locks held: none
==7868==    at 0x85132F: rec_get_deleted_flag (mach0data.ic:54)
==7868==    by 0x85AF05: row_search_for_mysql (row0sel.c:4360)
==7868==    by 0x81ED5C: ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int) (ha_innodb.cc:6812)
==7868==    by 0x81EEAC: ha_innobase::rnd_next(unsigned char*) (ha_innodb.cc:7006)
==7868==    by 0x7A2286: rr_sequential(READ_RECORD*) (records.cc:461)
==7868==    by 0x5D0E0E: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:11801)
==7868==    by 0x5D9BEC: do_select(JOIN*, List<Item>*, TABLE*, Procedure*) (sql_select.cc:11562)
==7868==    by 0x5EDB08: JOIN::exec() (sql_select.cc:2385)
==7868==    by 0x5E7FE6: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:2604)
==7868==    by 0x5EE219: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:297)
==7868==    by 0x59AF7C: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:5040)
==7868==    by 0x5A36DE: mysql_execute_command(THD*) (sql_parse.cc:2475)
==7868==    by 0x5AB83A: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6266)
==7868==    by 0x5ABA90: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6099)
==7868==    by 0x5AD0F8: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1214)
==7868==    by 0x5AF918: do_command(THD*) (sql_parse.cc:869)
==7868==    by 0x658320: do_handle_one_connection(THD*) (sql_connect.cc:887)
==7868==    by 0x6584DC: handle_one_connection (sql_connect.cc:799)
==7868==    by 0x4C2DC3D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7868==    by 0x5899E99: start_thread (pthread_create.c:308)
==7868==    by 0x67D7CCC: clone (clone.S:112)
==7868== 
==7868== This conflicts with a previous write of size 1 by thread #29
==7868== Locks held: none
==7868==    at 0x86160B: row_upd_rec_in_place (mach0data.ic:41)
==7868==    by 0x8BF796: btr_cur_update_in_place (btr0cur.c:1919)
==7868==    by 0x8BFDBA: btr_cur_optimistic_update (btr0cur.c:2023)
==7868==    by 0x9C0D38: row_undo_mod_clust_low (row0umod.c:103)
==7868==    by 0x9C23C9: row_undo_mod_clust (row0umod.c:247)
==7868==    by 0x9C3694: row_undo_mod (row0umod.c:891)
==7868==    by 0x9B8B6A: row_undo_step (row0undo.c:306)
==7868==    by 0x9A6905: que_run_threads (que0que.c:1245)

Thread #29 stack trace was not complete, but there is a high possibility that it was still doing the same rollback as reported in previous error:

==7868== Possible data race during read of size 1 at 0x14118C98 by thread #29
==7868== Locks held: none
==7868==    at 0x8DD996: buf_page_get_gen (buf0buf.ic:1000)
==7868==    by 0x89698C: trx_sys_update_mysql_binlog_offset (trx0sys.ic:119)
==7868==    by 0x89A7B2: trx_write_serialisation_history (trx0trx.c:893)
==7868==    by 0x89C056: trx_commit_off_kernel (trx0trx.c:945)
==7868==    by 0x891944: trx_finish_rollback_off_kernel (trx0roll.c:1286)
==7868==    by 0x9A58AE: que_thr_dec_refer_count (que0que.c:869)
==7868==    by 0x9A6A1B: que_run_threads (que0que.c:1317)
==7868==    by 0x892D46: trx_general_rollback_for_mysql (trx0roll.c:100)
==7868==    by 0x89332A: trx_rollback_for_mysql (trx0roll.c:149)
==7868==    by 0x8180B9: innobase_rollback(handlerton*, THD*, bool) (ha_innodb.cc:3009)
==7868==    by 0x6CDC15: ha_rollback_trans(THD*, bool) (handler.cc:1518)
==7868==    by 0x6CE337: ha_commit_trans(THD*, bool) (handler.cc:1353)
==7868==    by 0x66663B: trans_commit(THD*) (transaction.cc:199)
==7868==    by 0x5A7593: mysql_execute_command(THD*) (sql_parse.cc:4143)
==7868==    by 0x5AB83A: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6266)
==7868==    by 0x5ABA90: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6099)
==7868==    by 0x5AD0F8: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1214)
==7868==    by 0x5AF918: do_command(THD*) (sql_parse.cc:869)
==7868==    by 0x658320: do_handle_one_connection(THD*) (sql_connect.cc:887)
==7868==    by 0x6584DC: handle_one_connection (sql_connect.cc:799)
==7868==    by 0x4C2DC3D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7868==    by 0x5899E99: start_thread (pthread_create.c:308)
==7868==    by 0x67D7CCC: clone (clone.S:112)

As both threads #29 and #32 were running in local mode, it is quite likely that locally executed rollback is the culprit here.