Percona XtraDB Cluster - HA scalable solution for MySQL

void Diagnostics_area::set_ok_status(ulonglong, ulonglong, const char*): Assertion `! is_set()' failed.

Reported by Raghavendra D Prabhu on 2013-10-02
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Medium
Teemu Ollakka
5.5
Undecided
Unassigned
Percona XtraDB Cluster
Status tracked in Trunk
5.6
Undecided
Unassigned
Trunk
Undecided
Unassigned

Bug Description

===========================================================

mysqld: /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_error.cc:430: void Diagnostics_area::set_ok_status(ulonglong, ulonglong, const char*): Assertion `! is_set()' failed.
21:37:02 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=20
max_threads=153
thread_count=19
connection_count=19
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69207 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x1d9cd90
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f06b86f3ed8 thread_stack 0x40000
/pxc56/bin/mysqld(my_print_stacktrace+0x2c)[0x942498]
/pxc56/bin/mysqld(handle_fatal_signal+0x304)[0x6d8248]
/usr/lib/libpthread.so.0(+0xf870)[0x7f06db262870]
/usr/lib/libc.so.6(gsignal+0x39)[0x7f06d95f03d9]
/usr/lib/libc.so.6(abort+0x148)[0x7f06d95f17d8]
/usr/lib/libc.so.6(+0x2e506)[0x7f06d95e9506]
/usr/lib/libc.so.6(+0x2e5b2)[0x7f06d95e95b2]
/pxc56/bin/mysqld(_ZN16Diagnostics_area13set_ok_statusEyyPKc+0x62)[0x72fbcc]
/pxc56/bin/mysqld(_Z5my_okP3THDyyPKc+0x35)[0x63b188]
/pxc56/bin/mysqld(_Z24wsrep_replay_transactionP3THD+0x453)[0x63aa30]
/pxc56/bin/mysqld[0x75af8b]
/pxc56/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xbab)[0x75be6d]
/pxc56/bin/mysqld(_Z10do_commandP3THD+0x50c)[0x75d3c3]
/pxc56/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x129)[0x729a7d]
/pxc56/bin/mysqld(handle_one_connection+0x20)[0x729ba5]
/usr/lib/libpthread.so.0(+0x80a2)[0x7f06db25b0a2]
/usr/lib/libc.so.6(clone+0x6d)[0x7f06d96a043d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 15
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
Writing a core file
# 2013-10-03T03:07:25 [26414] Test completed with failure status STATUS_SERVER_CRASHED (101)
# 2013-10-03T03:07:25 [26414] GenTest exited with exit status STATUS_SERVER_CRASHED (101)
# 2013-10-03T03:07:25 [26414] Stopping server on port 19300
# 2013-10-03T03:07:25 [26414] Stale connection. Reconnecting
# 2013-10-03T03:07:25 [26414] Reconnect failed due to 2003:Can't connect to MySQL server on '127.0.0.1' (111)
# 2013-10-03T03:07:25 [26414] Server would not shut down properly. Terminate it
# 2013-10-03T03:07:25 [26414] Terminated process 26829
# 2013-10-03T03:07:25 [26414] Stopping server on port 19301
# 2013-10-03T03:07:25 [26414] Stale connection. Reconnecting
# 2013-10-03T03:07:25 [26414] Reconnect failed due to 2003:Can't connect to MySQL server on '127.0.0.1' (111)
# 2013-10-03T03:07:25 [26414] Server would not shut down properly. Terminate it
# 2013-10-03T03:07:25 [26414] Terminated process 26867
# 2013-10-03T03:07:25 [26414] [26414] runall-new.pl will exit with exit status STATUS_SERVER_CRASHED (101)
perl runall-new.pl --basedir=/pxc56 --galera=ms --threads=16 --seed=tim 9.45s user 0.82s system 20% cpu 50.862 total

=====================================================================

The crash in https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1234395 is masking this one since I am not getting many crashes with this since most of the times it is crashing with former.

Actually when node1 crashed with lp:1234395, node0 crashed with this error.

(gdb) bt
#0 0x00007f86f86780b1 in pthread_kill () from /usr/lib/libpthread.so.0
#1 0x0000000000941cbf in my_write_core (sig=6) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/mysys/stacktrace.c:422
#2 0x00000000006d83ec in handle_fatal_signal (sig=6) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/signal_handler.cc:254
#3 <signal handler called>
#4 0x00007f86f6a083d9 in raise () from /usr/lib/libc.so.6
#5 0x00007f86f6a097d8 in abort () from /usr/lib/libc.so.6
#6 0x00007f86f6a01506 in __assert_fail_base () from /usr/lib/libc.so.6
#7 0x00007f86f6a015b2 in __assert_fail () from /usr/lib/libc.so.6
#8 0x000000000072fbcc in Diagnostics_area::set_ok_status (this=0x2825078, affected_rows=affected_rows@entry=0, last_insert_id=last_insert_id@entry=0, message=message@entry=0x0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_error.cc:430
#9 0x000000000063b188 in my_ok (thd=thd@entry=0x2821800, affected_rows=affected_rows@entry=0, id=id@entry=0, message=message@entry=0x0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_class.h:4348
#10 0x000000000063aa30 in wsrep_replay_transaction (thd=thd@entry=0x2821800) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/wsrep_thd.cc:222
#11 0x000000000075af8b in wsrep_mysql_parse (thd=thd@entry=0x2821800, rawbuf=0x7f8664004c50 '\217' <repeats 200 times>..., length=length@entry=22, parser_state=parser_state@entry=0x7f86e4231820)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:6814
#12 0x000000000075be6d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x2821800, packet=packet@entry=0x2814a61 "DROP TABLE IF EXISTS b ", packet_length=packet_length@entry=23)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:1628
#13 0x000000000075d3c3 in do_command (thd=0x2821800) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:1133
#14 0x0000000000729a7d in do_handle_one_connection (thd_arg=thd_arg@entry=0x2821800) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_connect.cc:1640
#15 0x0000000000729ba5 in handle_one_connection (arg=0x2821800) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_connect.cc:1544
#16 0x00007f86f86730a2 in start_thread () from /usr/lib/libpthread.so.0
#17 0x00007f86f6ab843d in clone () from /usr/lib/libc.so.6
(

Attached gdb.txt

Tested with:

perl runall-new.pl --basedir=/pxc56 --galera=ms --mysqld=--skip-performance-schema --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock --grammar=conf/galera/galera_stress-56.yy --gendata=conf/galera/galera_stress-56.zz --threads=16 --queries=50000 --mysqld=--wsrep-provider=/pxc56/lib/libgalera_smm.so --seed=time --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--lock_wait_timeout=50

Grammar files attached.

The " --mysqld=--lock_wait_timeout=50" is optional, so test with either for the crash.

Few more things:

  main_da = {
    m_is_sent = false,
    m_can_overwrite_status = false,
    m_message = "\000uery execution was interrupted\000_parts_2_int' doesn't exist\000t exist\000st\000", '\217' <repeats 441 times>,
    m_sql_errno = 1317,
    m_sqlstate = "70100",
    m_affected_rows = 0,
    m_last_insert_id = 0,
    m_statement_warn_count = 0,
    m_status = Diagnostics_area::DA_OK,
    m_main_wi = {
      m_warn_root = {
        free = 0x7f04d4051d70,
        used = 0x0,
        pre_alloc = 0x0,
        min_malloc = 32,
        block_size = 2016,
        block_num = 5,
        first_block_usage = 0,
        error_handler = 0x88215c <sql_alloc_error_handler()>
      },
      m_warn_list = {
        <I_P_List_counter> = {
          m_counter = 1
        },
        <I_P_List_fast_push_back<Sql_condition>> = {
          m_last = 0x7f04d4051ef0
        },
        members of I_P_List<Sql_condition, I_P_List_adapter<Sql_condition, &Sql_condition::next_in_wi, &Sql_condition::prev_in_wi>, I_P_List_counter, I_P_List_fast_push_back<Sql_condition> >:
        m_first = 0x7f04d4051d80
      },
      m_warn_count = {0, 0, 1},
      m_current_statement_warn_count = 0,
      m_current_row_for_warning = 1,
      m_warn_id = 130821,
      m_error_condition = 0x0,
      m_allow_unlimited_warnings = false,
      m_read_only = false,
      m_next_in_da = 0x0,
      m_prev_in_da = 0x1d83ab8,
      m_marked_sql_conditions = {
        <base_list> = {
          <Sql_alloc> = {<No data fields>},
          members of base_list:
          first = 0x149d720 <end_of_list>,
          last = 0x1d83a90,
          elements = 0
        }, <No data fields>}
    },
    m_wi_stack = {
      <I_P_List_counter> = {
        m_counter = 1
      },
      <I_P_List_fast_push_back<Warning_info>> = {
        m_last = 0x1d83a80
      },
      members of I_P_List<Warning_info, I_P_List_adapter<Warning_info, &Warning_info::m_next_in_da, &Warning_info::m_prev_in_da>, I_P_List_counter, I_P_List_fast_push_back<Warning_info> >:
      m_first = 0x1d83a00
    }
  },

Another thing, error checking is missing for wsrep->post_commit in wsrep_replay_transaction unlike wsrep_cleanup_transaction.

print *thd

Also, note that since this usually happens after other node crashes, you will need to run node0 with pc.ignore_sb=true (otherwise it won't proceed), the pxc randgen has this set by default.

Download full text (8.5 KiB)

Ran with --mysqld=--wsrep-debug=1 --mysqld=--wsrep_provider_options='debug=1'

From following:

2013-10-09 00:54:58 43148 [Note] WSREP: TO BEGIN: 638, 2
2013-10-09 00:54:58 43148 [Note] WSREP: MDL conflict
request: (15 seqno 638 wsrep (2, 1, 0) cmd 3 9 DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc`)
granted: (12 seqno 639 wsrep (0, 2, 0) cmd 3 54 COMMIT)
2013-10-09 00:54:58 43148 [Note] WSREP: MDL ticket: type: shared read space: TABLE db: test name: table100_innodb_key_pk_parts_2_int_autoinc
2013-10-09 00:54:58 43148 [Note] WSREP: DROP caused BF abort
2013-10-09 00:54:58 43148 [Note] WSREP: MDL ticket: type: shared read space: TABLE db: test name: table100_innodb_key_pk_parts_2_int_autoinc
2013-10-09 00:54:58 43148 [Note] WSREP: wsrep_abort_thd, by: 139940104804096, victim: 139940105602816
2013-10-09 00:54:58 43148 [Note] WSREP: abort transaction: BF: DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc` victim: COMMIT
2013-10-09 00:54:58 43148 [Note] WSREP: cluster conflict due to high priority abort for threads:
2013-10-09 00:54:58 43148 [Note] WSREP: Winning thread:
   THD: 15, mode: total order, state: executing, conflict: no conflict, seqno: 638
   SQL: DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc`
2013-10-09 00:54:58 43148 [Note] WSREP: Victim thread:
   THD: 12, mode: local, state: committing, conflict: no conflict, seqno: 639
   SQL: COMMIT
2013-10-09 00:54:58 43148 [Note] WSREP: BF kill (1, seqno: 638), victim: (12) trx: 7691
2013-10-09 00:54:58 43148 [Note] WSREP: Aborting query: COMMIT
2013-10-09 00:54:58 43148 [Note] WSREP: kill trx QUERY_COMMITTING for 7691
2013-10-09 00:54:58 43148 [Note] [Debug] WSREP: galera/src/replicator_smm.cpp:abort_trx():672: aborting trx source: 34df666e-304f-11e3-b505-e799705b723b version: 3 local: 1 state: COMMITTING flags: 133 conn_id: 12 trx_id: 7691 seqnos (l: 661, g: 639, s: 636, d: 638, ts: 101562216784543) 0x7f46040b9f00
2013-10-09 00:54:58 43148 [Note] WSREP: MDL conflict
request: (15 seqno 638 wsrep (2, 1, 0) cmd 3 9 DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc`)
granted: (12 seqno 639 wsrep (0, 2, 1) cmd 3 54 COMMIT)
2013-10-09 00:54:58 43148 [Note] WSREP: MDL ticket: type: shared read space: TABLE db: test name: table100_innodb_key_pk_parts_2_int_autoinc
2013-10-09 00:54:58 43148 [Note] WSREP: DROP caused BF abort
2013-10-09 00:54:58 43148 [Note] WSREP: MDL ticket: type: shared read space: TABLE db: test name: table100_innodb_key_pk_parts_2_int_autoinc
2013-10-09 00:54:58 43148 [Note] WSREP: wsrep_abort_thd, by: 139940104804096, victim: 139940105602816
2013-10-09 00:54:59 43148 [Note] WSREP: abort transaction: BF: DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc` victim: COMMIT
2013-10-09 00:54:59 43148 [Note] WSREP: cluster conflict due to high priority abort for threads:
2013-10-09 00:54:59 43148 [Note] WSREP: Winning thread:
   THD: 15, mode: total order, state: executing, conflict: no conflict, seqno: 638
   SQL: DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc`
2013-10-09 00:54:59 43148 [Note] WSREP: Victim thread:
   THD: 12, mode: local, state: committing, conflict: must abort, seqno: 639
   SQL: COMMIT
2013-10-09 00:54:5...

Read more...

vardir (but only node0/mysql.err should be of interest in this bug). This is with wsrep/galera debug enabled.

Changed in codership-mysql:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Teemu Ollakka (teemu-ollakka)
milestone: none → 5.6.14-24.1
Teemu Ollakka (teemu-ollakka) wrote :
Changed in codership-mysql:
status: In Progress → Fix Committed

Added 5.5 components, since port to 5.5/5.5-23 is required.

Changed in codership-mysql:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers