wsrep never stops when mysqld fails to start

Bug #794391 reported by Vadim Tkachenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Fix Released
Medium
Alex Yurchenko
codership-maria
Confirmed
Medium
Alex Yurchenko

Bug Description

It seems when mysqld fails to start with ERROR, wsrep can't stop properly.

short version:
libexec/mysqld --defaults-file=/etc/my.gal.cnf
110607 23:22:38 [ERROR] Can't find messagefile '/usr/local/mysql/share/mysql/english/errmsg.sys'
....
110607 23:22:44 Percona XtraDB (http://www.percona.com) 1.0.15-12.5 started; log sequence number 228184303874
110607 23:22:44 [ERROR] Aborting

110607 23:22:44 [Note] waiting for 17 wsrep threads to finish

it seems mysqld stays in this state forever without any activity and never exits,
I have to kill `pidof mysqld` to stop process.

Full log:

libexec/mysqld --defaults-file=/etc/my.gal.cnf
110607 23:22:38 [ERROR] Can't find messagefile '/usr/local/mysql/share/mysql/english/errmsg.sys'
110607 23:22:38 [Note] WSREP: wsrep_load(): loading provider library '/data/vadim/galera/galera/libgalera_smm.so'
110607 23:22:38 [Note] WSREP: wsrep_load(): Galera 0.8pre by Codership Oy <email address hidden> loaded succesfully.
110607 23:22:38 [Note] WSREP: wsrep_sst_grab()
110607 23:22:38 [Note] WSREP: Start replication
110607 23:22:38 [Note] WSREP: Passing config to GCS: 'gcs.fc_debug' = '0'
'gcs.fc_factor' = '0.5'
'gcs.fc_limit' = '16'
'gcs.fc_master_slave' = 'NO'
'gcs.max_packet_size' = '64500'
'gcs.max_throttle' = '0.25'
'gcs.recv_q_hard_limit' = '9223372036854775807'
'gcs.recv_q_soft_limit' = '0.25'
'replicator.commit_order' = '3'

110607 23:22:38 [Warning] WSREP: state file not found: grastate.dat
110607 23:22:38 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
110607 23:22:38 [Note] WSREP: protonet asio version 0
110607 23:22:38 [Note] WSREP: backend: asio
110607 23:22:38 [Note] WSREP: GMCast version 0
110607 23:22:38 [Note] WSREP: (b53c588c-9197-11e0-0800-ffcf7716b941, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
110607 23:22:38 [Note] WSREP: (b53c588c-9197-11e0-0800-ffcf7716b941, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
110607 23:22:38 [Note] WSREP: EVS version 0
110607 23:22:38 [Note] WSREP: PC version 0
110607 23:22:38 [Note] WSREP: gcomm: connecting to group 'big_monkey', peer ''
110607 23:22:38 [Note] WSREP: GMCast::handle_stable_view: view(view_id(PRIM,b53c588c-9197-11e0-0800-ffcf7716b941,1) memb {
        b53c588c-9197-11e0-0800-ffcf7716b941,
} joined {
} left {
} partitioned {
})
110607 23:22:39 [Note] WSREP: gcomm: connected
110607 23:22:39 [Note] WSREP: Changing maximum message size 0 -> 32636
110607 23:22:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
110607 23:22:39 [Note] WSREP: Opened channel 'big_monkey'
110607 23:22:39 [Note] WSREP: Waiting for SST to complete.
110607 23:22:39 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 0, memb_num = 1
110607 23:22:39 [Note] WSREP: Starting new group from scratch: b589380a-9197-11e0-0800-b488da7dfa52
110607 23:22:39 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b589538a-9197-11e0-0800-85f021d754fa
110607 23:22:39 [Note] WSREP: STATE EXCHANGE: sent state msg: b589538a-9197-11e0-0800-85f021d754fa
110607 23:22:39 [Note] WSREP: STATE EXCHANGE: got state msg: b589538a-9197-11e0-0800-85f021d754fa from 0 (localhost.localdomain)
110607 23:22:39 [Note] WSREP: Quorum results:
        PRIMARY,
        act_id = 0,
        conf_id = 0,
        last_appl. = -1,
        protocol = 0,
        group UUID = b589380a-9197-11e0-0800-b488da7dfa52
110607 23:22:39 [Note] WSREP: Flow-control interval: [8, 16]
110607 23:22:39 [Note] WSREP: Restored state OPEN -> JOINED (0)
110607 23:22:39 [Note] WSREP: Member 0 (localhost.localdomain) synced with group.
110607 23:22:39 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
110607 23:22:39 [Note] WSREP: New cluster view: group UUID: b589380a-9197-11e0-0800-b488da7dfa52, conf# 1: Primary, number of nodes: 1, my index: 0, first seqno: 1
110607 23:22:39 [Note] WSREP: SST complete, status: 0
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
110607 23:22:39 InnoDB: Initializing buffer pool, size = 24.0G
110607 23:22:42 InnoDB: Completed initialization of buffer pool
110607 23:22:42 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
110607 23:22:42 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
110607 23:22:44 Percona XtraDB (http://www.percona.com) 1.0.15-12.5 started; log sequence number 228184303874
110607 23:22:44 [ERROR] Aborting

110607 23:22:44 [Note] waiting for 17 wsrep threads to finish

Changed in codership-maria:
status: New → Confirmed
affects: codership-maria → codership-mysql
Changed in codership-mysql:
importance: Undecided → Medium
assignee: nobody → Alex Yurchenko (ayurchen)
milestone: none → 0.8.1
Changed in codership-maria:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Alex Yurchenko (ayurchen)
Changed in codership-mysql:
status: Confirmed → Fix Committed
Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Still the mystery remains. These are the first two lines in the log:

110705 9:08:41 [ERROR] Can't find messagefile '/tmp/galera0/mysql/share/mysql/english/errmsg.sys'
110705 9:08:41 [Note] WSREP: wsrep_load(): loading provider library '/tmp/galera0/galera/lib/libgalera_smm.so'

The first comes from mysqld.cc:4942:

  if (init_common_variables(MYSQL_CONFIG_NAME,
                                           argc, argv, load_default_groups))
    unireg_abort(1);

and should result in immediate abort. The second, however, comes from mysqld.cc:5040:

  if (init_server_components())
    unireg_abort(1);

which seems to complete successfully:

110705 9:08:42 InnoDB: Initializing buffer pool, size = 595.0M
110705 9:08:42 InnoDB: Completed initialization of buffer pool
110705 9:08:43 InnoDB: Started; log sequence number 6 2033279309

but instead is followed by unireg_abort():

110705 9:08:43 [ERROR] Aborting

What is going on there? What we should be seeing is:

110705 9:08:41 [ERROR] Can't find messagefile '/tmp/galera0/mysql/share/mysql/english/errmsg.sys'
110705 9:08:43 [ERROR] Aborting

Changed in codership-mysql:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.