codership-maria

wsrep never stops when mysqld fails to start

Bug #794391 reported by Vadim Tkachenko on 2011-06-08

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	MySQL patches by Codership	Fix Released	Medium	Alex Yurchenko	MySQL patches by Codership 0.8.1
	codership-maria	Confirmed	Medium	Alex Yurchenko

Bug Description

It seems when mysqld fails to start with ERROR, wsrep can't stop properly.

short version:
libexec/mysqld --defaults-file=/etc/my.gal.cnf
110607 23:22:38 [ERROR] Can't find messagefile '/usr/local/mysql/share/mysql/english/errmsg.sys'
....
110607 23:22:44 Percona XtraDB (http://www.percona.com) 1.0.15-12.5 started; log sequence number 228184303874
110607 23:22:44 [ERROR] Aborting

110607 23:22:44 [Note] waiting for 17 wsrep threads to finish

it seems mysqld stays in this state forever without any activity and never exits,
I have to kill `pidof mysqld` to stop process.

Full log:

libexec/mysqld --defaults-file=/etc/my.gal.cnf
110607 23:22:38 [ERROR] Can't find messagefile '/usr/local/mysql/share/mysql/english/errmsg.sys'
110607 23:22:38 [Note] WSREP: wsrep_load(): loading provider library '/data/vadim/galera/galera/libgalera_smm.so'
110607 23:22:38 [Note] WSREP: wsrep_load(): Galera 0.8pre by Codership Oy <email address hidden> loaded succesfully.
110607 23:22:38 [Note] WSREP: wsrep_sst_grab()
110607 23:22:38 [Note] WSREP: Start replication
110607 23:22:38 [Note] WSREP: Passing config to GCS: 'gcs.fc_debug' = '0'
'gcs.fc_factor' = '0.5'
'gcs.fc_limit' = '16'
'gcs.fc_master_slave' = 'NO'
'gcs.max_packet_size' = '64500'
'gcs.max_throttle' = '0.25'
'gcs.recv_q_hard_limit' = '9223372036854775807'
'gcs.recv_q_soft_limit' = '0.25'
'replicator.commit_order' = '3'

110607 23:22:38 [Warning] WSREP: state file not found: grastate.dat
110607 23:22:38 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
110607 23:22:38 [Note] WSREP: protonet asio version 0
110607 23:22:38 [Note] WSREP: backend: asio
110607 23:22:38 [Note] WSREP: GMCast version 0
110607 23:22:38 [Note] WSREP: (b53c588c-9197-11e0-0800-ffcf7716b941, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
110607 23:22:38 [Note] WSREP: (b53c588c-9197-11e0-0800-ffcf7716b941, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
110607 23:22:38 [Note] WSREP: EVS version 0
110607 23:22:38 [Note] WSREP: PC version 0
110607 23:22:38 [Note] WSREP: gcomm: connecting to group 'big_monkey', peer ''
110607 23:22:38 [Note] WSREP: GMCast::handle_stable_view: view(view_id(PRIM,b53c588c-9197-11e0-0800-ffcf7716b941,1) memb {
        b53c588c-9197-11e0-0800-ffcf7716b941,
} joined {
} left {
} partitioned {
})
110607 23:22:39 [Note] WSREP: gcomm: connected
110607 23:22:39 [Note] WSREP: Changing maximum message size 0 -> 32636
110607 23:22:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
110607 23:22:39 [Note] WSREP: Opened channel 'big_monkey'
110607 23:22:39 [Note] WSREP: Waiting for SST to complete.
110607 23:22:39 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 0, memb_num = 1
110607 23:22:39 [Note] WSREP: Starting new group from scratch: b589380a-9197-11e0-0800-b488da7dfa52
110607 23:22:39 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b589538a-9197-11e0-0800-85f021d754fa
110607 23:22:39 [Note] WSREP: STATE EXCHANGE: sent state msg: b589538a-9197-11e0-0800-85f021d754fa
110607 23:22:39 [Note] WSREP: STATE EXCHANGE: got state msg: b589538a-9197-11e0-0800-85f021d754fa from 0 (localhost.localdomain)
110607 23:22:39 [Note] WSREP: Quorum results:
        PRIMARY,
        act_id = 0,
        conf_id = 0,
        last_appl. = -1,
        protocol = 0,
        group UUID = b589380a-9197-11e0-0800-b488da7dfa52
110607 23:22:39 [Note] WSREP: Flow-control interval: [8, 16]
110607 23:22:39 [Note] WSREP: Restored state OPEN -> JOINED (0)
110607 23:22:39 [Note] WSREP: Member 0 (localhost.localdomain) synced with group.
110607 23:22:39 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
110607 23:22:39 [Note] WSREP: New cluster view: group UUID: b589380a-9197-11e0-0800-b488da7dfa52, conf# 1: Primary, number of nodes: 1, my index: 0, first seqno: 1
110607 23:22:39 [Note] WSREP: SST complete, status: 0
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
110607 23:22:39 InnoDB: Initializing buffer pool, size = 24.0G
110607 23:22:42 InnoDB: Completed initialization of buffer pool
110607 23:22:42 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
110607 23:22:42 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
110607 23:22:44 Percona XtraDB (http://www.percona.com) 1.0.15-12.5 started; log sequence number 228184303874
110607 23:22:44 [ERROR] Aborting

110607 23:22:44 [Note] waiting for 17 wsrep threads to finish

Related branches

lp:codership-mysql/0.8

lp:codership-mysql/trunk

Alex Yurchenko (ayurchen) on 2011-06-16

Changed in codership-maria:
status:	New → Confirmed

Alex Yurchenko (ayurchen) on 2011-06-27

affects:	codership-maria → codership-mysql
Changed in codership-mysql:
importance:	Undecided → Medium
assignee:	nobody → Alex Yurchenko (ayurchen)
milestone:	none → 0.8.1
Changed in codership-maria:
status:	New → Confirmed
importance:	Undecided → Medium
assignee:	nobody → Alex Yurchenko (ayurchen)

Alex Yurchenko (ayurchen) on 2011-07-05

Changed in codership-mysql:
status:	Confirmed → Fix Committed

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2011-07-05:

Still the mystery remains. These are the first two lines in the log:

110705 9:08:41 [ERROR] Can't find messagefile '/tmp/galera0/mysql/share/mysql/english/errmsg.sys'
110705 9:08:41 [Note] WSREP: wsrep_load(): loading provider library '/tmp/galera0/galera/lib/libgalera_smm.so'

The first comes from mysqld.cc:4942:

  if (init_common_variables(MYSQL_CONFIG_NAME,
                                           argc, argv, load_default_groups))
    unireg_abort(1);

and should result in immediate abort. The second, however, comes from mysqld.cc:5040:

if (init_server_components())
unireg_abort(1);

which seems to complete successfully:

110705 9:08:42 InnoDB: Initializing buffer pool, size = 595.0M
110705 9:08:42 InnoDB: Completed initialization of buffer pool
110705 9:08:43 InnoDB: Started; log sequence number 6 2033279309

but instead is followed by unireg_abort():

110705 9:08:43 [ERROR] Aborting

What is going on there? What we should be seeing is:

110705 9:08:41 [ERROR] Can't find messagefile '/tmp/galera0/mysql/share/mysql/english/errmsg.sys'
110705 9:08:43 [ERROR] Aborting

Alex Yurchenko (ayurchen) on 2011-07-25

Changed in codership-mysql:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.