Not full cleanup at crash

Bug #797396 reported by Vadim Tkachenko on 2011-06-14
This bug affects 2 people
Affects                      Importance  Assigned to
MySQL patches by Codership   Medium      Alex Yurchenko
codership-maria              Medium      Alex Yurchenko

Bug Description

When the JOINER crashes during the rsync process, it does not kill bash and rsync, so they stay in memory.

This is the netstat -anp output I see after the crash:

 netstat -anp | grep -P "(bash|rsync)"
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 1776/bash
tcp 0 0 0.0.0.0:4444 0.0.0.0:* LISTEN 1793/rsync
tcp 0 0 10.11.12.220:44326 10.11.12.234:4567 CLOSE_WAIT 1776/bash
tcp 0 0 :::4444 :::* LISTEN 1793/rsync
unix 2 [ ] DGRAM 401833 1793/rsync

With these processes still running I can't start the node; it complains "address is in use".
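
To see which leftover processes are holding a given port, the netstat output can be filtered by port and state. The following is only a sketch: it assumes the Linux `netstat -anp` column layout shown above, and `listeners_on_port` is a hypothetical helper name, not part of any wsrep script.

```shell
# Hypothetical helper: read `netstat -anp` output on stdin and print the
# "PID/program" column of every TCP socket that is LISTENing on the given port.
listeners_on_port() {
    local port=$1
    awk -v port="$port" '
        # TCP rows: proto recv-q send-q local-addr foreign-addr state pid/program
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # the port is the last ":"-separated field
            if (parts[n] == port) print $7
        }
    ' | sort -u
}
```

Usage would be along the lines of `netstat -anp | listeners_on_port 4444`. Note that netstat generally needs root privileges to show the PID column for processes owned by other users.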

Here is more of the log showing how I got the crash:

110614 12:54:00 [Note] WSREP: Requesting state transfer: success, donor: 1
110614 12:54:00 [Warning] WSREP: 1 (localhost.localdomain): State transfer to 0 (localhost.localdomain) failed: -12 (Cannot allocate memory)
110614 12:54:00 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():621: Will never receive state. Need to abort.
110614 12:54:00 [Note] WSREP: gcomm: terminating thread
110614 12:54:00 [Note] WSREP: gcomm: joining thread
110614 12:54:00 [Note] WSREP: gcomm: closing backend
110614 12:54:00 [Note] WSREP: evs::proto(0a06b77c-96c0-11e0-0800-ae72dfc6785b, LEAVING, view_id(REG,0a06b77c-96c0-11e0-0800-ae72dfc6785b,10)) uuid 212b9aa2-9658-11e0-0800-6dde59d39152 missing from install message, assuming partitioned
110614 12:54:00 [Note] WSREP: GMCast::handle_stable_view: view(view_id(NON_PRIM,0a06b77c-96c0-11e0-0800-ae72dfc6785b,10) memb {
        0a06b77c-96c0-11e0-0800-ae72dfc6785b,
} joined {
} left {
} partitioned {
        212b9aa2-9658-11e0-0800-6dde59d39152,
})
110614 12:54:00 [Note] WSREP: GMCast::handle_stable_view: view((empty))
110614 12:54:00 [Note] WSREP: gcomm: closed
110614 12:54:00 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary

Changed in codership-mysql:
status: New → Confirmed
Vadim Tkachenko (vadim-tk) wrote :

On a related note:

When the JOINER gets the error
"110614 12:54:00 [Warning] WSREP: 1 (localhost.localdomain): State transfer to 0 (localhost.localdomain) failed: -12 (Cannot allocate memory)" from the DONOR,

I guess it makes sense to try another DONOR rather than crash mysqld.

Vadim Tkachenko (vadim-tk) wrote :

However, in this case there was only one DONOR, so it was probably the correct decision.

Changed in codership-mysql:
importance: Undecided → Medium
assignee: nobody → Alex Yurchenko (ayurchen)
milestone: none → 0.8.1
Changed in codership-maria:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Alex Yurchenko (ayurchen)
Changed in codership-mysql:
status: Confirmed → Fix Committed
Changed in codership-maria:
status: Confirmed → Fix Committed
Vadim Tkachenko (vadim-tk) wrote :

I still have this problem using revision 3099:

110714 19:21:59 [Note] [DEBUG] WSREP: Prepared SST request: xtrabackup|192.168.0.99/xtrabackup_sst
110714 19:21:59 [Warning] WSREP: wsrep_notify_cmd is not defined, skipping notification.
110714 19:21:59 [Note] WSREP: Assign initial position for certification: 67681588, protocol version: 1
110714 19:21:59 [Note] WSREP: State transfer required:
        Group state: 8d0cd8e8-ab08-11e0-0800-bfeb931854e6:67681588
        Local state: 00000000-0000-0000-0000-000000000000:-1
110714 19:21:59 [Note] WSREP: Node 0 (cisco.office.percona.com) requested state transfer from '*any*'. Selected 1 (r815.office.percona.com)(SYNCED) as donor.
110714 19:21:59 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 67681588)
110714 19:21:59 [Note] WSREP: Requesting state transfer: success, donor: 1
110714 19:21:59 [Warning] WSREP: 1 (r815.office.percona.com): State transfer to 0 (cisco.office.percona.com) failed: -12 (Cannot allocate memory)
110714 19:21:59 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():645: Will never receive state. Need to abort.
110714 19:21:59 [Note] WSREP: gcomm: terminating thread
110714 19:21:59 [Note] WSREP: gcomm: joining thread
110714 19:21:59 [Note] WSREP: gcomm: closing backend
110714 19:22:00 [Note] WSREP: evs::proto(360eb770-ae89-11e0-0800-bbd6923083d1, LEAVING, view_id(REG,360eb770-ae89-11e0-0800-bbd6923083d1,18)) uuid ce90c4f4-ae2f-11e0-0800-d841c420f3bf missing from install message, assuming partitioned
110714 19:22:00 [Note] WSREP: GMCast::handle_stable_view: view(view_id(NON_PRIM,360eb770-ae89-11e0-0800-bbd6923083d1,18) memb {
        360eb770-ae89-11e0-0800-bbd6923083d1,
} joined {
} left {
} partitioned {
        ce90c4f4-ae2f-11e0-0800-d841c420f3bf,
})
110714 19:22:00 [Note] WSREP: GMCast::handle_stable_view: view((empty))
110714 19:22:00 [Note] WSREP: gcomm: closed
110714 19:22:00 [Note] WSREP: libexec/mysqld: Terminated.
Aborted (core dumped)

Then, on the next start:

110714 19:25:13 [Note] WSREP: wsrep_load(): loading provider library '/data/opt/data/vadim/src/galera/libgalera_smm.so'
110714 19:25:13 [Note] WSREP: wsrep_load(): Galera 0.8.1 by Codership Oy <email address hidden> loaded succesfully.
110714 19:25:13 [Note] WSREP: Passing config to GCS: gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; replicator.commit_order = 3
110714 19:25:13 [Note] WSREP: wsrep_sst_grab()
110714 19:25:13 [Note] WSREP: Start replication
110714 19:25:13 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
110714 19:25:13 [Note] WSREP: Assign initial position for certification: -1, protocol version: 1
110714 19:25:13 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
110714 19:25:13 [Note] WSREP: protonet asio version 0
110714 19:25:13 [Note] WSREP: backend: asio
110714 19:25:13 [Note] WSREP: GMCast version 0
110714 19:25:13 [Note] WSREP: (ab7a341f-ae89-11e0-0800-fd6780d48ba3, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
110714 19:25:13 [Note] WSREP: (ab7a341f-ae89-11e0-0800-fd...


Alex Yurchenko (ayurchen) wrote :

Vadim,

You need to update the xtrabackup SST script to monitor the state of the parent mysqld process. Check how it is done in wsrep_sst_rsync.sh.
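
As a rough illustration of the idea (this is not the code from wsrep_sst_rsync.sh), a shell-based SST script could poll the parent's PID and stop its worker when the parent disappears. The sketch below assumes the script is given the PID of the mysqld that launched it and runs as the same user; `run_while_parent_alive` and its arguments are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: run a command in the background and terminate it
# if the parent process (e.g. mysqld) goes away.
# Usage: run_while_parent_alive PARENT_PID COMMAND [ARGS...]
# Assumption: the caller may signal PARENT_PID (same user), otherwise
# `kill -0` fails even though the process exists.
run_while_parent_alive() {
    local parent_pid=$1
    shift

    "$@" &                       # start the worker (e.g. the receiving daemon)
    local worker_pid=$!

    # Poll once per second while the worker is still running.
    while kill -0 "$worker_pid" 2>/dev/null; do
        if ! kill -0 "$parent_pid" 2>/dev/null; then
            # Parent is gone: stop the worker so it releases its port.
            kill "$worker_pid" 2>/dev/null
            wait "$worker_pid" 2>/dev/null || true
            return 1
        fi
        sleep 1
    done

    wait "$worker_pid"           # return the worker's own exit status
}
```

A joiner-side script might call it as `run_while_parent_alive "$MYSQLD_PID" some_receiver_command`, where both the variable and the command stand in for whatever the real script uses. Polling is the simplest choice here; it reacts within about a second of the parent dying.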

Changed in codership-mysql:
status: Fix Committed → Fix Released