Realizing that this is really short on info, my apologies, this is what happened:
I had a cluster up, ubuntu precise, 3 virtual servers. I was busy importing quite a large dataset (161G), a mysqldump from another server, which made the binlogs fill up the disk (and there was general logging turned on). So the imported failed halfway:
Mar 3 23:15:12 localhost mysqld: 130303 23:15:12 InnoDB: Error: Write to file ./ccpr/mycc_blob.ibd failed at offset 0 2478833664.
Mar 3 23:15:12 localhost mysqld: InnoDB: 1048576 bytes should have been written, only 0 were written.
Mar 3 23:15:12 localhost mysqld: InnoDB: Operating system error number 28.
So I '/etc/init.d/mysql stop'-ped the one I was importing on, since that was also the one with the .sql files that were being imported. Then I discovered general logging was on. I commented out the general logging, removed the general log and the binlogs, and started up the server again with /etc/init.d/mysql start. The server would not start:
Mar 3 23:47:06 localhost mysqld_safe: WSREP: Running position recovery with --log_error=/tmp/tmp.DUXaE2PwQ3
Mar 3 23:47:17 localhost mysqld_safe: WSREP: Failed to recover position:
[...]
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
Mar 3 23:47:18 localhost mysqld: pc::Proto{uuid=4cf3fb18-8454-11e2-0800-2ede888bd881,start_prim=0,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=0,checksum=1,instances=
Mar 3 23:47:18 localhost mysqld: #0110d56e0a3-7a9f-11e2-0800-ab346407151b,prim=1,last_seq=51994,last_prim=view_id(PRIM,0d56e0a3-7a9f-11e2-0800-ab346407151b,1),to_seq=51993,weight=1
Mar 3 23:47:18 localhost mysqld: #0114cf3fb18-8454-11e2-0800-2ede888bd881,prim=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,00000000-0000-0000-0000-000000000000,0),to_seq=-1,weight=1
Mar 3 23:47:18 localhost mysqld: #0117f1d550f-7a9b-11e2-0800-596f0bbb244c,prim=1,last_seq=2,last_prim=view_id(PRIM,7f1d550f-7a9b-11e2-0800-596f0bbb244c,13),to_seq=2852875,weight=1
[...]
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [Note] WSREP: evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=4,source=7f1d550f-7a9b-11e2-0800-596f0bbb244c,source_view_id=view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=4778331,node_list=()
Mar 3 23:47:18 localhost mysqld: } 64
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: exception caused by message: evs::msg{version=0,type=3,user_type=255,order=1,seq=0,seq_range=-1,aru_seq=0,flags=4,source=7f1d550f-7a9b-11e2-0800-596f0bbb244c,source_view_id=view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=4778333,node_list=()
Mar 3 23:47:18 localhost mysqld: }
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: state after handling message: evs::proto(evs::proto(4cf3fb18-8454-11e2-0800-2ede888bd881, OPERATIONAL, view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14)), OPERATIONAL) {
Mar 3 23:47:18 localhost mysqld: current_view=view(view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14) memb {
[...]
Mar 3 23:47:18 localhost mysqld: ,recovery_index=#011(0,0),evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=4,source=0d56e0a3-7a9f
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: failed to open gcomm backend connection: 131: 4cf3fb18-8454-11e2-0800-2ede888bd881 last prims not consistent (FATAL)
Mar 3 23:47:18 localhost mysqld: #011 at gcomm/src/pc_proto.cpp:is_prim():717
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -131 (State not recoverable)
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'ii.nl cluster' at 'gcomm://172.16.13.4,172.16.14.4,172.16.15.4': -131 (State not recoverable)
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs connect failed: State not recoverable
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: wsrep::connect() failed: 6
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] Aborting
Mar 3 23:47:18 localhost mysqld:
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [Note] WSREP: Service disconnected.
Mar 3 23:47:19 localhost mysqld: 130303 23:47:19 [Note] WSREP: Some threads may fail to exit.
Mar 3 23:47:19 localhost mysqld: 130303 23:47:19 [Note] /usr/sbin/mysqld: Shutdown complete
Understandable, the disks were full.
A naive "well, try again" showed that the cluster needed an IST, and fails because rsync is already there:
Mar 3 23:49:55 localhost mysqld: 130303 23:49:55 [Warning] WSREP: Gap in state sequence. Need state transfer.
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]:
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693''
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693'
Mar 3 23:49:57 localhost mysqld: #011Read: 'rsync daemon already running.'
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693': 114 (Operation already in progress)
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] Aborting
killall rsync, then the next restart works (but no IST):
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '25257''
Mar 3 23:50:56 localhost rsyncd[25325]: rsyncd version 3.0.9 starting, listening on port 4444
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Prepared SST request: rsync|172.16.15.4:4444/rsync_sst
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Assign initial position for certification: 119284, protocol version: 2
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state seqno is undefined: 1 (Operation not permitted)
Mar 3 23:50:56 localhost mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():451. IST will be unavailable.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Node 1 (database-3) requested state transfer from '*any*'. Selected 0 (database-2)(SYNCED) as donor.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 119284)
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Requesting state transfer: success, donor: 0
[...]
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: 0 (database-2): State transfer to 1 (database-3) complete.
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: Member 0 (database-2) synced with group.
Mar 4 00:02:40 localhost rsyncd[28132]: sent 54 bytes received 179 bytes total size 44
Mar 4 00:02:40 localhost mysqld: /usr//bin/wsrep_sst_common: line 94: /dev/stderr: Permission denied
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '25257': 1 (Operation not permitted)
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: SST complete, seqno: 119284
Realizing that this is really short on info, my apologies, this is what happened:
I had a cluster up, ubuntu precise, 3 virtual servers. I was busy importing quite a large dataset (161G), a mysqldump from another server, which made the binlogs fill up the disk (and there was general logging turned on). So the imported failed halfway:
Mar 3 23:15:12 localhost mysqld: 130303 23:15:12 InnoDB: Error: Write to file ./ccpr/ mycc_blob. ibd failed at offset 0 2478833664.
Mar 3 23:15:12 localhost mysqld: InnoDB: 1048576 bytes should have been written, only 0 were written.
Mar 3 23:15:12 localhost mysqld: InnoDB: Operating system error number 28.
So I '/etc/init.d/mysql stop'-ped the one I was importing on, since that was also the one with the .sql files that were being imported. Then I discovered general logging was on. I commented out the general logging, removed the general log and the binlogs, and started up the server again with /etc/init.d/mysql start. The server would not start:
Mar 3 23:47:06 localhost mysqld_safe: WSREP: Running position recovery with --log_error= /tmp/tmp. DUXaE2PwQ3 uuid=4cf3fb18- 8454-11e2- 0800-2ede888bd8 81,start_ prim=0, npvo=0, ignore_ sb=0,ignore_ quorum= 0,state= 1,last_ sent_seq= 0,checksum= 1,instances= 7a9f-11e2- 0800-ab34640715 1b,prim= 1,last_ seq=51994, last_prim= view_id( PRIM,0d56e0a3- 7a9f-11e2- 0800-ab34640715 1b,1),to_ seq=51993, weight= 1 8454-11e2- 0800-2ede888bd8 81,prim= 0,last_ seq=4294967295, last_prim= view_id( NON_PRIM, 00000000- 0000-0000- 0000-0000000000 00,0),to_ seq=-1, weight= 1 7a9b-11e2- 0800-596f0bbb24 4c,prim= 1,last_ seq=2,last_ prim=view_ id(PRIM, 7f1d550f- 7a9b-11e2- 0800-596f0bbb24 4c,13), to_seq= 2852875, weight= 1 version= 0,type= 1,user_ type=255, order=4, seq=0,seq_ range=0, aru_seq= -1,flags= 4,source= 7f1d550f- 7a9b-11e2- 0800-596f0bbb24 4c,source_ view_id= view_id( REG,0d56e0a3- 7a9f-11e2- 0800-ab34640715 1b,14), range_uuid= 00000000- 0000-0000- 0000-0000000000 00,range= [-1,-1] ,fifo_seq= 4778331, node_list= () version= 0,type= 3,user_ type=255, order=1, seq=0,seq_ range=- 1,aru_seq= 0,flags= 4,source= 7f1d550f- 7a9b-11e2- 0800-596f0bbb24 4c,source_ view_id= view_id( REG,0d56e0a3- 7a9f-11e2- 0800-ab34640715 1b,14), range_uuid= 00000000- 0000-0000- 0000-0000000000 00,range= [-1,-1] ,fifo_seq= 4778333, node_list= () evs::proto( 4cf3fb18- 8454-11e2- 0800-2ede888bd8 81, OPERATIONAL, view_id( REG,0d56e0a3- 7a9f-11e2- 0800-ab34640715 1b,14)) , OPERATIONAL) { view=view( view_id( REG,0d56e0a3- 7a9f-11e2- 0800-ab34640715 1b,14) memb { index=# 011(0,0) ,evs::msg{ version= 0,type= 1,user_ type=255, order=4, seq=0,seq_ range=0, aru_seq= -1,flags= 4,source= 0d56e0a3- 7a9f 8454-11e2- 0800-2ede888bd8 81 last prims not consistent (FATAL) pc_proto. cpp:is_ prim(): 717 gcs_core. c:gcs_core_ open(): 195: Failed to open backend connection: -131 (State not recoverable) gcs.c:gcs_ open(): 1290: Failed to open channel 'ii.nl cluster' at 'gcomm: //172.16. 13.4,172. 16.14.4, 172.16. 15.4': -131 (State not recoverable)
Mar 3 23:47:17 localhost mysqld_safe: WSREP: Failed to recover position:
[...]
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
Mar 3 23:47:18 localhost mysqld: pc::Proto{
Mar 3 23:47:18 localhost mysqld: #0110d56e0a3-
Mar 3 23:47:18 localhost mysqld: #0114cf3fb18-
Mar 3 23:47:18 localhost mysqld: #0117f1d550f-
[...]
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [Note] WSREP: evs::msg{
Mar 3 23:47:18 localhost mysqld: } 64
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: exception caused by message: evs::msg{
Mar 3 23:47:18 localhost mysqld: }
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: state after handling message: evs::proto(
Mar 3 23:47:18 localhost mysqld: current_
[...]
Mar 3 23:47:18 localhost mysqld: ,recovery_
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: failed to open gcomm backend connection: 131: 4cf3fb18-
Mar 3 23:47:18 localhost mysqld: #011 at gcomm/src/
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs/src/
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs/src/
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs connect failed: State not recoverable
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: wsrep::connect() failed: 6
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] Aborting
Mar 3 23:47:18 localhost mysqld:
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [Note] WSREP: Service disconnected.
Mar 3 23:47:19 localhost mysqld: 130303 23:47:19 [Note] WSREP: Some threads may fail to exit.
Mar 3 23:47:19 localhost mysqld: 130303 23:47:19 [Note] /usr/sbin/mysqld: Shutdown complete
Understandable, the disks were full.
A naive "well, try again" showed that the cluster needed an IST, and fails because rsync is already there:
Mar 3 23:49:55 localhost mysqld: 130303 23:49:55 [Warning] WSREP: Gap in state sequence. Need state transfer. d/mysql[ 23744]: 0 processes alive and '/usr/bin/ mysqladmin --defaults- file=/etc/ mysql/debian. cnf ping' resulted in d/mysql[ 23744]: #007/usr/ bin/mysqladmin: connect to server at 'localhost' failed d/mysql[ 23744]: error: 'Can't connect to local MySQL server through socket '/var/run/ mysqld/ mysqld. sock' (2)' d/mysql[ 23744]: Check that mysqld is running and that the socket: '/var/run/ mysqld/ mysqld. sock' exists! d/mysql[ 23744]:
Mar 3 23:49:57 localhost /etc/init.
Mar 3 23:49:57 localhost /etc/init.
Mar 3 23:49:57 localhost /etc/init.
Mar 3 23:49:57 localhost /etc/init.
Mar 3 23:49:57 localhost /etc/init.
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693''
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693'
Mar 3 23:49:57 localhost mysqld: #011Read: 'rsync daemon already running.'
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693': 114 (Operation already in progress)
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] Aborting
killall rsync, then the next restart works (but no IST):
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '25257'' 16.15.4: 4444/rsync_ sst src/replicator_ str.cpp: prepare_ for_IST( ):451. IST will be unavailable. 2)(SYNCED) as donor. wsrep_sst_ common: line 94: /dev/stderr: Permission denied
Mar 3 23:50:56 localhost rsyncd[25325]: rsyncd version 3.0.9 starting, listening on port 4444
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Prepared SST request: rsync|172.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Assign initial position for certification: 119284, protocol version: 2
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state seqno is undefined: 1 (Operation not permitted)
Mar 3 23:50:56 localhost mysqld: #011 at galera/
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Node 1 (database-3) requested state transfer from '*any*'. Selected 0 (database-
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 119284)
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Requesting state transfer: success, donor: 0
[...]
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: 0 (database-2): State transfer to 1 (database-3) complete.
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: Member 0 (database-2) synced with group.
Mar 4 00:02:40 localhost rsyncd[28132]: sent 54 bytes received 179 bytes total size 44
Mar 4 00:02:40 localhost mysqld: /usr//bin/
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '25257': 1 (Operation not permitted)
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: SST complete, seqno: 119284
Full logs available on request.