Comment 1 for bug 1143052

Revision history for this message
Mrten (bugzilla-ii) wrote :

Realizing that this is really short on info, my apologies, this is what happened:

I had a cluster up, ubuntu precise, 3 virtual servers. I was busy importing quite a large dataset (161G), a mysqldump from another server, which made the binlogs fill up the disk (and there was general logging turned on). So the imported failed halfway:

Mar 3 23:15:12 localhost mysqld: 130303 23:15:12 InnoDB: Error: Write to file ./ccpr/mycc_blob.ibd failed at offset 0 2478833664.
Mar 3 23:15:12 localhost mysqld: InnoDB: 1048576 bytes should have been written, only 0 were written.
Mar 3 23:15:12 localhost mysqld: InnoDB: Operating system error number 28.

So I '/etc/init.d/mysql stop'-ped the one I was importing on, since that was also the one with the .sql files that were being imported. Then I discovered general logging was on. I commented out the general logging, removed the general log and the binlogs, and started up the server again with /etc/init.d/mysql start. The server would not start:

Mar 3 23:47:06 localhost mysqld_safe: WSREP: Running position recovery with --log_error=/tmp/tmp.DUXaE2PwQ3
Mar 3 23:47:17 localhost mysqld_safe: WSREP: Failed to recover position:
[...]
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
Mar 3 23:47:18 localhost mysqld: pc::Proto{uuid=4cf3fb18-8454-11e2-0800-2ede888bd881,start_prim=0,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=0,checksum=1,instances=
Mar 3 23:47:18 localhost mysqld: #0110d56e0a3-7a9f-11e2-0800-ab346407151b,prim=1,last_seq=51994,last_prim=view_id(PRIM,0d56e0a3-7a9f-11e2-0800-ab346407151b,1),to_seq=51993,weight=1
Mar 3 23:47:18 localhost mysqld: #0114cf3fb18-8454-11e2-0800-2ede888bd881,prim=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,00000000-0000-0000-0000-000000000000,0),to_seq=-1,weight=1
Mar 3 23:47:18 localhost mysqld: #0117f1d550f-7a9b-11e2-0800-596f0bbb244c,prim=1,last_seq=2,last_prim=view_id(PRIM,7f1d550f-7a9b-11e2-0800-596f0bbb244c,13),to_seq=2852875,weight=1
[...]
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [Note] WSREP: evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=4,source=7f1d550f-7a9b-11e2-0800-596f0bbb244c,source_view_id=view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=4778331,node_list=()
Mar 3 23:47:18 localhost mysqld: } 64
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: exception caused by message: evs::msg{version=0,type=3,user_type=255,order=1,seq=0,seq_range=-1,aru_seq=0,flags=4,source=7f1d550f-7a9b-11e2-0800-596f0bbb244c,source_view_id=view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=4778333,node_list=()
Mar 3 23:47:18 localhost mysqld: }
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: state after handling message: evs::proto(evs::proto(4cf3fb18-8454-11e2-0800-2ede888bd881, OPERATIONAL, view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14)), OPERATIONAL) {
Mar 3 23:47:18 localhost mysqld: current_view=view(view_id(REG,0d56e0a3-7a9f-11e2-0800-ab346407151b,14) memb {
[...]
Mar 3 23:47:18 localhost mysqld: ,recovery_index=#011(0,0),evs::msg{version=0,type=1,user_type=255,order=4,seq=0,seq_range=0,aru_seq=-1,flags=4,source=0d56e0a3-7a9f
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: failed to open gcomm backend connection: 131: 4cf3fb18-8454-11e2-0800-2ede888bd881 last prims not consistent (FATAL)
Mar 3 23:47:18 localhost mysqld: #011 at gcomm/src/pc_proto.cpp:is_prim():717
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -131 (State not recoverable)
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'ii.nl cluster' at 'gcomm://172.16.13.4,172.16.14.4,172.16.15.4': -131 (State not recoverable)
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: gcs connect failed: State not recoverable
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] WSREP: wsrep::connect() failed: 6
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [ERROR] Aborting
Mar 3 23:47:18 localhost mysqld:
Mar 3 23:47:18 localhost mysqld: 130303 23:47:18 [Note] WSREP: Service disconnected.
Mar 3 23:47:19 localhost mysqld: 130303 23:47:19 [Note] WSREP: Some threads may fail to exit.
Mar 3 23:47:19 localhost mysqld: 130303 23:47:19 [Note] /usr/sbin/mysqld: Shutdown complete

Understandable, the disks were full.

A naive "well, try again" showed that the cluster needed an IST, and fails because rsync is already there:

Mar 3 23:49:55 localhost mysqld: 130303 23:49:55 [Warning] WSREP: Gap in state sequence. Need state transfer.
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Mar 3 23:49:57 localhost /etc/init.d/mysql[23744]:
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693''
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693'
Mar 3 23:49:57 localhost mysqld: #011Read: 'rsync daemon already running.'
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '23693': 114 (Operation already in progress)
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
Mar 3 23:49:57 localhost mysqld: 130303 23:49:57 [ERROR] Aborting

killall rsync, then the next restart works (but no IST):

Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '25257''
Mar 3 23:50:56 localhost rsyncd[25325]: rsyncd version 3.0.9 starting, listening on port 4444
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Prepared SST request: rsync|172.16.15.4:4444/rsync_sst
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Assign initial position for certification: 119284, protocol version: 2
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state seqno is undefined: 1 (Operation not permitted)
Mar 3 23:50:56 localhost mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():451. IST will be unavailable.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Node 1 (database-3) requested state transfer from '*any*'. Selected 0 (database-2)(SYNCED) as donor.
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 119284)
Mar 3 23:50:56 localhost mysqld: 130303 23:50:56 [Note] WSREP: Requesting state transfer: success, donor: 0
[...]
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: 0 (database-2): State transfer to 1 (database-3) complete.
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: Member 0 (database-2) synced with group.
Mar 4 00:02:40 localhost rsyncd[28132]: sent 54 bytes received 179 bytes total size 44
Mar 4 00:02:40 localhost mysqld: /usr//bin/wsrep_sst_common: line 94: /dev/stderr: Permission denied
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.4' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '25257': 1 (Operation not permitted)
Mar 4 00:02:40 localhost mysqld: 130304 0:02:40 [Note] WSREP: SST complete, seqno: 119284

Full logs available on request.