Hi,
I'm trying to enable InnoDB table space encryption in my Galera cluster (running 5.7.19-29.22-3.jessie, xtrabackup 2.4.8-1.jessie), however I cannot get IST to work. I bootstrapped a test cluster with pxc_encrypt_cluster_traffic set to ON (since encryption is mandatory for the keyring transfer). My test looks like this:
1. Stop one of the nodes
2. Issue a few inserts to one of the remaining nodes
3. Start the node back and wait for IST to finish
Step 3 hangs indefinitely (or more specifically, the init script hangs and I cannot connect to MySQL). I haven't enabled encryption on any of the tables yet. It seems the issue is caused just by setting the keyring_file_data option in my.cnf.
From the logs it seems that IST completes but is not signalled to the joiner correctly.
Logs from the joiner:
2017-10-30T18:31:16.518484Z mysqld_safe Logging to '/var/lib/mysql/dbtest2.err'.
2017-10-30T18:31:16.533008Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2017-10-30T18:31:16.538314Z mysqld_safe Skipping wsrep-recover for 590fa85c-bd85-11e7-9d42-468b04ff29c5:34 pair
2017-10-30T18:31:16.539029Z mysqld_safe Assigning 590fa85c-bd85-11e7-9d42-468b04ff29c5:34 to wsrep_start_position
2017-10-30T18:31:16.544596Z 0 [Warning] Changed limits: max_open_files: 5000 (requested 5011)
2017-10-30T18:31:16.544649Z 0 [Warning] Changed limits: table_open_cache: 1995 (requested 2000)
2017-10-30T18:31:16.719955Z 0 [Warning] The syntax '--log_warnings/-W' is deprecated and will be removed in a future release. Please use '--log_error_verbosity' instead.
2017-10-30T18:31:16.720063Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-10-30T18:31:16.721390Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57) starting as process 29487 ...
2017-10-30T18:31:16.723807Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2017-10-30T18:31:16.723826Z 0 [Note] WSREP: Skipping automatic SSL certificate generation (enabled only in bootstrap mode)
2017-10-30T18:31:16.723839Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-10-30T18:31:16.723850Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-10-30T18:31:16.723856Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-10-30T18:31:16.727674Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <email address hidden> loaded successfully.
2017-10-30T18:31:16.727752Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-10-30T18:31:16.728629Z 0 [Note] WSREP: Found saved state: 590fa85c-bd85-11e7-9d42-468b04ff29c5:34, safe_to_bootsrap: 0
2017-10-30T18:31:16.729820Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.10.10.28; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 1G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.10.10.28:4567; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; socket.ssl_ca = ca.pem; socket.ssl_cert = server-cert.pem; socket.ssl_cipher = AES128-SHA; socket.ssl_compression = YES; socket.ssl_key = server-key.pem;
2017-10-30T18:31:16.745742Z 0 [Note] WSREP: GCache history reset: 590fa85c-bd85-11e7-9d42-468b04ff29c5:0 -> 590fa85c-bd85-11e7-9d42-468b04ff29c5:34
2017-10-30T18:31:16.746255Z 0 [Note] WSREP: Assign initial position for certification: 34, protocol version: -1
2017-10-30T18:31:16.746278Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-10-30T18:31:16.746285Z 0 [Note] WSREP: Starting replication
2017-10-30T18:31:16.746298Z 0 [Note] WSREP: Setting initial position to 590fa85c-bd85-11e7-9d42-468b04ff29c5:34
2017-10-30T18:31:16.746539Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-10-30T18:31:16.746580Z 0 [Note] WSREP: initializing ssl context
2017-10-30T18:31:16.746912Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-10-30T18:31:16.747023Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-10-30T18:31:16.747041Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-10-30T18:31:16.747586Z 0 [Note] WSREP: GMCast version 0
2017-10-30T18:31:16.747866Z 0 [Note] WSREP: (8424307c, 'ssl://10.10.10.28:4567') listening at ssl://10.10.10.28:4567
2017-10-30T18:31:16.747885Z 0 [Note] WSREP: (8424307c, 'ssl://10.10.10.28:4567') multicast: , ttl: 1
2017-10-30T18:31:16.748276Z 0 [Note] WSREP: EVS version 0
2017-10-30T18:31:16.748397Z 0 [Note] WSREP: gcomm: connecting to group 'test', peer '10.10.10.29:,10.10.10.28:,10.10.10.27:'
2017-10-30T18:31:16.757084Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.10.10.27:4567 local endpoint ssl://10.10.10.28:59971 cipher: AES128-SHA compression:
2017-10-30T18:31:16.758336Z 0 [Note] WSREP: (8424307c, 'ssl://10.10.10.28:4567') connection established to c3f5b6b3 ssl://10.10.10.27:4567
2017-10-30T18:31:16.758448Z 0 [Note] WSREP: (8424307c, 'ssl://10.10.10.28:4567') turning message relay requesting on, nonlive peers:
2017-10-30T18:31:16.759279Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.10.10.29:4567 local endpoint ssl://10.10.10.28:44410 cipher: AES128-SHA compression:
2017-10-30T18:31:16.760789Z 0 [Note] WSREP: (8424307c, 'ssl://10.10.10.28:4567') connection established to 8c6853ae ssl://10.10.10.29:4567
2017-10-30T18:31:16.814652Z 0 [Note] WSREP: declaring 8c6853ae at ssl://10.10.10.29:4567 stable
2017-10-30T18:31:16.814704Z 0 [Note] WSREP: declaring c3f5b6b3 at ssl://10.10.10.27:4567 stable
2017-10-30T18:31:16.815721Z 0 [Note] WSREP: Node 8c6853ae state primary
2017-10-30T18:31:16.816976Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,8424307c,93)
memb {
8424307c,0
8c6853ae,0
c3f5b6b3,0
}
joined {
}
left {
}
partitioned {
}
)
2017-10-30T18:31:16.817000Z 0 [Note] WSREP: Save the discovered primary-component to disk
2017-10-30T18:31:17.249018Z 0 [Note] WSREP: gcomm: connected
2017-10-30T18:31:17.249175Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-10-30T18:31:17.249293Z 0 [Note] WSREP: Waiting for SST/IST to complete.
2017-10-30T18:31:17.249411Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2017-10-30T18:31:17.249949Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8470d981-bda0-11e7-b889-431946194c48
2017-10-30T18:31:17.251543Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 8470d981-bda0-11e7-b889-431946194c48
2017-10-30T18:31:17.252341Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8470d981-bda0-11e7-b889-431946194c48 from 0 (dbtest2)
2017-10-30T18:31:17.252405Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8470d981-bda0-11e7-b889-431946194c48 from 1 (dbtest1)
2017-10-30T18:31:17.252453Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8470d981-bda0-11e7-b889-431946194c48 from 2 (dbtest3)
2017-10-30T18:31:17.252499Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 85,
members = 2/3 (primary/total),
act_id = 37,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 590fa85c-bd85-11e7-9d42-468b04ff29c5
2017-10-30T18:31:17.252551Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2017-10-30T18:31:17.252589Z 0 [Note] WSREP: Trying to continue unpaused monitor
2017-10-30T18:31:17.252625Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 37)
2017-10-30T18:31:17.252723Z 2 [Note] WSREP: State transfer required:
Group state: 590fa85c-bd85-11e7-9d42-468b04ff29c5:37
Local state: 590fa85c-bd85-11e7-9d42-468b04ff29c5:34
2017-10-30T18:31:17.252769Z 2 [Note] WSREP: New cluster view: global state: 590fa85c-bd85-11e7-9d42-468b04ff29c5:37, view# 86: Primary, number of nodes: 3, my index: 0, protocol version 3
2017-10-30T18:31:17.252779Z 2 [Note] WSREP: Setting wsrep_ready to true
2017-10-30T18:31:17.252784Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-10-30T18:31:17.252792Z 2 [Note] WSREP: Setting wsrep_ready to false
2017-10-30T18:31:17.252860Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.10.10.28' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '29487' '' )
2017-10-30T18:31:17.749475Z 2 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|10.10.10.28:4444/xtrabackup_sst//1
2017-10-30T18:31:17.749516Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-10-30T18:31:17.749551Z 2 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-10-30T18:31:17.749565Z 2 [Note] WSREP: Assign initial position for certification: 37, protocol version: 3
2017-10-30T18:31:17.749614Z 0 [Note] WSREP: Service thread queue flushed.
2017-10-30T18:31:17.749693Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2017-10-30T18:31:17.749832Z 2 [Note] WSREP: IST receiver addr using ssl://10.10.10.28:4568
2017-10-30T18:31:17.749917Z 2 [Note] WSREP: IST receiver using ssl
2017-10-30T18:31:17.750261Z 2 [Note] WSREP: Prepared IST receiver, listening at: ssl://10.10.10.28:4568
2017-10-30T18:31:17.750282Z 2 [Note] WSREP: State gap can be likely serviced using IST. SST request though present would be void.
2017-10-30T18:31:17.753470Z 0 [Note] WSREP: Member 0.0 (dbtest2) requested state transfer from '*any*'. Selected 1.0 (dbtest1)(SYNCED) as donor.
2017-10-30T18:31:17.753488Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 37)
2017-10-30T18:31:17.753517Z 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2017-10-30T18:31:17.753538Z 2 [Note] WSREP: GCache history reset: 590fa85c-bd85-11e7-9d42-468b04ff29c5:0 -> 590fa85c-bd85-11e7-9d42-468b04ff29c5:37
2017-10-30T18:31:18.176789Z 0 [Note] WSREP: 1.0 (dbtest1): State transfer to 0.0 (dbtest2) complete.
2017-10-30T18:31:18.177862Z 0 [Note] WSREP: Member 1.0 (dbtest1) synced with group.
2017-10-30T18:31:20.249577Z 0 [Note] WSREP: (8424307c, 'ssl://10.10.10.28:4567') turning message relay requesting off
Logs from the donor:
2017-10-30T18:31:15.172649Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.10.10.28:44410 local endpoint ssl://10.10.10.29:4567 cipher: AES128-SHA compression:
2017-10-30T18:31:15.173403Z 0 [Note] WSREP: (8c6853ae, 'ssl://10.10.10.29:4567') turning message relay requesting on, nonlive peers: ssl://10.10.10.28:4567
2017-10-30T18:31:15.173947Z 0 [Note] WSREP: (8c6853ae, 'ssl://10.10.10.29:4567') connection established to 8424307c ssl://10.10.10.28:4567
2017-10-30T18:31:15.228234Z 0 [Note] WSREP: declaring 8424307c at ssl://10.10.10.28:4567 stable
2017-10-30T18:31:15.228394Z 0 [Note] WSREP: declaring c3f5b6b3 at ssl://10.10.10.27:4567 stable
2017-10-30T18:31:15.229234Z 0 [Note] WSREP: Node 8c6853ae state primary
2017-10-30T18:31:15.230553Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,8424307c,93)
memb {
8424307c,0
8c6853ae,0
c3f5b6b3,0
}
joined {
}
left {
}
partitioned {
}
)
2017-10-30T18:31:15.230645Z 0 [Note] WSREP: Save the discovered primary-component to disk
2017-10-30T18:31:15.231098Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2017-10-30T18:31:15.231129Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-10-30T18:31:15.665160Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 8470d981-bda0-11e7-b889-431946194c48
2017-10-30T18:31:15.665970Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8470d981-bda0-11e7-b889-431946194c48 from 0 (dbtest2)
2017-10-30T18:31:15.666093Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8470d981-bda0-11e7-b889-431946194c48 from 1 (dbtest1)
2017-10-30T18:31:15.666185Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8470d981-bda0-11e7-b889-431946194c48 from 2 (dbtest3)
2017-10-30T18:31:15.666211Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 85,
members = 2/3 (primary/total),
act_id = 37,
last_appl. = 0,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 590fa85c-bd85-11e7-9d42-468b04ff29c5
2017-10-30T18:31:15.666229Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2017-10-30T18:31:15.666240Z 0 [Note] WSREP: Trying to continue unpaused monitor
2017-10-30T18:31:15.666332Z 1 [Note] WSREP: New cluster view: global state: 590fa85c-bd85-11e7-9d42-468b04ff29c5:37, view# 86: Primary, number of nodes: 3, my index: 1, protocol version 3
2017-10-30T18:31:15.666346Z 1 [Note] WSREP: Setting wsrep_ready to true
2017-10-30T18:31:15.666356Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-10-30T18:31:15.666384Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-10-30T18:31:15.666415Z 1 [Note] WSREP: Assign initial position for certification: 37, protocol version: 3
2017-10-30T18:31:15.666462Z 0 [Note] WSREP: Service thread queue flushed.
2017-10-30T18:31:16.167004Z 0 [Note] WSREP: Member 0.0 (dbtest2) requested state transfer from '*any*'. Selected 1.0 (dbtest1)(SYNCED) as donor.
2017-10-30T18:31:16.167055Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 37)
2017-10-30T18:31:16.167280Z 1 [Note] WSREP: IST request: 590fa85c-bd85-11e7-9d42-468b04ff29c5:34-37|ssl://10.10.10.28:4568
2017-10-30T18:31:16.167316Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-10-30T18:31:16.167447Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.10.10.28:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid '590fa85c-bd85-11e7-9d42-468b04ff29c5:34' --bypass)
2017-10-30T18:31:16.168551Z 1 [Note] WSREP: DONOR thread signaled with 0
2017-10-30T18:31:16.168798Z 1 [Note] WSREP: IST sender using ssl
2017-10-30T18:31:16.178384Z 0 [Note] WSREP: async IST sender starting to serve ssl://10.10.10.28:4568 sending 35-37
2017-10-30T18:31:16.503316Z WSREP_SST: [INFO] Bypassing SST. Can work it through IST
2017-10-30T18:31:16.590367Z 0 [Note] WSREP: 1.0 (dbtest1): State transfer to 0.0 (dbtest2) complete.
2017-10-30T18:31:16.590401Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 37)
2017-10-30T18:31:16.591402Z 0 [Note] WSREP: Member 1.0 (dbtest1) synced with group.
2017-10-30T18:31:16.591416Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 37)
2017-10-30T18:31:16.591450Z 5 [Note] WSREP: Synchronized with group, ready for connections
2017-10-30T18:31:16.591503Z 5 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-10-30T18:31:18.256069Z 0 [Note] WSREP: (8c6853ae, 'ssl://10.10.10.29:4567') turning message relay requesting off
Config:
[client]
default_character_set = utf8
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld]
audit_log_file = /var/log/mysql/audit.log
audit_log_format = JSON
audit_log_handler = FILE
audit_log_rotate_on_size = 4G
audit_log_rotations = 1
bind_address = 10.10.10.29
binlog_format = row
datadir = /var/lib/mysql
default_storage_engine = innodb
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_size = 512M
innodb_file_per_table = 1
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
log_warnings = 2
max_connections = 1000
pid_file = /var/run/mysqld/mysqld.pid
plugin-load = audit_log=audit_log.so
port = 3306
server_id = 1
socket = /var/run/mysqld/mysqld.sock
sql_mode = traditional
user = mysql
wsrep_cluster_address = gcomm://10.10.10.29,10.10.10.28,10.10.10.27
wsrep_cluster_name = test
wsrep_node_address = 10.10.10.29
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size = 1G; gmcast.listen_addr = tcp://10.10.10.29:4567"
wsrep_slave_threads = 4
wsrep_sst_auth = sst:password
wsrep_sst_method = xtrabackup-v2
pxc_encrypt_cluster_traffic = ON
early-plugin-load = keyring_file.so
keyring_file_data = /var/lib/mysql-keyring/keyring
commit b496ae4af8634dc 38323754a3119b7 defc6d5302
Merge: cfea40e 58eb07b
Author: Krunal Bauskar <email address hidden>
Date: Wed Nov 1 15:33:35 2017 +0530
Merge pull request #567 from kbauskar/ 5.7-pxc- 885
- PXC#885: IST hangs when keyring_file_data is set
commit 58eb07b9cb62e67 9e5503a19e29d9a f0b1593c64
Author: Krunal Bauskar <email address hidden>
Date: Wed Nov 1 11:02:39 2017 +0530
- PXC#885: IST hangs when keyring_file_data is set
* When a node joins the cluster depending on existing
state it is synced using SST or IST.
* If SST is done then keyring needs is sent from DONOR
to JOINER.
* If IST is done only write-sets are send. keyring
sync is not needed.
bug was caused as the JOINER flow expected keyring
even in IST flow.