Comment 4 for bug 1643670

Damien Ciabrini (dciabrin) wrote :

Looking at the logs, controller-1 bootstrapped the cluster; the other two nodes tried to join afterwards and requested an SST (rsync) to sync their local state.

The logs show that the SST requests always fail with "connection refused" errors. For instance, when controller-0 requests an SST from controller-1, it starts an rsyncd and waits for data to be transferred from controller-1 via rsync:

161121 18:03:58 [Note] WSREP: Member 0.0 (rhelosp-18748-controller-0.localdomain) requested state transfer from '*any*'. Selected 1.0 (rhelosp-18748-controller-1.localdomain)(SYNCED) as donor.
161121 18:03:58 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 0)
161121 18:03:58 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
161121 18:03:58 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address 'rhelosp-18748-controller-0.internalapi.localdomain:4444/rsync_sst' --auth '(null)' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e819dd00-b013-11e6-93a9-c3de2248d2b2:0''

What's odd is that controller-1 seems to resolve the DNS name of controller-0 as "localhost" (rsync tries both the IPv6 and IPv4 loopback addresses):

rsync: failed to connect to rhelosp-18748-controller-0.internalapi.localdomain (::1%1): Connection refused (111)
rsync: failed to connect to rhelosp-18748-controller-0.internalapi.localdomain (127.0.0.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]

Consequently the rsync transfer fails, and controller-0 and controller-2 can never join the Galera cluster.

Looking at /etc/hosts, I see:
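
To double-check the name resolution on the donor, something like the following sketch could be run on controller-1. It only uses the Python standard library; the helper names `is_loopback` and `loopback_results` are made up for illustration, and the default port 4444 is simply the one taken from the SST address in the log above.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if a textual IPv4/IPv6 address is a loopback address."""
    # Drop an IPv6 scope suffix such as "%1" (as in "::1%1" in the rsync log).
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def loopback_results(hostname, port=4444):
    """Return the sorted loopback addresses that `hostname` resolves to.

    An empty list means the resolver returned no loopback address.
    Raises socket.gaierror if the name cannot be resolved at all.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    addresses = {info[4][0] for info in infos}
    return sorted(a for a in addresses if is_loopback(a))
```

Calling `loopback_results("rhelosp-18748-controller-0.internalapi.localdomain")` on controller-1 and getting a non-empty list back would be consistent with the rsync errors above; an empty list would suggest the problem lies elsewhere.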

172.17.0.21 rhelosp-18748-controller-0.internalapi. rhelosp-18748-controller-0.internalapi

Could it be that the "." at the end of all HEAT-generated entries is what causes the issue?
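
To see how many entries are affected, a minimal sketch like the one below could list every name in an /etc/hosts-style file that ends with a ".". The function name `names_with_trailing_dot` is hypothetical, and the sketch only locates such entries; it does not establish that the trailing dot is the actual cause of the wrong resolution.

```python
def names_with_trailing_dot(hosts_text):
    """Return (line number, address, name) for each host name ending in '.'."""
    flagged = []
    for lineno, line in enumerate(hosts_text.splitlines(), start=1):
        # Strip comments, then split into the address and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without names
        address, names = fields[0], fields[1:]
        for name in names:
            if name.endswith("."):
                flagged.append((lineno, address, name))
    return flagged


if __name__ == "__main__":
    with open("/etc/hosts") as hosts_file:
        for lineno, address, name in names_with_trailing_dot(hosts_file.read()):
            print("line %d: %s -> %s" % (lineno, address, name))
```

For the entry quoted above, this flags only the first name, "rhelosp-18748-controller-0.internalapi.", and leaves the second one alone.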