Looking at the logs, controller-1 bootstrapped the cluster; the other two nodes then tried to join and requested an SST (rsync) to sync their local state.
The logs show that the SST requests always fail with "connection refused" errors. For instance, when controller-0 requests an SST from controller-1, it starts an rsyncd and waits for data to be transferred from controller-1 via rsync.
161121 18:03:58 [Note] WSREP: Member 0.0 (rhelosp-18748-controller-0.localdomain) requested state transfer from '*any*'. Selected 1.0 (rhelosp-18748-controller-1.localdomain)(SYNCED) as donor.
161121 18:03:58 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 0)
161121 18:03:58 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
161121 18:03:58 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address 'rhelosp-18748-controller-0.internalapi.localdomain:4444/rsync_sst' --auth '(null)' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e819dd00-b013-11e6-93a9-c3de2248d2b2:0''
What's odd is that controller-1 seems to resolve the hostname of controller-0 as "localhost" (rsync tries both IPv6 and IPv4):
rsync: failed to connect to rhelosp-18748-controller-0.internalapi.localdomain (::1%1): Connection refused (111)
rsync: failed to connect to rhelosp-18748-controller-0.internalapi.localdomain (127.0.0.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Consequently the rsync transfer fails, and controller-0 and controller-2 can never join the Galera cluster.
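To confirm the resolution problem on the donor node, a small check along these lines might help. It is only a sketch: the helper names `resolve` and `loopback_addresses` are made up for illustration, and port 4444 is taken from the SST address in the log above. It resolves a hostname through the system resolver and flags any loopback results such as the `::1%1` and `127.0.0.1` seen in the rsync errors.

```python
import ipaddress
import socket
import sys


def loopback_addresses(addresses):
    """Return the addresses from the list that are loopback (127.0.0.0/8 or ::1)."""
    found = []
    for addr in addresses:
        # Drop an IPv6 scope suffix such as "%1" before parsing.
        bare = addr.split("%", 1)[0]
        if ipaddress.ip_address(bare).is_loopback:
            found.append(addr)
    return found


def resolve(hostname, port=4444):
    """Resolve hostname via the system resolver and return unique address strings."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    seen = []
    for info in infos:
        addr = info[4][0]  # sockaddr tuple starts with the address string
        if addr not in seen:
            seen.append(addr)
    return seen


if __name__ == "__main__":
    # Usage: python check_resolve.py <hostname> [<hostname> ...]
    for name in sys.argv[1:]:
        addrs = resolve(name)
        status = "LOOPBACK" if loopback_addresses(addrs) else "ok"
        print(name, addrs, status)
```

Running it on controller-1 with rhelosp-18748-controller-0.internalapi.localdomain as the argument should show whether that name resolves to a loopback address there.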
Looking at /etc/hosts, I see:
172.17.0.21 rhelosp-18748-controller-0.internalapi. rhelosp-18748-controller-0.internalapi
Could it be that the "." at the end of all HEAT-generated entries is what causes the issue?
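To see how many entries are affected, something like the following could list every /etc/hosts name that ends with a dot. This is a rough sketch and `trailing_dot_entries` is a hypothetical helper name. It only reports such entries; it does not establish that the trailing dot is the cause.

```python
def trailing_dot_entries(hosts_text):
    """Return (ip, name) pairs for every hostname in hosts_text that ends with '.'."""
    hits = []
    for line in hosts_text.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            if name.endswith("."):
                hits.append((ip, name))
    return hits


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        for ip, name in trailing_dot_entries(f.read()):
            print(ip, name)
```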