Failed HA Controller deployment, mysql fails to start
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo |
Fix Released
|
High
|
Julie Pichon |
Bug Description
3-controller 1-node deployments fail. Looking in to this further, it appears that mysql never starts properly on at least one node:
[stack@
RHELOSP-
resource_type: OS::Heat:
physical_
status: CREATE_FAILED
status_reason: |
Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
deploy_stdout: |
Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)
Server built: Aug 3 2016 08:33:27'
Notice: Scope(Class[
Notice: Compiled catalog for rhelosp-
<snip>
Notice: /Stage[
Logging in to the controllers shows that mysql failed to start on at least two Controller nodes.
/var/log/mysql.log and /var/log/
Looking at the logs, controller-1 bootstrap the cluster, the other two node tried to join afterwards and request a SST (rsync) to sync their local state.
Log show the SST requests always fail due to "connection refused" errors. For instance, when controller-0 request a SST from controller-1, it starts a rsyncd and wait for data to be transferred from controller-1 via rsync.
161121 18:03:58 [Note] WSREP: Member 0.0 (rhelosp- 18748-controlle r-0.localdomain ) requested state transfer from '*any*'. Selected 1.0 (rhelosp- 18748-controlle r-1.localdomain )(SYNCED) as donor. 18748-controlle r-0.internalapi .localdomain: 4444/rsync_ sst' --auth '(null)' --socket '/var/lib/ mysql/mysql. sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e819dd00- b013-11e6- 93a9-c3de2248d2 b2:0''
161121 18:03:58 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 0)
161121 18:03:58 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
161121 18:03:58 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address 'rhelosp-
What's odd is that controller-1 seems to resolve dns of controller-1 as "localhost" (ipv6 and ipv4 tried by rsync):
rsync: failed to connect to rhelosp- 18748-controlle r-0.internalapi .localdomain (::1%1): Connection refused (111) 18748-controlle r-0.internalapi .localdomain (127.0.0.1): Connection refused (111)
rsync: failed to connect to rhelosp-
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Consequently the rsync transfer fails, and the controller-0 and controller-2 can never join the galera cluster.
looking at /etc/hosts I see:
172.17.0.21 rhelosp- 18748-controlle r-0.internalapi . rhelosp- 18748-controlle r-0.internalapi
Could it be that the "." at the end of all HEAT-generated entries is what causes the issue?