[MOS 9.0] Controller node redeployment fails with "Failed to call refresh" error
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Confirmed | Medium | Fuel Sustaining |
Mitaka | Confirmed | Medium | Fuel Sustaining |
Bug Description
[MOS 9.0] Controller node redeployment fails with "Failed to call refresh" error
Environment:
1x Mgmt node (Fuel)
3x Controller node
3x Storage node
3x Compute node
MOS 9.0
Steps to reproduce the error:
- Shut down controller node
- Remove it from the environment via Fuel Web UI and run deployment. Fuel removes the node and updates active configuration
- Boot the server using PXE. Fuel notices a new node, bootstraps it and marks it as a newly discovered system
- Add the node back to the environment (assign controller role) and run the deployment process again. Fuel runs the deployment but in the end fails with the following error(s):
2016-07-29 11:41:02 ERR (/Stage[
2016-07-29 11:41:02 ERR (/Stage[
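The same redeployment flow can also be driven from the Fuel master node's CLI. The sketch below is an assumption, not part of the original report: the exact `fuel` subcommands and flags vary between Fuel releases, so verify them against your installed CLI before use.

```shell
# Hypothetical CLI equivalent of the Web UI steps above.
# Node id 21 and environment id 1 are taken from the "fuel node"
# listing further down in this report; flag names may differ
# across Fuel versions.
redeploy_node() {
    env_id=1
    node_id=21

    # Remove the node from the environment and apply the change.
    fuel node remove --node "$node_id" --env "$env_id"
    fuel deploy-changes --env "$env_id"

    # After the server PXE-boots and is re-discovered, add it back
    # with the controller role and deploy again.
    fuel node set --node "$node_id" --role controller --env "$env_id"
    fuel deploy-changes --env "$env_id"
}
```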
At this point MySQL on the newly added node appears to be stuck in a restart loop. The deployment fails, and restarting it doesn't help. The deployment can be finished successfully with the following steps:
## Connect to the new node (the one with the "error" status):
[root@fuel ~]# fuel node
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+---------+------------------+---------+------------+-------------------+-------------------+---------------+--------+---------
7 | stopped | Untitled (4e:f2) | 1 | 10.20.0.8 | ec:38:8f:6d:4e:f2 | ceph-osd | | 1 | 1
20 | stopped | Untitled (4e:e8) | 1 | 10.20.0.4 | ec:38:8f:6d:4e:e8 | controller, mongo | | 1 | 1
21 | error | Untitled (4e:ea) | 1 | 10.20.0.6 | ec:38:8f:6d:4e:ea | controller, mongo | | 1 | 1
3 | stopped | Untitled (df:2f) | 1 | 10.20.0.5 | a4:dc:be:1c:df:2f | ceph-osd | | 1 | 1
1 | stopped | Untitled (4b:88) | 1 | 10.20.0.3 | ec:38:8f:6d:4b:88 | compute | | 1 | 1
9 | stopped | Untitled (4b:80) | 1 | 10.20.0.10 | ec:38:8f:6d:4b:80 | compute | | 1 | 1
18 | stopped | Untitled (4e:f6) | 1 | 10.20.0.7 | ec:38:8f:6d:4e:f6 | controller, mongo | | 1 | 1
6 | stopped | Untitled (4e:f4) | 1 | 10.20.0.9 | ec:38:8f:6d:4e:f4 | ceph-osd | | 1 | 1
8 | stopped | Untitled (4b:82) | 1 | 10.20.0.11 | ec:38:8f:6d:4b:82 | compute | | 1 | 1
[root@fuel ~]# ssh 10.20.0.6
Warning: Permanently added '10.20.0.6' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-92-generic x86_64)
* Documentation: https:/
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
## Verify that MySQL is not available:
root@node-21:~# mysql
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/
root@node-21:~# ps -ef | grep -i mysql
root 17182 1 0 11:53 ? 00:00:00 /bin/sh /usr/bin/
mysql 18281 17182 1 11:53 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=
root 18282 17182 0 11:53 ? 00:00:00 logger -t mysqld -p daemon.error
mysql 18315 18281 0 11:53 ? 00:00:00 sh -c wsrep_sst_
mysql 18316 18315 0 11:53 ? 00:00:00 /bin/bash -ue /usr//bin/
mysql 19130 18316 0 11:53 ? 00:00:00 perl /usr//bin/
mysql 19195 19130 0 11:53 ? 00:00:00 sh -c xtrabackup_56 --defaults-
mysql 19196 19195 7 11:53 ? 00:00:03 xtrabackup_56 --defaults-
root 19575 16253 0 11:54 ? 00:00:00 /bin/bash /usr/lib/
root 19645 19575 0 11:54 ? 00:00:00 /bin/bash /usr/lib/
root 19646 19645 2 11:54 ? 00:00:00 /bin/sh /usr/bin/
root 20514 19646 0 11:54 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=
root 20516 19473 0 11:54 pts/3 00:00:00 grep --color=auto -i mysql
## Stop pacemaker:
root@node-21:~# service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate[ OK ]
Waiting for cluster services to unload.....[ OK ]
## Kill leftover MySQL processes:
root@node-21:~# ps -ef | grep -i mysql
root 7852 1 0 12:05 ? 00:00:00 logger -t mysqld -p daemon.error
mysql 7888 1 0 12:05 ? 00:00:00 sh -c wsrep_sst_
mysql 7889 7888 0 12:05 ? 00:00:00 /bin/bash -ue /usr//bin/
mysql 8152 7889 0 12:05 ? 00:00:00 socat -u TCP-LISTEN:
mysql 8153 7889 1 12:05 ? 00:00:00 xbstream -x
root 8813 19473 0 12:06 pts/3 00:00:00 grep --color=auto -i mysql
root@node-21:~# kill -9 7852 7888 7889 8152 8153
root@node-21:~# ps -ef | grep -i mysql
mysql 8872 1 0 12:06 ? 00:00:00 perl /usr//bin/
mysql 8910 8872 0 12:06 ? 00:00:00 sh -c xtrabackup_56 --defaults-
mysql 8911 8910 16 12:06 ? 00:00:01 xtrabackup_56 --defaults-
root 9089 19473 0 12:06 pts/3 00:00:00 grep --color=auto -i mysql
root@node-21:~# kill -9 8872 8910 8911
root@node-21:~# ps -ef | grep -i mysql
root 9622 19473 0 12:07 pts/3 00:00:00 grep --color=auto -i mysql
## Start MySQL manually and wait for it to finish the recovery and synchronize with the other nodes:
root@node-21:~# /etc/init.d/mysql start
mysql start/running, process 9840
root@node-21:~#
root@node-21:~#
root@node-21:~# tail -f /var/lib/
==> /var/lib/
[notice (again)]
If you use binary log and don't use any hack of group commit,
the binary log position seems to be:
xtrabackup: starting shutdown with innodb_
InnoDB: FTS optimize thread exiting.
InnoDB: Starting shutdown...
InnoDB: Shutdown completed; log sequence number 629429384
160729 12:09:01 innobackupex: completed OK!
==> /var/lib/
innobackupex: back to '/var/lib/mysql'
innobackupex: Starting to move InnoDB log files
innobackupex: in '/var/lib/
innobackupex: back to original InnoDB log directory '/var/lib/mysql'
innobackupex: Moving '/var/lib/
innobackupex: Moving '/var/lib/
innobackupex: Finished copying back files.
160729 12:09:01 innobackupex: completed OK!
^C
## Verify that the service is up & running:
root@node-21:~# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.6.23-
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| aodh |
| cinder |
| glance |
| heat |
| keystone |
| murano |
| mysql |
| neutron |
| nova |
| nova_api |
| performance_schema |
| test |
+--------------------+
13 rows in set (0.01 sec)
mysql> exit
Bye
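An additional check, not in the original report: on a Galera-based MySQL (which MOS uses for its controller cluster), the node's wsrep state shows whether the state transfer has finished and the node has rejoined the cluster. A node that is fully caught up reports `Synced`.

```shell
# Helper: does a wsrep_local_state_comment value indicate a synced node?
is_synced() {
    [ "$1" = "Synced" ]
}

# Usage on the node (assumes a local Galera/MySQL socket is available):
#   state=$(mysql -nNE -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" | tail -1)
#   is_synced "$state" && echo "node is synced with the cluster"
```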
## Stop MySQL, restart pacemaker and wait for it to bring the MySQL resource back up automatically:
root@node-21:~# /etc/init.d/mysql stop
mysql stop/waiting
root@node-21:~# service pacemaker start
Starting Pacemaker Cluster Manager[ OK ]
root@node-21:~# ps -ef | grep -i mysql
root 12810 12033 0 12:10 ? 00:00:00 /bin/bash /usr/lib/
root 13026 12810 0 12:10 ? 00:00:00 /bin/bash /usr/lib/
root 13027 13026 0 12:10 ? 00:00:00 /bin/sh /usr/bin/
mysql 13889 13027 56 12:10 ? 00:00:01 /usr/sbin/mysqld --basedir=/usr --datadir=
nobody 13934 13933 0 12:10 ? 00:00:00 mysql -nNE --connect-
nobody 13938 13937 0 12:10 ? 00:00:00 mysql -nNE --connect-
nobody 13942 13941 0 12:10 ? 00:00:00 mysql -nNE --connect-
root 13945 19473 0 12:10 pts/3 00:00:00 grep --color=auto -i mysql
## Verify that MySQL is available:
root@node-21:~# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27
Server version: 5.6.23-
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| aodh |
| cinder |
| glance |
| heat |
| keystone |
| murano |
| mysql |
| neutron |
| nova |
| nova_api |
| performance_schema |
| test |
+--------------------+
13 rows in set (0.00 sec)
mysql> exit
Bye
## Return to the Fuel Web UI and run the deployment again. It should now finish successfully.
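The manual recovery walked through above can be condensed into a short script. This is a sketch built from the exact commands shown in this report; it assumes the upstart-managed MySQL and pacemaker layout of MOS 9.0 on Ubuntu 14.04 and should be run on the failed controller node.

```shell
#!/bin/sh
# Condensed sketch of the manual MySQL recovery described above.

# Filter a "ps -ef" listing down to MySQL-related PIDs, ignoring the
# grep process itself.
filter_mysql_pids() {
    grep -i mysql | grep -v grep | awk '{print $2}'
}

mysql_pids() {
    ps -ef | filter_mysql_pids
}

recover_mysql() {
    service pacemaker stop              # stop cluster management first

    # Kill leftover mysqld / wsrep SST helper processes; loop because
    # killing a parent can briefly leave respawned children behind.
    while [ -n "$(mysql_pids)" ]; do
        kill -9 $(mysql_pids) 2>/dev/null
        sleep 1
    done

    /etc/init.d/mysql start             # manual start lets the SST finish
    # ...wait until the xtrabackup/innobackupex log reports "completed OK!",
    # then hand the resource back to pacemaker:
    /etc/init.d/mysql stop
    service pacemaker start
}
```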
The issue has been reproduced several times on distinct controller nodes (in fact, on all 3 controllers in this setup). It was observed only in the node-redeployment case (remove node + refresh environment + add node back); deploying a fresh environment did not expose such issues (also tested several times).
The following logs are attached to the case:
- Fuel Web UI puppet log
- Node's puppet log
- MySQL error log
- Pacemaker log
Changed in fuel:
milestone: none → 9.1
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → Medium
status: New → Confirmed
tags: added: customer-found
tags: added: area-library
Changed in fuel:
milestone: 9.1 → 10.0
tags: added: release-notes
tags: added: release-notes-done; removed: release-notes