The force reboot of controller-0 happened here:
[2019-02-20 01:40:28,952] 139 INFO MainThread host_helper.reboot_hosts:: Rebooting active controller: controller-0
[2019-02-20 01:40:28,952] 262 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
It appears that all of the OpenStack pods were restarted. I suspect this is because etcd goes away when controller-0 is killed. The problem appears to be that the mariadb pods did not come back up properly (stuck at 0/1 Ready):
mariadb-ingress-9d475c8c7-46kgs 0/1 Running 0 16h 172.16.1.77 controller-1 <none>
mariadb-ingress-9d475c8c7-7td6w 0/1 Running 0 16h 172.16.1.76 controller-1 <none>
mariadb-ingress-error-pages-6b55f4468c-nhkvv 1/1 Running 0 16h 172.16.1.78 controller-1 <none>
mariadb-server-0 0/1 Running 0 16h 172.16.0.201 controller-0 <none>
mariadb-server-1 0/1 Running 0 16h 172.16.1.89 controller-1 <none>
The garbd (Galera arbitrator) pod seems to be OK:
osh-openstack-garbd-garbd-5744f5f85-cjhrb 1/1 Running 0 18h 172.16.2.2 compute-0 <none>
The mariadb-server-0 pod seems to be stuck in a loop; the following log lines repeat indefinitely:
2019-02-20 17:59:24,021 - OpenStack-Helm Mariadb - INFO - Cluster info has been uptodate 0 times out of the required 12
2019-02-20 17:59:24,022 - OpenStack-Helm Mariadb - INFO - Checking to see if cluster data is fresh
2019-02-20 17:59:24,027 - OpenStack-Helm Mariadb - INFO - The data we have from the cluster is too old to make a decision for node mariadb-server-1
2019-02-20 17:59:24,027 - OpenStack-Helm Mariadb - INFO - The data we have from the cluster is ok for node mariadb-server-0
2019-02-20 17:59:27,372 - OpenStack-Helm Mariadb - INFO - Updating grastate configmap
The mariadb-server-1 pod stops generating logs shortly after it comes up:
2019-02-20 01:50:51,516 - OpenStack-Helm Mariadb - INFO - Cluster info has been uptodate 0 times out of the required 12
2019-02-20 01:50:51,516 - OpenStack-Helm Mariadb - INFO - Checking to see if cluster data is fresh
2019-02-20 01:50:51,521 - OpenStack-Helm Mariadb - INFO - The data we have from the cluster is ok for node mariadb-server-1
2019-02-20 01:50:51,521 - OpenStack-Helm Mariadb - INFO - The data we have from the cluster is too old to make a decision for node mariadb-server-0
2019-02-20 01:50:51,545 - OpenStack-Helm Mariadb - INFO - Updating grastate configmap
2019-02-20 01:51:01,531 - OpenStack-Helm Mariadb - INFO - Cluster info has been uptodate 0 times out of the required 12
2019-02-20 01:51:01,531 - OpenStack-Helm Mariadb - INFO - Checking to see if cluster data is fresh
2019-02-20 01:51:01,568 - OpenStack-Helm Mariadb - INFO - Updating grastate configmap
The garbd pod can't seem to connect to either of the mariadb-server pods:
2019-02-20 18:03:27.728 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.175:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:30.228 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.21:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:32.729 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.175:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:35.229 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.21:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:37.729 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.175:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:40.229 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.21:4567 timed out, no messages seen in PT3S
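Note that the peer addresses garbd keeps retrying (172.16.0.175 and 172.16.0.21) do not match the current mariadb-server pod IPs listed above (172.16.0.201 and 172.16.1.89), which suggests garbd is still dialing stale addresses from before the reboot. A quick way to pull the unique peer addresses out of the garbd log (shown here against the excerpt quoted above; on a live system you would pipe `kubectl logs <garbd-pod-name>` instead, pod name assumed):

```shell
# Extract the unique peer addresses garbd is timing out against.
# The heredoc below is the log excerpt from this report; substitute
# `kubectl logs <garbd-pod-name>` on a live cluster.
peers=$(grep -o 'addr tcp://[0-9.]*:4567' <<'EOF' | awk -F'[/:]' '{print $4}' | sort -u
2019-02-20 18:03:27.728 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.175:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:30.228 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.21:4567 timed out, no messages seen in PT3S
2019-02-20 18:03:32.729 INFO: (f14c4149, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.16.0.175:4567 timed out, no messages seen in PT3S
EOF
)
echo "$peers"
# → 172.16.0.175
#   172.16.0.21
```

Neither address belongs to a currently running pod, so the timeouts are expected until garbd picks up the new cluster addresses.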