After reboot active controller, controller in disabled/failed state
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Tee Ngo |
Bug Description
Brief Description
-----------------
In a multi-node system, after the active controller is rebooted, that controller remains in a disabled/failed state.
Severity
--------
Major
Steps to Reproduce
------------------
As in the Brief Description: on a multi-node system, force-reboot the active controller and check host states afterwards.
TC-name: mtc/test_
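The check performed by the test can be sketched as a small shell helper. This is a sketch, not part of the test suite: it assumes the '|'-separated table that `system host-list` prints in the logs below, where column 6 is the operational state.

```shell
# Hypothetical helper: print hostname, operational and availability
# columns for any host whose operational state is "disabled", given
# `system host-list` table output on stdin.
unhealthy_hosts() {
    awk -F'|' '$6 ~ /disabled/ {print $3, $6, $7}'
}

# Usage on the active controller (illustrative only):
#   system host-list | unhealthy_hosts
# An empty result means all hosts recovered; in this bug, controller-0
# is still reported disabled long after the reboot.
```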
Expected Behavior
------------------
After the reboot, the controller recovers to unlocked/enabled/available.
Actual Behavior
----------------
The rebooted controller stays unlocked/disabled, first offline and then failed.
Reproducibility
---------------
Intermittent
System Configuration
--------------------
Multi-node system
Lab-name: WCP_113-121
Branch/Pull Time/Commit
-----------------------
stx master as of 20190517T013000Z
Last Pass
---------
Timestamp/Logs
--------------
[2019-05-17 09:13:47,567] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-05-17 09:13:49,161] 387 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
| 3  | storage-0    | storage     | unlocked       | enabled     | available    |
| 4  | storage-1    | storage     | unlocked       | enabled     | available    |
| 5  | compute-0    | worker      | unlocked       | enabled     | available    |
| 6  | compute-1    | worker      | unlocked       | enabled     | available    |
| 7  | compute-2    | worker      | unlocked       | enabled     | available    |
| 8  | compute-3    | worker      | unlocked       | enabled     | available    |
| 9  | compute-4    | worker      | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[2019-05-17 09:22:40,335] 139 INFO MainThread host_helper.
[2019-05-17 09:22:40,335] 262 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
[2019-05-17 09:26:13,928] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-05-17 09:26:15,485] 387 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | disabled    | offline      |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
| 3  | storage-0    | storage     | unlocked       | enabled     | available    |
| 4  | storage-1    | storage     | unlocked       | enabled     | available    |
| 5  | compute-0    | worker      | unlocked       | enabled     | available    |
| 6  | compute-1    | worker      | unlocked       | enabled     | available    |
| 7  | compute-2    | worker      | unlocked       | enabled     | available    |
| 8  | compute-3    | worker      | unlocked       | enabled     | available    |
| 9  | compute-4    | worker      | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[2019-05-17 10:18:54,294] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-05-17 10:18:55,918] 387 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | disabled    | failed       |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
| 3  | storage-0    | storage     | unlocked       | enabled     | available    |
| 4  | storage-1    | storage     | unlocked       | enabled     | available    |
| 5  | compute-0    | worker      | unlocked       | enabled     | available    |
| 6  | compute-1    | worker      | unlocked       | enabled     | available    |
| 7  | compute-2    | worker      | unlocked       | enabled     | available    |
| 8  | compute-3    | worker      | unlocked       | enabled     | available    |
| 9  | compute-4    | worker      | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
controller-1:~$
Test Activity
-------------
Sanity
tags: added: stx.sanity
summary:
  - After reboot active controller, controller in disable/failed state
  + After reboot active controller, controller in disabled/failed state
tags: added: stx.retestneeded
There are two puppet errors in puppet/2019-05-17-07-39-41_controller/puppet.log:
puppet/2019-05-17-07-39-41_controller/puppet.log:2019-05-17T07:42:03.517 Error: 2019-05-17 07:42:03 +0000 kubeadm init --config=/etc/kubernetes/kubeadm.yaml returned 1 instead of one of [0]
puppet/2019-05-17-07-39-41_controller/puppet.log:2019-05-17T07:42:03.617 Error: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: change from notrun to 0 failed: kubeadm init --config=/etc/kubernetes/kubeadm.yaml returned 1 instead of one of [0]
Here are more logs related to the failed kubernetes command:
2019-05-17T07:41:45.660 Debug: 2019-05-17 07:41:45 +0000 Exec[configure master node](provider=posix): Executing 'kubeadm init --config=/etc/kubernetes/kubeadm.yaml'
2019-05-17T07:41:45.662 Debug: 2019-05-17 07:41:45 +0000 Executing: 'kubeadm init --config=/etc/kubernetes/kubeadm.yaml'
2019-05-17T07:42:03.488 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [init] Using Kubernetes version: v1.13.5
2019-05-17T07:42:03.491 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [preflight] Running pre-flight checks
2019-05-17T07:42:03.493 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [preflight] Pulling images required for setting up a Kubernetes cluster
2019-05-17T07:42:03.495 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [preflight] This might take a minute or two, depending on the speed of your internet connection
2019-05-17T07:42:03.498 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
2019-05-17T07:42:03.500 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: error execution phase preflight: [preflight] Some fatal errors occurred:
2019-05-17T07:42:03.502 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.13.5: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 108.177.111.82:443: connect: no route to host
2019-05-17T07:42:03.504 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: , error: exit status 1
2019-05-17T07:42:03.506 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.13.5: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 108.177.111.82:443: connect: no route to host
2019-05-17T07:42:03.508 Notice: 2019-05-17 07:42:03 +0000 /Stage[main]/Platform::Ku...
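The failure in the log above is a registry reachability problem: docker could not reach k8s.gcr.io when kubeadm ran its preflight image pulls. A minimal triage sketch, assuming error lines of the shape shown in this report (the helper name and log path are illustrative, not StarlingX tooling):

```shell
# Hypothetical triage helper: extract the unreachable registry host
# from kubeadm "[ERROR ImagePull]" lines on stdin. The sed pattern
# matches the "Get https://HOST/v2/" fragment seen in this puppet log.
failing_registries() {
    sed -n 's#.*Get https://\([^/]*\)/v2/.*#\1#p' | sort -u
}

# Usage (log path is an assumption; adjust for your lab):
#   failing_registries < /var/log/puppet/2019-05-17-07-39-41_controller/puppet.log
# then probe each reported host with a short timeout, e.g.:
#   curl -sS -m 5 https://k8s.gcr.io/v2/
```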