2019-09-09 21:31:43 |
Anujeyan Manokeran |
bug |
|
|
added bug |
2019-09-09 21:41:11 |
Anujeyan Manokeran |
attachment added |
|
collect logs https://bugs.launchpad.net/starlingx/+bug/1843344/+attachment/5287601/+files/ALL_NODES_20190909.212621.tar |
|
2019-09-11 17:50:05 |
Ghada Khalil |
summary |
IPV6:compute-2 critical 'kubelet' process has failed |
IPV6: compute-2 critical 'kubelet' process has failed |
|
2019-09-11 17:51:46 |
Ghada Khalil |
description |
Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compte-2 and never recovered. Compute-2 was in reboot loop.
compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$
fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004 | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual | host=compute-2 | critical | 2019-09-09T21:16 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | :33.704384 |
| | | | | |
| 200.006 | compute-2 critical 'kubelet' process has failed and could not be auto-recovered | host=compute-2.process= | critical | 2019-09-09T21:16 |
| | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet | | :33.600828 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | |
| | | | | |
| 100.114 | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server. | host=controller-1.ntp= | minor | 2019-09-09T20:35 |
| | | 2607:5300:60:97 | | :24.678864 |
| | | | | |
| 100.114 | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server. | host=controller-0.ntp= | minor | 2019-09-09T20:12 |
| | | 2600:3c00::f03c | | :55.021560 |
| | | | | |
| 200.010 | controller-0 access to board management module has failed. | host=controller-0 | warning | 2019-09-09T19:56 |
| | | | | :31.703469 |
| | | | | |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
Severity
--------
Critical
Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansiable
3. Install all other nodes with deployment manager.
4. After compute-2 unlock as per description compute-2 failure
System Configuration
--------------------
Regular system with IPv6 configuration
Expected Behavior
------------------
Kubelet process up and running
Actual Behavior
----------------
As per description Kubelet process failing
Reproducibility
---------------
Tested only once in this load.
System Configuration
--------------------
Regular system IPV6
Load
----
Build was on " 2019-09-09_00-10-00
Last Pass
---------
Build was on "2019-09-08_00-10-00
Timestamp/Logs
--------------
2019-09-09T15:42:20.000
Test Activity
-------------
Regression test |
Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compte-2 and never recovered. Compute-2 was in reboot loop.
compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$
fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004 | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual | host=compute-2 | critical | 2019-09-09T21:16 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | :33.704384 |
| | | | | |
| 200.006 | compute-2 critical 'kubelet' process has failed and could not be auto-recovered | host=compute-2.process= | critical | 2019-09-09T21:16 |
| | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet | | :33.600828 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | |
| | | | | |
| 100.114 | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server. | host=controller-1.ntp= | minor | 2019-09-09T20:35 |
| | | 2607:5300:60:97 | | :24.678864 |
| | | | | |
| 100.114 | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server. | host=controller-0.ntp= | minor | 2019-09-09T20:12 |
| | | 2600:3c00::f03c | | :55.021560 |
| | | | | |
| 200.010 | controller-0 access to board management module has failed. | host=controller-0 | warning | 2019-09-09T19:56 |
| | | | | :31.703469 |
| | | | | |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
Severity
--------
Critical
Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansible
3. Install all other nodes
4. After compute-2 unlock as per description compute-2 failure
System Configuration
--------------------
Regular system with IPv6 configuration
Expected Behavior
------------------
Kubelet process up and running
Actual Behavior
----------------
As per description Kubelet process failing
Reproducibility
---------------
Tested only once in this load.
System Configuration
--------------------
Regular system IPV6 - wolfpass-3-7
Load
----
Build was on " 2019-09-09_00-10-00
Last Pass
---------
Build was on "2019-09-08_00-10-00
Timestamp/Logs
--------------
2019-09-09T15:42:20.000
Test Activity
-------------
Regression test |
|
2019-09-11 17:51:54 |
Ghada Khalil |
description |
Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compte-2 and never recovered. Compute-2 was in reboot loop.
compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$
fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004 | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual | host=compute-2 | critical | 2019-09-09T21:16 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | :33.704384 |
| | | | | |
| 200.006 | compute-2 critical 'kubelet' process has failed and could not be auto-recovered | host=compute-2.process= | critical | 2019-09-09T21:16 |
| | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet | | :33.600828 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | |
| | | | | |
| 100.114 | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server. | host=controller-1.ntp= | minor | 2019-09-09T20:35 |
| | | 2607:5300:60:97 | | :24.678864 |
| | | | | |
| 100.114 | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server. | host=controller-0.ntp= | minor | 2019-09-09T20:12 |
| | | 2600:3c00::f03c | | :55.021560 |
| | | | | |
| 200.010 | controller-0 access to board management module has failed. | host=controller-0 | warning | 2019-09-09T19:56 |
| | | | | :31.703469 |
| | | | | |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
Severity
--------
Critical
Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansible
3. Install all other nodes
4. After compute-2 unlock as per description compute-2 failure
System Configuration
--------------------
Regular system with IPv6 configuration
Expected Behavior
------------------
Kubelet process up and running
Actual Behavior
----------------
As per description Kubelet process failing
Reproducibility
---------------
Tested only once in this load.
System Configuration
--------------------
Regular system IPV6 - wolfpass-3-7
Load
----
Build was on " 2019-09-09_00-10-00
Last Pass
---------
Build was on "2019-09-08_00-10-00
Timestamp/Logs
--------------
2019-09-09T15:42:20.000
Test Activity
-------------
Regression test |
Brief Description
-----------------
The kubelet process failed on compute-2 after unlock. Auto-recovery was triggered to recover kubelet by rebooting compute-2, but the host never recovered and remained in a reboot loop.
compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$
fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004 | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual | host=compute-2 | critical | 2019-09-09T21:16 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | :33.704384 |
| | | | | |
| 200.006 | compute-2 critical 'kubelet' process has failed and could not be auto-recovered | host=compute-2.process= | critical | 2019-09-09T21:16 |
| | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet | | :33.600828 |
| | Lock and Unlock may be required if auto-recovery is unsuccessful. | | | |
| | | | | |
| 100.114 | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server. | host=controller-1.ntp= | minor | 2019-09-09T20:35 |
| | | 2607:5300:60:97 | | :24.678864 |
| | | | | |
| 100.114 | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server. | host=controller-0.ntp= | minor | 2019-09-09T20:12 |
| | | 2600:3c00::f03c | | :55.021560 |
| | | | | |
| 200.010 | controller-0 access to board management module has failed. | host=controller-0 | warning | 2019-09-09T19:56 |
| | | | | :31.703469 |
| | | | | |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
Severity
--------
Critical
Steps to Reproduce
------------------
1. Follow the install procedure for a regular system with IPv6 configuration.
2. Install controller-0 and configure it with ansible.
3. Install all other nodes.
4. Unlock compute-2. The kubelet failure described above occurs after the unlock.
System Configuration
--------------------
Regular system with IPv6 configuration
Expected Behavior
------------------
Kubelet process up and running
Actual Behavior
----------------
Kubelet process fails on compute-2, as described above.
Reproducibility
---------------
Tested only once in this load.
System Configuration
--------------------
Regular system IPV6 - wolfpass-3-7
Load
----
Build was on "2019-09-09_00-10-00"
Last Pass
---------
Build was on "2019-09-08_00-10-00"
Timestamp/Logs
--------------
2019-09-09T15:42:20.000
Test Activity
-------------
Regression test |
|
2019-09-13 19:07:50 |
Ghada Khalil |
tags |
|
stx.containers |
|
2019-09-13 19:07:57 |
Ghada Khalil |
tags |
stx.containers |
stx.3.0 stx.containers |
|
2019-09-13 19:08:22 |
Ghada Khalil |
summary |
IPV6: compute-2 critical 'kubelet' process has failed |
IPV6: compute-2 in reboot loop due to critical 'kubelet' process failure |
|
2019-09-13 19:09:20 |
Ghada Khalil |
bug |
|
|
added subscriber Bill Zvonar |
2019-09-13 19:09:25 |
Ghada Khalil |
starlingx: status |
New |
Triaged |
|
2019-09-13 19:09:30 |
Ghada Khalil |
starlingx: importance |
Undecided |
High |
|
2019-09-13 19:09:44 |
Ghada Khalil |
starlingx: assignee |
|
Bart Wensley (bartwensley) |
|
2019-09-13 19:24:57 |
Ghada Khalil |
tags |
stx.3.0 stx.containers |
stx.3.0 stx.containers stx.retestneeded |
|
2019-09-19 12:36:26 |
Frank Miller |
starlingx: assignee |
Bart Wensley (bartwensley) |
Eric MacDonald (rocksolidmtce) |
|
2019-09-19 18:32:42 |
Bart Wensley |
bug |
|
|
added subscriber Bart Wensley |
2019-09-25 22:09:29 |
Eddy Raineri |
bug |
|
|
added subscriber Eddy Raineri |
2019-09-30 18:37:47 |
OpenStack Infra |
starlingx: status |
Triaged |
In Progress |
|
2019-10-01 14:40:27 |
OpenStack Infra |
starlingx: status |
In Progress |
Fix Released |
|
2019-10-25 18:35:40 |
Anujeyan Manokeran |
tags |
stx.3.0 stx.containers stx.retestneeded |
stx.3.0 stx.containers |
|