Activity log for bug #1843344

Date Who What changed Old value New value Message
2019-09-09 21:31:43 Anujeyan Manokeran bug added bug
2019-09-09 21:41:11 Anujeyan Manokeran attachment added collect logs https://bugs.launchpad.net/starlingx/+bug/1843344/+attachment/5287601/+files/ALL_NODES_20190909.212621.tar
2019-09-11 17:50:05 Ghada Khalil summary IPV6:compute-2 critical 'kubelet' process has failed IPV6: compute-2 critical 'kubelet' process has failed
2019-09-11 17:51:46 Ghada Khalil description

Old value:

Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compte-2 and never recovered. Compute-2 was in reboot loop.

compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$ fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text                                                                              | Entity ID               | Severity | Time Stamp       |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004  | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual     | host=compute-2          | critical | 2019-09-09T21:16 |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          | :33.704384       |
|          |                                                                                          |                         |          |                  |
| 200.006  | compute-2 critical 'kubelet' process has failed and could not be auto-recovered          | host=compute-2.process= | critical | 2019-09-09T21:16 |
|          | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet                 |          | :33.600828       |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          |                  |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server.                    | host=controller-1.ntp=  | minor    | 2019-09-09T20:35 |
|          |                                                                                          | 2607:5300:60:97         |          | :24.678864       |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server.                    | host=controller-0.ntp=  | minor    | 2019-09-09T20:12 |
|          |                                                                                          | 2600:3c00::f03c         |          | :55.021560       |
|          |                                                                                          |                         |          |                  |
| 200.010  | controller-0 access to board management module has failed.                               | host=controller-0       | warning  | 2019-09-09T19:56 |
|          |                                                                                          |                         |          | :31.703469       |
|          |                                                                                          |                         |          |                  |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+

Severity
--------
Critical

Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansiable
3. Install all other nodes with deployment manager.
4. After compute-2 unlock as per description compute-2 failure

System Configuration
--------------------
Regular system with IPv6 configuration

Expected Behavior
------------------
Kubelet process up and running

Actual Behavior
----------------
As per description Kubelet process failing

Reproducibility
---------------
Tested only once in this load.

System Configuration
--------------------
Regular system IPV6

Load
----
Build was on " 2019-09-09_00-10-00

Last Pass
---------
Build was on "2019-09-08_00-10-00

Timestamp/Logs
--------------
2019-09-09T15:42:20.000

Test Activity
-------------
Regression test

New value:

Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compte-2 and never recovered. Compute-2 was in reboot loop.

compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$ fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text                                                                              | Entity ID               | Severity | Time Stamp       |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004  | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual     | host=compute-2          | critical | 2019-09-09T21:16 |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          | :33.704384       |
|          |                                                                                          |                         |          |                  |
| 200.006  | compute-2 critical 'kubelet' process has failed and could not be auto-recovered          | host=compute-2.process= | critical | 2019-09-09T21:16 |
|          | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet                 |          | :33.600828       |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          |                  |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server.                    | host=controller-1.ntp=  | minor    | 2019-09-09T20:35 |
|          |                                                                                          | 2607:5300:60:97         |          | :24.678864       |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server.                    | host=controller-0.ntp=  | minor    | 2019-09-09T20:12 |
|          |                                                                                          | 2600:3c00::f03c         |          | :55.021560       |
|          |                                                                                          |                         |          |                  |
| 200.010  | controller-0 access to board management module has failed.                               | host=controller-0       | warning  | 2019-09-09T19:56 |
|          |                                                                                          |                         |          | :31.703469       |
|          |                                                                                          |                         |          |                  |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+

Severity
--------
Critical

Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansible
3. Install all other nodes
4. After compute-2 unlock as per description compute-2 failure

System Configuration
--------------------
Regular system with IPv6 configuration

Expected Behavior
------------------
Kubelet process up and running

Actual Behavior
----------------
As per description Kubelet process failing

Reproducibility
---------------
Tested only once in this load.

System Configuration
--------------------
Regular system IPV6 - wolfpass-3-7

Load
----
Build was on " 2019-09-09_00-10-00

Last Pass
---------
Build was on "2019-09-08_00-10-00

Timestamp/Logs
--------------
2019-09-09T15:42:20.000

Test Activity
-------------
Regression test
2019-09-11 17:51:54 Ghada Khalil description

Old value:

Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compte-2 and never recovered. Compute-2 was in reboot loop.

compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$ fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text                                                                              | Entity ID               | Severity | Time Stamp       |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004  | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual     | host=compute-2          | critical | 2019-09-09T21:16 |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          | :33.704384       |
|          |                                                                                          |                         |          |                  |
| 200.006  | compute-2 critical 'kubelet' process has failed and could not be auto-recovered          | host=compute-2.process= | critical | 2019-09-09T21:16 |
|          | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet                 |          | :33.600828       |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          |                  |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server.                    | host=controller-1.ntp=  | minor    | 2019-09-09T20:35 |
|          |                                                                                          | 2607:5300:60:97         |          | :24.678864       |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server.                    | host=controller-0.ntp=  | minor    | 2019-09-09T20:12 |
|          |                                                                                          | 2600:3c00::f03c         |          | :55.021560       |
|          |                                                                                          |                         |          |                  |
| 200.010  | controller-0 access to board management module has failed.                               | host=controller-0       | warning  | 2019-09-09T19:56 |
|          |                                                                                          |                         |          | :31.703469       |
|          |                                                                                          |                         |          |                  |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+

Severity
--------
Critical

Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansible
3. Install all other nodes
4. After compute-2 unlock as per description compute-2 failure

System Configuration
--------------------
Regular system with IPv6 configuration

Expected Behavior
------------------
Kubelet process up and running

Actual Behavior
----------------
As per description Kubelet process failing

Reproducibility
---------------
Tested only once in this load.

System Configuration
--------------------
Regular system IPV6 - wolfpass-3-7

Load
----
Build was on " 2019-09-09_00-10-00

Last Pass
---------
Build was on "2019-09-08_00-10-00

Timestamp/Logs
--------------
2019-09-09T15:42:20.000

Test Activity
-------------
Regression test

New value:

Brief Description
-----------------
Kubelet process failure on compute-2 after unlock auto recovery triggered to recover Kubelet by reboot compute-2 and never recovered. Compute-2 was in reboot loop.

compute-2:~$ ps -ef | grep kubelet
sysadmin 45596 45415 0 21:05 ttyS0 00:00:00 grep --color=auto kubelet
compute-2:~$ fm alarm-list
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| Alarm ID | Reason Text                                                                              | Entity ID               | Severity | Time Stamp       |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+
| 200.004  | compute-2 experienced a service-affecting failure. Auto-recovery in progress. Manual     | host=compute-2          | critical | 2019-09-09T21:16 |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          | :33.704384       |
|          |                                                                                          |                         |          |                  |
| 200.006  | compute-2 critical 'kubelet' process has failed and could not be auto-recovered          | host=compute-2.process= | critical | 2019-09-09T21:16 |
|          | gracefully. Auto-recovery progression by host reboot is required and in progress. Manual | kubelet                 |          | :33.600828       |
|          | Lock and Unlock may be required if auto-recovery is unsuccessful.                        |                         |          |                  |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server.                    | host=controller-1.ntp=  | minor    | 2019-09-09T20:35 |
|          |                                                                                          | 2607:5300:60:97         |          | :24.678864       |
|          |                                                                                          |                         |          |                  |
| 100.114  | NTP address 2600:3c00::f03c is not a valid or a reachable NTP server.                    | host=controller-0.ntp=  | minor    | 2019-09-09T20:12 |
|          |                                                                                          | 2600:3c00::f03c         |          | :55.021560       |
|          |                                                                                          |                         |          |                  |
| 200.010  | controller-0 access to board management module has failed.                               | host=controller-0       | warning  | 2019-09-09T19:56 |
|          |                                                                                          |                         |          | :31.703469       |
|          |                                                                                          |                         |          |                  |
+----------+------------------------------------------------------------------------------------------+-------------------------+----------+------------------+

Severity
--------
Critical

Steps to Reproduce
------------------
1. Follow install procedure for regular system with IPv6 configuration .
2. Install controller-0 and configure with ansible
3. Install all other nodes
4. After compute-2 unlock as per description compute-2 failure

System Configuration
--------------------
Regular system with IPv6 configuration

Expected Behavior
------------------
Kubelet process up and running

Actual Behavior
----------------
As per description Kubelet process failing

Reproducibility
---------------
Tested only once in this load.

System Configuration
--------------------
Regular system IPV6 - wolfpass-3-7

Load
----
Build was on " 2019-09-09_00-10-00

Last Pass
---------
Build was on "2019-09-08_00-10-00

Timestamp/Logs
--------------
2019-09-09T15:42:20.000

Test Activity
-------------
Regression test
2019-09-13 19:07:50 Ghada Khalil tags stx.containers
2019-09-13 19:07:57 Ghada Khalil tags stx.containers stx.3.0 stx.containers
2019-09-13 19:08:22 Ghada Khalil summary IPV6: compute-2 critical 'kubelet' process has failed IPV6: compute-2 in reboot loop due to critical 'kubelet' process failure
2019-09-13 19:09:20 Ghada Khalil bug added subscriber Bill Zvonar
2019-09-13 19:09:25 Ghada Khalil starlingx: status New Triaged
2019-09-13 19:09:30 Ghada Khalil starlingx: importance Undecided High
2019-09-13 19:09:44 Ghada Khalil starlingx: assignee Bart Wensley (bartwensley)
2019-09-13 19:24:57 Ghada Khalil tags stx.3.0 stx.containers stx.3.0 stx.containers stx.retestneeded
2019-09-19 12:36:26 Frank Miller starlingx: assignee Bart Wensley (bartwensley) Eric MacDonald (rocksolidmtce)
2019-09-19 18:32:42 Bart Wensley bug added subscriber Bart Wensley
2019-09-25 22:09:29 Eddy Raineri bug added subscriber Eddy Raineri
2019-09-30 18:37:47 OpenStack Infra starlingx: status Triaged In Progress
2019-10-01 14:40:27 OpenStack Infra starlingx: status In Progress Fix Released
2019-10-25 18:35:40 Anujeyan Manokeran tags stx.3.0 stx.containers stx.retestneeded stx.3.0 stx.containers