Activity log for bug #1884111

Date Who What changed Old value New value Message
2020-06-18 17:46:41 Ghada Khalil bug added bug
2020-06-18 17:47:22 Ghada Khalil starlingx: assignee Chris Friesen (cbf123)
2020-06-18 17:47:31 Ghada Khalil bug added subscriber Daniel Badea
2020-06-18 17:50:43 Ghada Khalil description

Old value:

Brief Description
-----------------
After the initial unlock of a worker node, docker failed to start. The subsequent reboot succeeded.

Severity
--------
Major

Steps to Reproduce
------------------
Nothing special. Initial unlock of a worker node.

Expected Behavior
------------------
Worker node should come up

Actual Behavior
----------------
Worker node fails and requires an additional reboot.

Reproducibility
---------------
Intermittent. Seen a few times so far.

System Configuration
--------------------
multi-node system

Branch/Pull Time/Commit
-----------------------
stx master load since May 2020

Last Pass
---------
N/A - Issue is intermittent

Timestamp/Logs
--------------
Logs are available from 2 occurrences:

May 2020:
2020-05-13T11:34:39.533 Error: 2020-05-13 11:34:38 +0000 /Stage[main]/Platform::Docker::Config/Service[docker]/ensure: change from stopped to running failed: Systemd start for docker failed!
2020-05-13T11:34:39.538 journalctl log for docker:
2020-05-13T11:34:39.545 -- No entries --

June 2020:
2020-06-17T15:13:28.172 Debug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Containerd::Config/Exec[restart-containerd]: The container Class[Platform::Containerd::Config] will propagate my refresh event
2020-06-17T15:13:28.178 Debug: 2020-06-17 15:13:28 +0000 Class[Platform::Containerd::Config]: The container Stage[main] will propagate my refresh event
2020-06-17T15:13:28.183 Debug: 2020-06-17 15:13:28 +0000 Exec[mount /dev/cgts-vg/scratch-lv](provider=posix): Executing check 'mount | awk '{print $3}' | grep -Fxq /scratch'
2020-06-17T15:13:28.186 Debug: 2020-06-17 15:13:28 +0000 Executing: 'mount | awk '{print $3}' | grep -Fxq /scratch'
2020-06-17T15:13:28.189 Debug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir permissions](provider=posix): Executing 'chmod 0770 /scratch'
2020-06-17T15:13:28.193 Debug: 2020-06-17 15:13:28 +0000 Executing: 'chmod 0770 /scratch'
2020-06-17T15:13:28.198 Notice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]/returns: executed successfully
2020-06-17T15:13:28.200 Debug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event
2020-06-17T15:13:28.203 Debug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir group](provider=posix): Executing 'chgrp sys_protected /scratch'
2020-06-17T15:13:28.206 Debug: 2020-06-17 15:13:28 +0000 Executing: 'chgrp sys_protected /scratch'
2020-06-17T15:13:28.209 Notice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]/returns: executed successfully
2020-06-17T15:13:28.212 Debug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event
2020-06-17T15:13:28.215 Debug: 2020-06-17 15:13:28 +0000 Platform::Filesystem[scratch-lv]: The container Class[Platform::Filesystem::Scratch] will propagate my refresh event
2020-06-17T15:13:28.217 Debug: 2020-06-17 15:13:28 +0000 Class[Platform::Filesystem::Scratch]: The container Stage[main] will propagate my refresh event
2020-06-17T15:13:28.220 Info: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Service[docker]
2020-06-17T15:13:28.224 Info: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Exec[enable-docker]
2020-06-17T15:13:28.227 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-active docker'
2020-06-17T15:13:28.231 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-enabled docker'
2020-06-17T15:13:28.238 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl unmask docker'
2020-06-17T15:13:28.446 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl start docker'
2020-06-17T15:13:28.459 Debug: 2020-06-17 15:13:28 +0000 Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u docker --no-pager
2020-06-17T15:13:28.466 Debug: 2020-06-17 15:13:28 +0000 Executing: 'journalctl -n 50 --since '5 minutes ago' -u docker --no-pager'
2020-06-17T15:13:28.471 Error: 2020-06-17 15:13:28 +0000 Systemd start for docker failed!
2020-06-17T15:13:28.474 journalctl log for docker:
2020-06-17T15:13:28.476 -- No entries --

Test Activity
-------------
[Sanity, Feature Testing, Regression Testing, Developer Testing, Evaluation, Other - Please specify]

Workaround
----------
Describe workaround if available

New value:

Brief Description
-----------------
After the initial unlock of a worker node, docker failed to start. The subsequent reboot succeeded.

Severity
--------
Major

Steps to Reproduce
------------------
Nothing special. Initial unlock of a worker node.

Expected Behavior
------------------
Worker node should come up

Actual Behavior
----------------
Worker node fails and requires an additional reboot.

Reproducibility
---------------
Intermittent. Seen a few times so far.

System Configuration
--------------------
multi-node system

Branch/Pull Time/Commit
-----------------------
stx master load since May 2020

Last Pass
---------
N/A - Issue is intermittent

Timestamp/Logs
--------------
Logs are available from 2 occurrences:

May 2020:
2020-05-13T11:34:39.533 Error: 2020-05-13 11:34:38 +0000 /Stage[main]/Platform::Docker::Config/Service[docker]/ensure: change from stopped to running failed: Systemd start for docker failed!
2020-05-13T11:34:39.538 journalctl log for docker:
2020-05-13T11:34:39.545 -- No entries --

June 2020:
2020-06-17T15:13:28.172 Debug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Containerd::Config/Exec[restart-containerd]: The container Class[Platform::Containerd::Config] will propagate my refresh event
2020-06-17T15:13:28.178 Debug: 2020-06-17 15:13:28 +0000 Class[Platform::Containerd::Config]: The container Stage[main] will propagate my refresh event
2020-06-17T15:13:28.183 Debug: 2020-06-17 15:13:28 +0000 Exec[mount /dev/cgts-vg/scratch-lv](provider=posix): Executing check 'mount | awk '{print $3}' | grep -Fxq /scratch'
2020-06-17T15:13:28.186 Debug: 2020-06-17 15:13:28 +0000 Executing: 'mount | awk '{print $3}' | grep -Fxq /scratch'
2020-06-17T15:13:28.189 Debug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir permissions](provider=posix): Executing 'chmod 0770 /scratch'
2020-06-17T15:13:28.193 Debug: 2020-06-17 15:13:28 +0000 Executing: 'chmod 0770 /scratch'
2020-06-17T15:13:28.198 Notice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]/returns: executed successfully
2020-06-17T15:13:28.200 Debug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event
2020-06-17T15:13:28.203 Debug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir group](provider=posix): Executing 'chgrp sys_protected /scratch'
2020-06-17T15:13:28.206 Debug: 2020-06-17 15:13:28 +0000 Executing: 'chgrp sys_protected /scratch'
2020-06-17T15:13:28.209 Notice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]/returns: executed successfully
2020-06-17T15:13:28.212 Debug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event
2020-06-17T15:13:28.215 Debug: 2020-06-17 15:13:28 +0000 Platform::Filesystem[scratch-lv]: The container Class[Platform::Filesystem::Scratch] will propagate my refresh event
2020-06-17T15:13:28.217 Debug: 2020-06-17 15:13:28 +0000 Class[Platform::Filesystem::Scratch]: The container Stage[main] will propagate my refresh event
2020-06-17T15:13:28.220 Info: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Service[docker]
2020-06-17T15:13:28.224 Info: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Exec[enable-docker]
2020-06-17T15:13:28.227 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-active docker'
2020-06-17T15:13:28.231 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-enabled docker'
2020-06-17T15:13:28.238 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl unmask docker'
2020-06-17T15:13:28.446 Debug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl start docker'
2020-06-17T15:13:28.459 Debug: 2020-06-17 15:13:28 +0000 Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u docker --no-pager
2020-06-17T15:13:28.466 Debug: 2020-06-17 15:13:28 +0000 Executing: 'journalctl -n 50 --since '5 minutes ago' -u docker --no-pager'
2020-06-17T15:13:28.471 Error: 2020-06-17 15:13:28 +0000 Systemd start for docker failed!
2020-06-17T15:13:28.474 journalctl log for docker:
2020-06-17T15:13:28.476 -- No entries --

Test Activity
-------------
General Use

Workaround
----------
N/A - worker node recovers, but requires multiple reboots

2020-06-18 17:50:54 Ghada Khalil tags stx.4.0 stx.containers
2020-06-18 17:51:39 Ghada Khalil starlingx: importance Undecided Medium
2020-06-18 17:51:41 Ghada Khalil starlingx: status New Triaged
2020-06-18 22:00:33 OpenStack Infra starlingx: status Triaged In Progress
2020-06-19 15:23:32 OpenStack Infra starlingx: status In Progress Fix Released