Comment 0 for bug 1884111

Ghada Khalil (gkhalil) wrote :

Brief Description
-----------------
After the initial unlock of a worker node, docker failed to start. The subsequent reboot succeeded.

Severity
--------
Major

Steps to Reproduce
------------------
No special steps. Perform the initial unlock of a worker node.

Expected Behavior
------------------
Worker node should come up after the initial unlock.

Actual Behavior
----------------
Worker node fails to come up because docker does not start, and an additional reboot is required.

Reproducibility
---------------
Intermittent. Seen a few times so far.

System Configuration
--------------------
multi-node system

Branch/Pull Time/Commit
-----------------------
stx master load since May 2020

Last Pass
---------
N/A - Issue is intermittent

Timestamp/Logs
--------------
Logs are available from 2 occurrences:

May 2020:
2020-05-13T11:34:39.533 ^[[1;31mError: 2020-05-13 11:34:38 +0000 /Stage[main]/Platform::Docker::Config/Service[docker]/ensure: change from stopped to running failed: Systemd start for docker failed!
2020-05-13T11:34:39.538 journalctl log for docker:
2020-05-13T11:34:39.545 -- No entries --

June 2020:
2020-06-17T15:13:28.172 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Containerd::Config/Exec[restart-containerd]: The container Class[Platform::Containerd::Config] will propagate my refresh event^[[0m
2020-06-17T15:13:28.178 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Class[Platform::Containerd::Config]: The container Stage[main] will propagate my refresh event^[[0m
2020-06-17T15:13:28.183 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Exec[mount /dev/cgts-vg/scratch-lv](provider=posix): Executing check 'mount | awk '{print $3}' | grep -Fxq /scratch'^[[0m
2020-06-17T15:13:28.186 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'mount | awk '{print $3}' | grep -Fxq /scratch'^[[0m
2020-06-17T15:13:28.189 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir permissions](provider=posix): Executing 'chmod 0770 /scratch'^[[0m
2020-06-17T15:13:28.193 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'chmod 0770 /scratch'^[[0m
2020-06-17T15:13:28.198 ^[[mNotice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]/returns: executed successfully^[[0m
2020-06-17T15:13:28.200 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event^[[0m
2020-06-17T15:13:28.203 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir group](provider=posix): Executing 'chgrp sys_protected /scratch'^[[0m
2020-06-17T15:13:28.206 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'chgrp sys_protected /scratch'^[[0m
2020-06-17T15:13:28.209 ^[[mNotice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]/returns: executed successfully^[[0m
2020-06-17T15:13:28.212 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event^[[0m
2020-06-17T15:13:28.215 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Platform::Filesystem[scratch-lv]: The container Class[Platform::Filesystem::Scratch] will propagate my refresh event^[[0m
2020-06-17T15:13:28.217 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Class[Platform::Filesystem::Scratch]: The container Stage[main] will propagate my refresh event^[[0m
2020-06-17T15:13:28.220 ^[[0;32mInfo: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Service[docker]^[[0m
2020-06-17T15:13:28.224 ^[[0;32mInfo: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Exec[enable-docker]^[[0m
2020-06-17T15:13:28.227 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-active docker'^[[0m
2020-06-17T15:13:28.231 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-enabled docker'^[[0m
2020-06-17T15:13:28.238 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl unmask docker'^[[0m
2020-06-17T15:13:28.446 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl start docker'^[[0m
2020-06-17T15:13:28.459 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u docker --no-pager^[[0m
2020-06-17T15:13:28.466 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'journalctl -n 50 --since '5 minutes ago' -u docker --no-pager'^[[0m
2020-06-17T15:13:28.471 ^[[1;31mError: 2020-06-17 15:13:28 +0000 Systemd start for docker failed!
2020-06-17T15:13:28.474 journalctl log for docker:
2020-06-17T15:13:28.476 -- No entries --
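
For anyone comparing these excerpts against other occurrences, here is a minimal sketch of a helper that strips the color codes from Puppet log lines and lists the commands Puppet ran. It assumes the codes appear in caret notation (`^[[0;36m`), as pasted above, or as raw ESC bytes. The names `strip_ansi` and `executed_commands` are made up for illustration and are not part of any StarlingX or Puppet tooling.

```python
import re

# Matches ANSI color codes written either in caret notation ("^[[0;36m"),
# as in the pasted excerpt, or with a real ESC byte ("\x1b[0;36m").
ANSI_RE = re.compile(r"(?:\^\[|\x1b)\[[0-9;]*m")

# Matches the "Executing: '<command>'" debug lines that Puppet emits.
EXEC_RE = re.compile(r"Executing: '(.*)'\s*$")


def strip_ansi(line: str) -> str:
    """Return the line with color codes removed."""
    return ANSI_RE.sub("", line)


def executed_commands(lines):
    """Return the commands from "Executing:" lines, in log order."""
    commands = []
    for line in lines:
        match = EXEC_RE.search(strip_ansi(line))
        if match:
            commands.append(match.group(1))
    return commands


if __name__ == "__main__":
    # Two lines copied from the June 2020 excerpt.
    sample = [
        "2020-06-17T15:13:28.446 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 "
        "Executing: '/usr/bin/systemctl start docker'^[[0m",
        "2020-06-17T15:13:28.471 ^[[1;31mError: 2020-06-17 15:13:28 +0000 "
        "Systemd start for docker failed!",
    ]
    for command in executed_commands(sample):
        print(command)
```

Run against the full June excerpt, this would list the `systemctl` and `journalctl` calls in order, which makes it easier to see that the failure is reported only milliseconds after `systemctl start docker` is issued.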

Test Activity
-------------
[Sanity, Feature Testing, Regression Testing, Developer Testing, Evaluation, Other - Please specify]

Workaround
----------
Reboot the worker node. In the occurrences seen so far, the subsequent reboot succeeded.