Brief Description
-----------------
After the initial unlock of a worker node, docker failed to start. The subsequent reboot succeeded.
Severity
--------
Major
Steps to Reproduce
------------------
Nothing special. Initial unlock of a worker node.
Expected Behavior
------------------
Worker node should come up after the initial unlock, with docker started.
Actual Behavior
----------------
Worker node fails and requires an additional reboot.
Reproducibility
---------------
Intermittent. Seen a few times so far.
System Configuration
--------------------
multi-node system
Branch/Pull Time/Commit
-----------------------
stx master load since May 2020
Last Pass
---------
N/A - Issue is intermittent
Timestamp/Logs
--------------
Logs are available from 2 occurrences:
May 2020:
2020-05-13T11:34:39.533 ^[[1;31mError: 2020-05-13 11:34:38 +0000 /Stage[main]/Platform::Docker::Config/Service[docker]/ensure: change from stopped to running failed: Systemd start for docker failed!
2020-05-13T11:34:39.538 journalctl log for docker:
2020-05-13T11:34:39.545 -- No entries --
June 2020:
2020-06-17T15:13:28.172 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Containerd::Config/Exec[restart-containerd]: The container Class[Platform::Containerd::Config] will propagate my refresh event^[[0m
2020-06-17T15:13:28.178 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Class[Platform::Containerd::Config]: The container Stage[main] will propagate my refresh event^[[0m
2020-06-17T15:13:28.183 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Exec[mount /dev/cgts-vg/scratch-lv](provider=posix): Executing check 'mount | awk '{print $3}' | grep -Fxq /scratch'^[[0m
2020-06-17T15:13:28.186 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'mount | awk '{print $3}' | grep -Fxq /scratch'^[[0m
2020-06-17T15:13:28.189 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir permissions](provider=posix): Executing 'chmod 0770 /scratch'^[[0m
2020-06-17T15:13:28.193 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'chmod 0770 /scratch'^[[0m
2020-06-17T15:13:28.198 ^[[mNotice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]/returns: executed successfully^[[0m
2020-06-17T15:13:28.200 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir permissions]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event^[[0m
2020-06-17T15:13:28.203 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Exec[Change /scratch dir group](provider=posix): Executing 'chgrp sys_protected /scratch'^[[0m
2020-06-17T15:13:28.206 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'chgrp sys_protected /scratch'^[[0m
2020-06-17T15:13:28.209 ^[[mNotice: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]/returns: executed successfully^[[0m
2020-06-17T15:13:28.212 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 /Stage[main]/Platform::Filesystem::Scratch/Platform::Filesystem[scratch-lv]/Exec[Change /scratch dir group]: The container Platform::Filesystem[scratch-lv] will propagate my refresh event^[[0m
2020-06-17T15:13:28.215 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Platform::Filesystem[scratch-lv]: The container Class[Platform::Filesystem::Scratch] will propagate my refresh event^[[0m
2020-06-17T15:13:28.217 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Class[Platform::Filesystem::Scratch]: The container Stage[main] will propagate my refresh event^[[0m
2020-06-17T15:13:28.220 ^[[0;32mInfo: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Service[docker]^[[0m
2020-06-17T15:13:28.224 ^[[0;32mInfo: 2020-06-17 15:13:28 +0000 Class[Platform::Docker::Config]: Scheduling refresh of Exec[enable-docker]^[[0m
2020-06-17T15:13:28.227 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-active docker'^[[0m
2020-06-17T15:13:28.231 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl is-enabled docker'^[[0m
2020-06-17T15:13:28.238 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl unmask docker'^[[0m
2020-06-17T15:13:28.446 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl start docker'^[[0m
2020-06-17T15:13:28.459 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u docker --no-pager^[[0m
2020-06-17T15:13:28.466 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: 'journalctl -n 50 --since '5 minutes ago' -u docker --no-pager'^[[0m
2020-06-17T15:13:28.471 ^[[1;31mError: 2020-06-17 15:13:28 +0000 Systemd start for docker failed!
2020-06-17T15:13:28.474 journalctl log for docker:
2020-06-17T15:13:28.476 -- No entries --
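
The June log is easier to read once the colour codes are stripped and only the commands Puppet ran are kept. The following is a minimal sketch for doing that, not part of the platform tooling. The helper names `strip_colour` and `executed_commands` are hypothetical, and it assumes the colour codes appear either as real ESC bytes or in the caret notation (`^[`) seen in the paste above.

```python
import re

# Colour codes as real ESC bytes or in caret notation ("^["), as pasted above.
ANSI_RE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")
# Puppet debug lines of the form: Executing: '<command>'
EXEC_RE = re.compile(r"Executing: '(.*)'$")


def strip_colour(line):
    """Remove terminal colour sequences from one log line."""
    return ANSI_RE.sub("", line)


def executed_commands(lines):
    """Return the commands from "Executing: '...'" lines, in log order."""
    commands = []
    for line in lines:
        match = EXEC_RE.search(strip_colour(line).rstrip())
        if match:
            commands.append(match.group(1))
    return commands


# Three lines copied from the June 2020 occurrence.
sample = [
    "2020-06-17T15:13:28.238 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl unmask docker'^[[0m",
    "2020-06-17T15:13:28.446 ^[[0;36mDebug: 2020-06-17 15:13:28 +0000 Executing: '/usr/bin/systemctl start docker'^[[0m",
    "2020-06-17T15:13:28.471 ^[[1;31mError: 2020-06-17 15:13:28 +0000 Systemd start for docker failed!",
]

for command in executed_commands(sample):
    print(command)
```

Applied to the full June log, this shows that `systemctl start docker` was issued at 15:13:28.446 and the journalctl query followed at 15:13:28.459, 13 ms later.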
Test Activity
-------------
[Sanity, Feature Testing, Regression Testing, Developer Testing, Evaluation, Other - Please specify]
Workaround
----------
Reboot the worker node. In the occurrences seen so far, the subsequent reboot succeeded.