Comment 2 for bug 1871638

Bart Wensley (bartwensley) wrote:

After the controller is first unlocked, the controller manifest is failing to apply. The failure happens because the docker daemon cannot be started:

2020-04-08T06:55:57.588 Debug: 2020-04-08 06:55:57 +0000 Exec[perform systemctl daemon reload for docker proxy](provider=posix): Executing 'systemctl daemon-reload'
2020-04-08T06:55:57.590 Debug: 2020-04-08 06:55:57 +0000 Executing: 'systemctl daemon-reload'
2020-04-08T06:55:57.592 Notice: 2020-04-08 06:55:57 +0000 /Stage[main]/Platform::Docker::Config/Exec[perform systemctl daemon reload for docker proxy]: Triggered 'refresh' from 1 events
2020-04-08T06:55:57.594 Info: 2020-04-08 06:55:57 +0000 /Stage[main]/Platform::Docker::Config/Exec[perform systemctl daemon reload for docker proxy]: Scheduling refresh of Service[docker]
2020-04-08T06:55:57.596 Debug: 2020-04-08 06:55:57 +0000 /Stage[main]/Platform::Docker::Config/Exec[perform systemctl daemon reload for docker proxy]: The container Class[Platform::Docker::Config] will propagate my refresh event
2020-04-08T06:55:57.598 Debug: 2020-04-08 06:55:57 +0000 Executing: '/usr/bin/systemctl is-active docker'
2020-04-08T06:55:57.600 Debug: 2020-04-08 06:55:57 +0000 Executing: '/usr/bin/systemctl is-enabled docker'
2020-04-08T06:55:57.602 Debug: 2020-04-08 06:55:57 +0000 Executing: '/usr/bin/systemctl unmask docker'
2020-04-08T06:55:57.643 Debug: 2020-04-08 06:55:57 +0000 Executing: '/usr/bin/systemctl start docker'
2020-04-08T06:55:57.816 Debug: 2020-04-08 06:55:57 +0000 Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u docker --no-pager
2020-04-08T06:55:57.819 Debug: 2020-04-08 06:55:57 +0000 Executing: 'journalctl -n 50 --since '5 minutes ago' -u docker --no-pager'
2020-04-08T06:55:57.828 Error: 2020-04-08 06:55:57 +0000 Systemd start for docker failed!

daemon.log shows dockerd and containerd being restarted, but dockerd then gets stuck and emits the following logs thousands of times:

2020-04-08T06:55:57.472 controller-0 dockerd[100339]: info time="2020-04-08T06:55:57.472875804Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
2020-04-08T06:55:57.472 controller-0 dockerd[100339]: info time="2020-04-08T06:55:57.472870255Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=plugins.moby
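
One quick way to triage the "connection refused" above is to probe the socket directly while dockerd is stuck. The following is only a minimal sketch, not part of any StarlingX tooling: the helper name probe_unix_socket is hypothetical, and the socket path is the one taken from the dockerd log lines. A result of "refused" would suggest the socket file exists but nothing is listening on it (containerd not up yet, or already gone), whereas "missing" would mean the socket file was never created.

```python
import errno
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a unix stream socket and describe the outcome.

    Hypothetical helper for illustration only. Returns one of:
    "accepting", "refused", "missing", or "error: <ERRNO name>".
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except FileNotFoundError:
        # No socket file at this path at all.
        return "missing"
    except ConnectionRefusedError:
        # Socket file exists, but no process is listening on it.
        return "refused"
    except OSError as exc:
        # Anything else (permissions, timeout, ...).
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    else:
        return "accepting"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path copied from the dockerd error messages in daemon.log.
    print(probe_unix_socket("/run/containerd/containerd.sock"))
```

Running this in a loop around the time Puppet issues 'systemctl start docker' might show whether containerd's socket ever starts accepting connections, or whether dockerd is started before containerd is ready.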

The containerd/dockerd startup was modified recently:
https://review.opendev.org/#/c/716911
https://review.opendev.org/#/c/715593
https://review.opendev.org/#/c/717044

Paul or Bob should take a look at this LP. Note that the failing puppet exec is only hit if an http_proxy/https_proxy is configured; I'm not sure whether that makes a difference.