[rocky] [rdo-kolla-build : Reload and restart docker] timeout

Bug #1816406 reported by Quique Llorente on 2019-02-18
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Quique Llorente

Bug Description

It looks like the rocky promotion job is timing out in its post-run, during the RDO kolla build:
http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-centos-7-rocky-containers-build/e391995/job-output.txt.gz
2019-02-18 05:49:10.798157 | primary | RUNNING HANDLER [rdo-kolla-build : Reload and restart docker] ******************
2019-02-18 08:48:59.540657 | POST-RUN END RESULT_TIMED_OUT: [trusted : review.rdoproject.org/config/playbooks/tripleo-ci-periodic-base/containers-build.yaml@master]
2019-02-18 08:48:59.541789 | POST-RUN START: [untrusted : git.openstack.org/openstack-infra/tripleo-ci/playbooks/tripleo-ci/post.yaml@master]
2019-02-18 08:49:01.339853 |
2019-02-18 08:49:01.340046 | PLAY [Write console log to localhost as fact zuul_console_json]
2019-02-18 08:49:01.414320 |
2019-02-18 08:49:01.414485 | TASK [capture console log json as fact]
2019-02-18 08:49:01.845841 | localhost | ok
2019-02-18 08:49:02.319810 |
2019-02-18 08:49:02.320031 | PLAY [Collect logs]
2019-02-18 08:49:02.344035 |
2019-02-18 08:49:02.344180 | TASK [set collection timeout]
2019-02-18 08:49:02.407278 | primary | ok
2019-02-18 08:49:02.433132 |
2019-02-18 08:49:02.433288 | TASK [Copy zuul_console_json log to workspace for reproducer]
2019-02-18 08:49:03.659532 | primary | changed
2019-02-18 08:49:03.678451 |
2019-02-18 08:49:03.678621 | TASK [Check for artifacts created by a previous collect_logs]
2019-02-18 08:49:03.904813 | primary | ok
2019-02-18 08:49:03.924600 |
2019-02-18 08:49:03.924753 | TASK [Remark of collect logs running before post in ovb]
2019-02-18 08:49:03.963816 | primary | skipping: Conditional result was False
2019-02-18 08:49:03.984419 |
2019-02-
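The gap between the "Reload and restart docker" handler starting and the post-run being marked RESULT_TIMED_OUT can be checked from the two timestamps in the log above. A minimal sketch, assuming only the Zuul timestamp format shown in the excerpt:

```python
from datetime import datetime

# Timestamp format used in job-output.txt (e.g. 2019-02-18 05:49:10.798157).
FMT = "%Y-%m-%d %H:%M:%S.%f"

# RUNNING HANDLER [rdo-kolla-build : Reload and restart docker]
handler_start = datetime.strptime("2019-02-18 05:49:10.798157", FMT)
# POST-RUN END RESULT_TIMED_OUT
timed_out = datetime.strptime("2019-02-18 08:48:59.540657", FMT)

elapsed = timed_out - handler_start
print(elapsed)  # 2:59:48.742500
```

So the handler produced no further output for almost three hours before the post-run was killed.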

summary: - rocky: rdo-kolla-build : Install registry certificate authority timeout
+ [rocky] [rdo-kolla-build : Reload and restart docker] timeout
Quique Llorente (quiquell) wrote:

This started failing right after we rebuilt OVS without DPDK. It also fails at collect logs :-/

Quique Llorente (quiquell) wrote:

Looking at the held machine: the docker restart is hung, and collect logs is stuck at docker ps, as expected.
06d524fb908230a6038f26887484d25c8f36f5fc7b0d385e4110232 /usr/libexec/docker/docker-runc-current
root 45901 0.0 0.0 356488 1788 ? Sl 05:43 0:00 /usr/bin/docker-containerd-shim-current c1edb4b40cbdb84362a6df8138fa3736211d61a433af9813e54fdcdee43f4d66 /var/run/docker/libcontainerd/c1edb4b40cbdb84362a6df8138fa3736211d61a433af9813e54fdcdee43f4d66 /usr/libexec/docker/docker-runc-current
root 46028 0.0 0.0 282756 1796 ? Sl 05:43 0:00 /usr/bin/docker-containerd-shim-current 864510a019fc9f32a57d3ad9d86a596e9d21524c1cb4f844509841370253fb67 /var/run/docker/libcontainerd/864510a019fc9f32a57d3ad9d86a596e9d21524c1cb4f844509841370253fb67 /usr/libexec/docker/docker-runc-current
root 50866 0.0 0.0 423944 1784 ? Sl 05:43 0:00 /usr/bin/docker-containerd-shim-current 8e0fe64c212bd1739dd52f66626a6455faa4cd95d03ea6552c563c0230b3fa60 /var/run/docker/libcontainerd/8e0fe64c212bd1739dd52f66626a6455faa4cd95d03ea6552c563c0230b3fa60 /usr/libexec/docker/docker-runc-current
root 65799 0.0 0.0 33280 1444 ? S 05:48 0:00 /bin/systemctl restart docker
root 65837 0.1 0.2 1165084 24048 ? Ssl 05:48 0:20 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --log-driver=journald --signature-verification=false --storage-driver=overlay2 --mtu 1300 --max-concurrent-downloads=16 --max-concurrent-uploads=16 --insecure-registry 192.168.24.1:8787 --insecure-registry 192.168.24.3:8787
root 65849 0.0 0.1 1003380 16036 ? Ssl 05:48 0:12 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --debug --runtime-args --systemd-cgroup=true
root 67560 0.0 0.0 13760 1460 ? Ss 05:59 0:00 /usr/bin/sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
root 67562 0.0 0.0 13760 648 ? S 05:59 0:00 /usr/bin/sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
root 67563 0.0 0.0 713616 7888 ? Sl 05:59 0:00 /usr/bin/docker-current ps -aq -f status=dead
root 84134 0.0 0.0 11636 1224 ? S 08:49 0:00 /bin/sh -c docker ps | grep opendaylight_api
root 84135 0.0 0.1 624004 8400 ? Sl 08:49 0:00 /usr/bin/docker-current ps
zuul 95271 0.0 0.0 11144 952 pts/1 S+ 11:21 0:00 grep --color=auto docker
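The blocked client calls can be picked out of a ps listing like the one above by filtering on the COMMAND column. A rough sketch, assuming the standard `ps aux` column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); the helper name `docker_clients` is hypothetical and the sample lines are copied from the listing:

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    stat: str
    start: str
    command: str


# Hypothetical helper: find docker CLI calls and the systemctl restart in `ps aux` lines.
def docker_clients(ps_lines):
    found = []
    for line in ps_lines:
        # `ps aux` has 10 columns before COMMAND; COMMAND itself may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # skip truncated or header lines
        command = fields[10]
        if command.startswith(("/usr/bin/docker-current ", "/bin/systemctl restart docker")):
            found.append(Proc(int(fields[1]), fields[7], fields[8], command))
    return found


SAMPLE = [
    "root 65799 0.0 0.0 33280 1444 ? S 05:48 0:00 /bin/systemctl restart docker",
    "root 67563 0.0 0.0 713616 7888 ? Sl 05:59 0:00 /usr/bin/docker-current ps -aq -f status=dead",
    "root 84135 0.0 0.1 624004 8400 ? Sl 08:49 0:00 /usr/bin/docker-current ps",
    "zuul 95271 0.0 0.0 11144 952 pts/1 S+ 11:21 0:00 grep --color=auto docker",
]

for p in docker_clients(SAMPLE):
    print(p.pid, p.stat, p.start, p.command)
```

This lists the restart started at 05:48 and the two docker ps calls from 05:59 and 08:49, all still present when the grep ran at 11:21, which matches the observation that the restart is hung.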

Quique Llorente (quiquell) wrote:

More on the docker restart:

Feb 19 05:48:02 undercloud ansible-systemd: Invoked with no_block=False force=None name=docker enabled=None daemon_reload=True state=restarted masked=None user=False
Feb 19 05:48:02 undercloud systemd: Reloading.
Feb 19 05:48:02 undercloud systemd: Stopping Docker Application Container Engine...
Feb 19 05:48:02 undercloud dockerd-current: time="2019-02-19T05:48:02.582071366Z" level=info msg="Processing signal 'terminated'"
Feb 19 05:48:02 undercloud dockerd-current: time="2019-02-19T05:48:02.621195972Z" level=debug msg="Clean shutdown succeeded"
Feb 19 05:48:02 undercloud dockerd-current: time="2019-02-19T05:48:02.624634538Z" level=info msg="stopping containerd after receiving terminated"
Feb 19 05:48:02 undercloud dockerd-current: time="2019-02-19T05:48:02.62599497Z" level=fatal msg="containerd: serve grpc" error="accept unix /var/run/docker/libcontainerd/docker-containerd.sock: use of closed network connection"
Feb 19 05:48:03 undercloud dockerd-current: time="2019-02-19T05:48:03.112319707Z" level=debug msg="libcontainerd: containerd health check returned error: rpc error: code = 9 desc = grpc: the client connection is closing"
Feb 19 05:48:03 undercloud systemd: Stopped Docker Application Container Engine.
Feb 19 05:48:03 undercloud systemd: Starting Docker Storage Setup...
Feb 19 05:48:03 undercloud container-storage-setup: INFO: Volume group backing root filesystem could not be determined
Feb 19 05:48:03 undercloud container-storage-setup: ERROR: Failed to determine existing storage driver.
Feb 19 05:48:03 undercloud systemd: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Feb 19 05:48:03 undercloud systemd: Failed to start Docker Storage Setup.
Feb 19 05:48:03 undercloud systemd: Unit docker-storage-setup.service entered failed state.
Feb 19 05:48:03 undercloud systemd: docker-storage-setup.service failed.
Feb 19 05:48:03 undercloud systemd: Starting Docker Application Container Engine...
Feb 19 05:48:03 undercloud dockerd-current: time="2019-02-19T05:48:03.917339565Z" level=debug msg="Listener created for HTTP on unix (/var/run/docker.sock)"
Feb 19 05:48:03 undercloud dockerd-current: time="2019-02-19T05:48:03.918245049Z" level=debug msg="libcontainerd: runContainerdDaemon: runtimeArgs: [-l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --debug --runtime-args --systemd-cgroup=true]"
Feb 19 05:48:03 undercloud dockerd-current: time="2019-02-19T05:48:03.920302983Z" level=info msg="libcontainerd: new containerd process, pid: 65849"
Feb 19 05:48:03 undercloud dockerd-current: time="2019-02-19T05:48:03.921096307Z" level=debug msg="libcontainerd: maximum number of retries for containerd health check is 3"
Feb 19 05:48:03 undercloud dockerd-current: time="2019-02-19T05:48:03.9
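In this journal, docker-storage-setup.service fails ("Failed to determine existing storage driver") between stopping and starting dockerd. A minimal sketch for pulling failed units out of syslog-style lines; it only matches the two systemd message shapes seen above ("Failed to start ..." and "Unit ... entered failed state"), and the helper name `failed_units` is hypothetical:

```python
import re

# Only the two systemd message formats that appear in the journal excerpt above.
FAILED_START = re.compile(r"systemd: Failed to start (?P<desc>.+?)\.?$")
FAILED_STATE = re.compile(r"systemd: Unit (?P<unit>\S+) entered failed state")


def failed_units(journal_lines):
    """Return (unit names, unit descriptions) that systemd reported as failed."""
    units, descriptions = [], []
    for line in journal_lines:
        m = FAILED_STATE.search(line)
        if m:
            units.append(m.group("unit"))
            continue
        m = FAILED_START.search(line)
        if m:
            descriptions.append(m.group("desc"))
    return units, descriptions


SAMPLE = [
    "Feb 19 05:48:03 undercloud systemd: Starting Docker Storage Setup...",
    "Feb 19 05:48:03 undercloud container-storage-setup: ERROR: Failed to determine existing storage driver.",
    "Feb 19 05:48:03 undercloud systemd: Failed to start Docker Storage Setup.",
    "Feb 19 05:48:03 undercloud systemd: Unit docker-storage-setup.service entered failed state.",
    "Feb 19 05:48:03 undercloud systemd: Starting Docker Application Container Engine...",
]

print(failed_units(SAMPLE))
# (['docker-storage-setup.service'], ['Docker Storage Setup'])
```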

Quique Llorente (quiquell) wrote:

The Docker version also differs.
Working job:
docker-1.13.1-88.git07f3374.el7.centos.x86_64

Failing job:
docker-1.13.1-91.git07f3374.el7.centos.x86_64
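Both packages have the same upstream version (1.13.1) and git snapshot (git07f3374); only the RPM release number moves from 88 to 91. A small sketch that splits an RPM file name of the form name-version-release.arch and reports which fields differ. It assumes version and release contain no dashes (true for these two packages) and does not implement RPM's own version-ordering rules:

```python
def split_nvra(nvra):
    """Split 'name-version-release.arch' into its four parts."""
    rest, arch = nvra.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def diff_nvra(a, b):
    """Return {field: (value_in_a, value_in_b)} for the fields that differ."""
    pa, pb = split_nvra(a), split_nvra(b)
    return {k: (pa[k], pb[k]) for k in pa if pa[k] != pb[k]}


working = "docker-1.13.1-88.git07f3374.el7.centos.x86_64"
failing = "docker-1.13.1-91.git07f3374.el7.centos.x86_64"
print(diff_nvra(working, failing))
# {'release': ('88.git07f3374.el7.centos', '91.git07f3374.el7.centos')}
```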

Alan Pevec (apevec) wrote:

This is an issue with Docker from RHEL Extras, tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1671861

wes hayutin (weshayutin) on 2019-02-19
Changed in tripleo:
milestone: none → stein-3
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
status: In Progress → Fix Released