Comment 0 for bug 1985981

Sandeep Yadav (sandeepyadav93) wrote : Sc010 kvm internal job failing with "Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory"

We run the Sc010 kvm job in both the vexx cloud and the internal cloud.

The job that runs in the internal cloud fails with the following error:

~~~
2022-08-12 07:38:27,832 p=89450 u=root n=ansible | 2022-08-12 07:38:27.831192 | fa163e0d-40f2-7933-9109-000000000070 | FATAL | Run cephadm bootstrap
.
.
Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version
ceph: stderr Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 9106, in <module>
    main()
  File "/usr/sbin/cephadm", line 9094, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 1969, in _default_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 4707, in command_bootstrap
    image_ver = CephContainer(ctx, ctx.image, 'ceph', ['--version']).run().strip()
  File "/usr/sbin/cephadm", line 3739, in run
    out, _, _ = call_throws(self.ctx, self.run_cmd(),
  File "/usr/sbin/cephadm", line 1636, in call_throws
    raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
RuntimeError: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version: Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory", "stderr_lines": ["Verifying podman|docker is present...", "Verifying lvm2 is present...", "Verifying time synchronization is in place...", "Unit chronyd.service is enabled and running", "Repeating the final host check...", "podman (/bin/podman) version 4.1.1 is present", "systemctl is present", "lvcreate is present", "Unit chronyd.service is enabled and running", "Host looks OK", "Cluster fsid: e1f5356e-8579-59d7-a01c-bd09ff028582", "Verifying IP 192.168.42.1 port 3300 ...", "Verifying IP 192.168.42.1 port 6789 ...", "Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network", "Adjusting default settings to suit single-host cluster...", "Pulling container image quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph...", "Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version", "ceph: stderr Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory", "Traceback (most recent call last):", " File "/usr/sbin/cephadm", line 9106, in <module>", " main()", " File "/usr/sbin/cephadm", line 9094, in main", " r = ctx.func(ctx)", " File "/usr/sbin/cephadm", line 1969, in _default_image", " return func(ctx)", " File "/usr/sbin/cephadm", line 4707, in 
command_bootstrap", " image_ver = CephContainer(ctx, ctx.image, 'ceph', ['--version']).run().strip()", " File "/usr/sbin/cephadm", line 3739, in run", " out, _, _ = call_throws(self.ctx, self.run_cmd(),", " File "/usr/sbin/cephadm", line 1636, in call_throws", " raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')", "RuntimeError: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version: Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory"], "stdout": "", "stdout_lines": []}
~~~

The same sc010 kvm job passes in the vexx cloud:

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master&skip=0

According to the Stack Exchange question [1], this can happen when the catatonit package, which is a weak dependency of podman, is missing.

[1] https://unix.stackexchange.com/questions/619212/podman-run-with-init-gives-me-error-container-init-binary-not-found-on-the-h
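A quick pre-flight check along the following lines could confirm this condition on a host before cephadm is run. This is only a minimal sketch: the function name `init_binary_present` is hypothetical, and the default path is simply the one quoted in the error message above, so other podman builds may look for the binary elsewhere.

```python
import os

# Path copied from the error message in this report; treat it as an
# assumption, since other podman builds may use a different location.
DEFAULT_INIT_PATH = "/usr/libexec/podman/catatonit"


def init_binary_present(path=DEFAULT_INIT_PATH):
    """Return True if `path` is an existing, executable regular file."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if init_binary_present():
        print("container-init binary found")
    else:
        print("container-init binary missing: 'podman run --init' is expected to fail")
```

If the check reports the binary as missing, that would be consistent with the exit code 125 seen in the internal job.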

From the logs, I can confirm that podman-catatonit.x86_64 is missing in the internal job but present in the job running in the vexx cloud.

Another difference is the podman version and the source repo of the podman package:

In the vexx job:
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master/97aa4c5/logs/undercloud/var/log/extra/package-list-installed.txt.gz

~~~
podman.x86_64 2:4.1.1-3.el9 @appstream
podman-catatonit.x86_64 2:4.1.1-3.el9 @appstream
~~~

In the internal job:
~~~
podman.x86_64 2:4.1.1-6.el9 @quickstart-centos-appstreams
~~~
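The comparison between the two package lists can also be done mechanically. The sketch below assumes each line of the installed-package list has three whitespace-separated columns (name.arch, version, repo), as in the excerpts above; the helper names `parse_package_list` and `compare` are hypothetical, and the sample data is limited to the lines quoted in this report.

```python
def parse_package_list(text):
    """Map 'name.arch' to (version, repo) for each three-column line."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines or lines in an unexpected layout
        name, version, repo = fields
        packages[name] = (version, repo)
    return packages


def compare(reference, candidate):
    """Return packages missing from `candidate` and those that differ."""
    missing = sorted(set(reference) - set(candidate))
    changed = {
        name: (reference[name], candidate[name])
        for name in sorted(set(reference) & set(candidate))
        if reference[name] != candidate[name]
    }
    return missing, changed


# Only the lines quoted in this report, not the full package lists.
vexx = parse_package_list("""\
podman.x86_64 2:4.1.1-3.el9 @appstream
podman-catatonit.x86_64 2:4.1.1-3.el9 @appstream
""")
internal = parse_package_list("""\
podman.x86_64 2:4.1.1-6.el9 @quickstart-centos-appstreams
""")

missing, changed = compare(vexx, internal)
print("missing in internal job:", missing)
for name, (ref, cand) in changed.items():
    print(name, ref, "->", cand)
```

Run against the full package-list-installed.txt files from both jobs, the same comparison should surface any other packages that differ between the two environments.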