container rabbitmq_wait_bundle failed with Error: OCI runtime error: set propagation for `bin/epmd`: Invalid argument

Bug #1954918 reported by chandan kumar
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| tripleo | Won't Fix | High | Unassigned | |

Bug Description

The tripleo-ci-centos-9-standalone job is failing with the following error during the standalone deploy.

Logs [1]:
```
ERROR: Can't run container rabbitmq_wait_bundle
stderr: Error: OCI runtime error: set propagation for `bin/epmd`: Invalid argument
```

The container definition at https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/rabbitmq/rabbitmq-messaging-notify-pacemaker-puppet.yaml#L207 bind-mounts /bin/true over /bin/epmd:

```
image: {get_param: ContainerRabbitmqImage}
volumes:
  list_concat:
    - {get_attr: [ContainersCommon, container_puppet_apply_volumes]}
    - - /bin/true:/bin/epmd
```
The /bin/epmd binary comes from erlpmd.

According to the rabbitmq container build log, there is no erlpmd package:
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_96a/821241/12/check/tripleo-ci-centos-9-content-provider/96a4341/logs/undercloud/home/zuul/workspace/logs/container-builds/a7d6c87f-e9d0-4131-a278-183abb930e88/base/rabbitmq/rabbitmq-build.log

Logs:
[1]. https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a72/821241/12/check/tripleo-ci-centos-9-standalone/a724e97/logs/undercloud/home/zuul/standalone_deploy.log

Bogdan Dobrelya (bogdando) wrote :
Damien Ciabrini (dciabrin) wrote :

Thanks for the report!
This is unexpected, because we have an HA overcloud deploying just fine internally with CentOS 9 and containers from quay.io/tripleo_centos9.
I'm going to try to deploy a centos9 standalone locally and reproduce.

Damien Ciabrini (dciabrin) wrote :

To be more specific, /bin/epmd comes from erlang-erts, which is correctly installed in the container.

So far I'm not able to reproduce the error from the CI job. I manually spawned the same container on a CS9 VM, with the same podman version and the crun container runtime:

podman run -u root --net=host --name=foo -it -v /etc/localtime:/etc/localtime:ro -v /etc/hosts:/etc/hosts:ro -v /dev/log:/dev/log -v /bin/true:/bin/epmd -v /bin/true:/meh trunk.registry.rdoproject.org/tripleomastercentos9/openstack-rabbitmq:f2f06b5a6205fbb954685d8249a0cadf echo ok
9c9eb82f9563428bf6fef03ff4785f75c7d96c5a263ec882f0cdddfa50d35bcd
ok
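
For anyone who wants to repeat the two checks above (which package owns /bin/epmd, and whether the bind mount alone triggers the error), here is a minimal sketch. The names `RUNTIME`, `file_owner` and `try_bind_mount` are hypothetical helpers for illustration, not part of TripleO or podman, and the sketch assumes the image passes the trailing command through, as the `echo ok` run above does.

```shell
#!/bin/sh
# Sketch only: RUNTIME, file_owner and try_bind_mount are hypothetical names.
# RUNTIME can be overridden to use another podman-compatible CLI.
RUNTIME="${RUNTIME:-podman}"

# Ask rpm inside the image which package owns a path.
# $1: container image, $2: path inside the image
file_owner() {
    "$RUNTIME" run --rm -u root "$1" rpm -qf "$2"
}

# Re-run only the /bin/true:/bin/epmd bind mount, without the other volumes.
# $1: container image
try_bind_mount() {
    "$RUNTIME" run --rm -u root -v /bin/true:/bin/epmd "$1" echo ok
}
```

Calling `file_owner <image> /bin/epmd` with the image from the command above should name the erlang-erts package if the statement about its origin holds, and `try_bind_mount <image>` isolates the mount that the OCI runtime error complains about.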

I'm still looking into it. At this stage I wonder whether holding a CI node after the error would speed up resolution?

chandan kumar (chkumar246) wrote :

Based on my testing here: https://review.rdoproject.org/r/c/testproject/+/37242 and https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-standalone-master the job is pretty much healthy. It was just a blip in CI.

I think we can close this issue since we are no longer seeing it.

Changed in tripleo:
status: Triaged → Won't Fix