cephadm issue hitting ceph-related CI jobs

Bug #1928078 reported by Cédric Jeanneret
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Francesco Pantano
Milestone: (not set)

Bug Description

At least tripleo-ci-centos-8-scenario001-standalone is affected by a known cephadm issue [1].

There is an ongoing effort to downgrade cephadm on CentOS to the last working version.

[1] https://tracker.ceph.com/issues/50691

Example:
https://review.opendev.org/c/openstack/python-tripleoclient/+/790130

More precisely:
https://6f4cb1a371578ae496a7-1bfbc49daf51cfd18c455a1195e874d3.ssl.cf2.rackcdn.com/790130/5/check/tripleo-ci-centos-8-scenario001-standalone/a9ba12c/logs/undercloud/home/zuul/standalone-ansible-zygmp3_5/cephadm/cephadm_command.log

https://6f4cb1a371578ae496a7-1bfbc49daf51cfd18c455a1195e874d3.ssl.cf2.rackcdn.com/790130/5/check/tripleo-ci-centos-8-scenario001-standalone/a9ba12c/logs/undercloud/var/log/ceph/cephadm.log

[...]
2021-05-11 09:50:13,701 INFO Non-zero exit code 22 from /bin/podman run --rm --ipc=host --no-hosts --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=10.4.70.69:5001/tripleowallaby/daemon:v6.0.0-stable-6.0-pacific-centos-8-x86_64 -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19:/var/log/ceph:z -v /tmp/ceph-tmpl92iebtz:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpb9497tkr:/etc/ceph/ceph.conf:z 10.4.70.69:5001/tripleowallaby/daemon:v6.0.0-stable-6.0-pacific-centos-8-x86_64 cephadm set-user ceph-admin
2021-05-11 09:50:13,702 INFO /usr/bin/ceph: stderr Error EINVAL: Traceback (most recent call last):
2021-05-11 09:50:13,702 INFO /usr/bin/ceph: stderr File "/usr/share/ceph/mgr/mgr_module.py", line 1335, in _handle_command
2021-05-11 09:50:13,702 INFO /usr/bin/ceph: stderr return self.handle_command(inbuf, cmd)
2021-05-11 09:50:13,702 INFO /usr/bin/ceph: stderr File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 167, in handle_command
2021-05-11 09:50:13,703 INFO /usr/bin/ceph: stderr return dispatch[cmd['prefix']].call(self, cmd, inbuf)
2021-05-11 09:50:13,703 INFO /usr/bin/ceph: stderr File "/usr/share/ceph/mgr/mgr_module.py", line 389, in call
2021-05-11 09:50:13,703 INFO /usr/bin/ceph: stderr return self.func(mgr, **kwargs)
2021-05-11 09:50:13,703 INFO /usr/bin/ceph: stderr File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
2021-05-11 09:50:13,704 INFO /usr/bin/ceph: stderr wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
2021-05-11 09:50:13,704 INFO /usr/bin/ceph: stderr File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
2021-05-11 09:50:13,704 INFO /usr/bin/ceph: stderr return func(*args, **kwargs)
2021-05-11 09:50:13,704 INFO /usr/bin/ceph: stderr File "/usr/share/ceph/mgr/cephadm/module.py", line 822, in set_ssh_user
2021-05-11 09:50:13,710 INFO /usr/bin/ceph: stderr host = self.cache.get_hosts()[0]
2021-05-11 09:50:13,711 INFO /usr/bin/ceph: stderr IndexError: list index out of range
2021-05-11 09:50:13,711 INFO /usr/bin/ceph: stderr
2021-05-11 09:50:13,743 DEBUG Releasing lock 140376062407232 on /run/cephadm/4b5c8c0a-ff60-454b-a1b4-9747aa737d19.lock
2021-05-11 09:50:13,743 DEBUG Lock 140376062407232 released on /run/cephadm/4b5c8c0a-ff60-454b-a1b4-9747aa737d19.lock
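The traceback ends in `set_ssh_user`, where `self.cache.get_hosts()[0]` raises `IndexError` because the host list is empty when `cephadm set-user ceph-admin` runs. The `ceph` CLI then reports this as `Error EINVAL`, which matches exit code 22. The self-contained sketch below reproduces that failure pattern and shows one possible guard. `HostCache`, `set_ssh_user_unguarded` and `set_ssh_user_guarded` are hypothetical stand-ins and are not the real mgr/cephadm code. The guard is only an illustration and is not necessarily how the issue was fixed upstream (see [1]).

```python
from typing import List, Tuple


class HostCache:
    """Hypothetical stand-in for the cephadm mgr module's host cache."""

    def __init__(self, hosts: List[str]) -> None:
        self._hosts = list(hosts)

    def get_hosts(self) -> List[str]:
        return list(self._hosts)


def set_ssh_user_unguarded(cache: HostCache, user: str) -> str:
    # Mirrors the failing line in the traceback: indexing [0] on an
    # empty list raises IndexError("list index out of range").
    host = cache.get_hosts()[0]
    return f"checking ssh access as {user}@{host}"


def set_ssh_user_guarded(cache: HostCache, user: str) -> Tuple[int, str]:
    # Hypothetical guard: skip the connectivity check instead of
    # crashing when no host has been added yet.
    hosts = cache.get_hosts()
    if not hosts:
        return 0, f"ssh user set to {user}; no hosts yet, check skipped"
    return 0, f"ssh user set to {user}; will check {user}@{hosts[0]}"


if __name__ == "__main__":
    empty = HostCache([])
    try:
        set_ssh_user_unguarded(empty, "ceph-admin")
    except IndexError as exc:
        print(f"unguarded: IndexError: {exc}")
    print(set_ssh_user_guarded(empty, "ceph-admin"))
```

With an empty cache, the unguarded variant fails the same way as the log above, and the guarded variant returns a success code with an explanatory message.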

Changed in tripleo:
assignee: nobody → Francesco Pantano (fmount)
Francesco Pantano (fmount) wrote :
wes hayutin (weshayutin) wrote :
tags: added: promotion-blocker
wes hayutin (weshayutin) wrote :
John Fulton (jfulton-org) wrote :

Workaround:

  Downgrade to cephadm-16.2.1-1.el8.noarch.rpm

I reproduced the bug with cephadm-16.2.2-0.el8.noarch.rpm. I then downgraded to cephadm-16.2.1-1.el8.noarch.rpm and could not reproduce it (the Ceph deployment succeeded).
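As a triage aid, the sketch below classifies a cephadm package string against the builds named in this thread. In the thread, 16.2.2-0 reproduced the failure, 16.2.2-1 and 16.2.3-1 were untagged by the Storage SIG, and 16.2.1-1 worked. The helper names, the `KNOWN_BAD`/`KNOWN_GOOD` sets and the parsing regex are hypothetical and are not part of cephadm or TripleO tooling. Builds not mentioned here are reported as unknown, not as safe.

```python
import re
from typing import Optional, Tuple

# Builds named in this bug report. Treating only these as bad or good is
# an assumption: no other builds were reported on here.
KNOWN_BAD = {("16.2.2", "0"), ("16.2.2", "1"), ("16.2.3", "1")}
KNOWN_GOOD = {("16.2.1", "1")}

# Hypothetical pattern for strings like "cephadm-16.2.1-1.el8.noarch.rpm".
_NVR = re.compile(
    r"^cephadm-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)\.el\d+"
    r"(?:\.(?P<arch>\w+))?(?:\.rpm)?$"
)


def parse_cephadm_nvr(pkg: str) -> Optional[Tuple[str, str]]:
    """Return (version, release) or None if the string does not match."""
    m = _NVR.match(pkg.strip())
    if m is None:
        return None
    return m.group("version"), m.group("release")


def classify(pkg: str) -> str:
    parsed = parse_cephadm_nvr(pkg)
    if parsed is None:
        return "unparsed"
    if parsed in KNOWN_BAD:
        return "affected: downgrade to cephadm-16.2.1-1.el8"
    if parsed in KNOWN_GOOD:
        return "ok"
    return "unknown"


if __name__ == "__main__":
    for name in ("cephadm-16.2.2-0.el8.noarch.rpm",
                 "cephadm-16.2.1-1.el8.noarch.rpm"):
        print(name, "->", classify(name))
```

Matching on exact (version, release) pairs keeps the check limited to what the thread actually observed. It avoids guessing at a version range.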

Francesco Pantano (fmount) wrote :

Both 16.2.3-1 and 16.2.2-1 have already been untagged from CBS by the Storage SIG.
They had been promoted by mistake, without feedback from TripleO CI, which is supposed to test the -pending builds via [1].

I expect our current jobs to start installing the older, working package soon.

[1] https://review.opendev.org/q/topic:%22centos-storage-sig%22+(status:open%20OR%20status:merged)

wes hayutin (weshayutin)
Changed in tripleo:
status: Confirmed → Triaged
Francesco Pantano (fmount) wrote :
Changed in tripleo:
status: Triaged → Fix Released