Another intermittent failure was observed in CI once and has not recurred. Perhaps when this bug is patched, an extra task could be added to ensure authorized_keys has the correct SELinux context.
fatal: [undercloud]: FAILED! => {"attempts": 12, "changed": true, "cmd": "ssh -i /home/zuul/.ssh/ceph-admin-id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ceph-admin@192.168.42.1 'echo good'", "delta": "0:00:00.063750", "end": "2022-06-18 22:10:39.793079", "msg": "non-zero return code", "rc": 255, "start": "2022-06-18 22:10:39.729329", "stderr": "Warning: Permanently added '192.168.42.1' (ED25519) to the list of known hosts.\r\nceph-admin@192.168.42.1: Permission denied (publickey).", "stderr_lines": ["Warning: Permanently added '192.168.42.1' (ED25519) to the list of known hosts.", "ceph-admin@192.168.42.1: Permission denied (publickey)."], "stdout": "", "stdout_lines": []}
The failed task [1] confirms that the ceph-admin user account cannot be logged in to over SSH.
The account was created [2].
The journal [3] shows that SELinux denied sshd read access to the file authorized_keys.
I could add an extra task after the authorized_keys file is created [5] to ensure its SELinux context is correct.
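A minimal sketch of what such an extra task might look like, assuming the ceph-admin home directory is /home/ceph-admin and that restoring the policy default label is enough for sshd to read the file. The path and the registered variable name are hypothetical and are not taken from tripleo-ansible:

```yaml
# Hypothetical task: the path and variable name are assumptions for illustration.
- name: Ensure authorized_keys has the default SELinux context
  become: true
  ansible.builtin.command: restorecon -Rv /home/ceph-admin/.ssh
  register: restorecon_result
  # restorecon -v reports relabeled files on stdout, so empty output means nothing changed
  changed_when: restorecon_result.stdout | length > 0
```

Using restorecon rather than a hard-coded type leaves the choice of label to the loaded policy. If the home directory lives in a location the policy does not know about, the label would probably need to be set explicitly instead.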
[4]
Jun 18 22:10:41 standalone.localdomain setroubleshoot[86399]: SELinux is preventing /usr/sbin/sshd from read access on the file authorized_keys. For complete SELinux messages run: sealert -l fa5095ff-f00d-4445-bacf-6c7ddf9e24d5
Jun 18 22:10:41 standalone.localdomain setroubleshoot[86399]: SELinux is preventing /usr/sbin/sshd from read access on the file authorized_keys.
If you believe that sshd should be allowed read access on the authorized_keys file by default. Then you should report this as a bug. You can generate a local policy module to allow this access. Do allow this access for now by executing: # ausearch -c 'sshd' --raw | audit2allow -M my-sshd # semodule -X 300 -i my-sshd.pp
Jun 18 22:10:41 standalone.localdomain setroubleshoot[86399]: AnalyzeThread.run(): Set alarm timeout to 10
Jun 18 22:10:43 standalone.localdomain python3[86462]: ansible-command Invoked with chdir=/home/zuul/workspace _raw_params=source /home/zuul/workspace/hash_info.sh export LOG_PATH="56/36256/98/check/periodic-tripleo-ci-centos-9-scenario001-standalone-master/a453b28" export LOG_HOST_URL="https://logserver.rdoproject.org" export SUCCESS="False" export TOCI_JOBTYPE="periodic-tripleo-ci-centos-9-scenario001-standalone-master" export SSL_CA_BUNDLE="/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt" bash -xe /home/zuul/src/review.rdoproject.org/config/ci-scripts/tripleo-upstream/dlrnapi_report.sh _uses_shell=True zuul_log_id=fa163e35-56ca-7734-6b4c-000000000009-primary warn=False argv=None executable=None creates=None removes=None stdin=None
Jun 18 22:10:49 standalone.localdomain python3[86493]: ansible-file Invoked with path=/home/zuul/workspace/logs state=directory recurse=True force=False follow=True modification_time_format=%Y%m%d%H%M.%S access_time_format=%Y%m%d%H%M.%S _original_basename=None _diff_peek=None src=None modification_time=None access_time=None mode=None owner=None group=None seuser=None serole=None selevel=None setype=None attributes=None content=NOT_LOGGING_PARAMETER backup=None remote_src=None regexp=None delimiter=None directory_mode=None unsafe_writes=None
Jun 18 22:10:49 standalone.localdomain python3[86515]: ansible-stat Invoked with path=/home/zuul/workspace/logs/report.html follow=False get_checksum=True get_mime=True get_attributes=True checksum_algorithm=sha1 get_md5=None
Jun 18 22:10:50 standalone.localdomain systemd[1]: dbus-:1.1-org.fedoraproject.SetroubleshootPrivileged@13.service: Main process exited, code=killed, status=14/ALRM
Jun 18 22:10:50 standalone.localdomain systemd[1]: dbus-:1.1-org.fedoraproject.SetroubleshootPrivileged@13.service: Failed with result 'signal'.
Jun 18 22:10:50 standalone.localdomain python3[86584]: ansible-stat Invoked with path=/home/zuul/workspace/logs/zuul_console.json follow=False get_checksum=True checksum_algorithm=sha1 get_mime=True get_attributes=True get_md5=None
Jun 18 22:10:51 standalone.localdomain python3[86649]: ansible-copy Invoked with dest=/home/zuul/workspace/logs/zuul_console.json src=/home/zuul/.ansible/tmp/ansible-tmp-1655604650.4069228-28-169704872561974/source _original_basename=tmpkqaytlv4 follow=False checksum=aa53301d06abc755e726e9efdc2c4e6d83e048e8 backup=False force=True content=NOT_LOGGING_PARAMETER validate=None directory_mode=None remote_src=None local_follow=None mode=None owner=None group=None seuser=None serole=None selevel=None setype=None attributes=None regexp=None delimiter=None unsafe_writes=None
Jun 18 22:10:51 standalone.localdomain systemd[1]: dbus-:1.1-org.fedoraproject.Setroubleshootd@12.service: Main process exited, code=killed, status=14/ALRM
Jun 18 22:10:51 standalone.localdomain systemd[1]: dbus-:1.1-org.fedoraproject.Setroubleshootd@12.service: Failed with result 'signal'.
Jun 18 22:10:51 standalone.localdomain python3[86693]: ansible-stat Invoked with path=/home/zuul/workspace/logs/undercloud follow=False get_checksum=True get_mime=True get_attributes=True checksum_algorithm=sha1 get_md5=None
Jun 18 22:10:51 standalone.localdomain sshd[86694]: error: kex_exchange_identification: Connection closed by remote host
Jun 18 22:10:51 standalone.localdomain sshd[86694]: Connection closed by 103.203.57.11 port 49662
https://review.rdoproject.org/r/c/testproject/+/36256
The scenario001 standalone periodic job hit a new failure. This is the first time I've seen this timeout; the failed task output, the analysis, and the journal excerpt [4] are quoted above.
[1] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/standalone/tasks/ceph-install.yml#L95
[2] https://logserver.rdoproject.org/56/36256/98/check/periodic-tripleo-ci-centos-9-scenario001-standalone-master/a453b28/logs/undercloud/etc/passwd.txt.gz
[3] https://logserver.rdoproject.org/56/36256/98/check/periodic-tripleo-ci-centos-9-scenario001-standalone-master/a453b28/logs/undercloud/var/log/extra/journal.txt.gz
[5] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_create_admin/tasks/create_user.yml#L49-L55