Comment 3 for bug 1986708

Revision history for this message
Cédric Jeanneret (cjeanner) wrote :

Seen today as well in this job:
https://review.opendev.org/c/openstack/tripleo-ansible/+/853252

https://zuul.opendev.org/t/openstack/build/c6e5b64b5a4940a08ca751b50ac9e6c9
fatal: [subnode-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: kex_exchange_identification: read: Connection reset by peer\r\nConnection reset by 10.209.128.68 port 22", "unreachable": true}

If we compare to the time we see this error in the log (Thursday 18 August 2022 09:32:46 +0000) we can frame this in the journald:
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 sshd[135791]: Accepted publickey for zuul from 10.209.39.69 port 54484 ssh2: RSA SHA256:ZF9fnpKktgsD1qskSaOryoa9rBiFv0AhNYxlCxB1S/s
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 systemd-logind[704]: New session 128 of user zuul.
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 systemd[1]: Started Session 128 of User zuul.
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 sshd[135791]: pam_unix(sshd:session): session opened for user zuul(uid=1000) by (uid=0)
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 sudo[135828]: zuul : TTY=pts/0 ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-igauwcvpqfxxcpaxhwapjcrsfscqvxhe ; /usr/libexec/platform-python /home/zuul/.ansible/tmp/ansible-tmp-1660815163.318697-128609-249815272917366/AnsiballZ_file.py
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 sudo[135828]: pam_unix(sudo:session): session opened for user root(uid=0) by zuul(uid=1000)
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 systemd[1]: Started /usr/bin/podman healthcheck run d1a62b77bec9a31ba7325592b4f2c8b70ce6f8120d079776e0d70dbf1521e88a.
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 systemd[1]: tmp-crun.FzxKF6.mount: Deactivated successfully.
Aug 18 09:32:45 centos-9-stream-rax-dfw-0030762605 podman[135830]: 2022-08-18 09:32:45.939338182 +0000 UTC m=+0.098918338 container exec d1a62b77bec9a31ba7325592b4f2c8b70ce6f8120d079776e0d70dbf1521e88a (image=192.168.24.1:8787/tripleomastercentos9/openstack-nova-compute:80afe6f3cedeea3f67fa00bebbd52e52, name=nova_migration_target, container_name=nova_migration_target, com.redhat.component=ubi9-container, version=9.0.0, config_id=tripleo_step4, release=1604, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ubi9/images/9.0.0-1604, build-date=2022-08-02T22:00:29.261592, io.openshift.expose-services=, io.buildah.version=1.26.4, vendor=Red Hat, Inc., summary=Provides the latest release of Red Hat Universal Base Image 9., config_data={'environment': {'KOLLA_CONFIG_STRATEGY': 'COPY_ALWAYS', 'TRIPLEO_CONFIG_HASH': '26b74d0c5605283ac83ca9f732ef69b1'}, 'healthcheck': {'test': '/openstack/healthcheck'}, 'image': '192.168.24.1:8787/tripleomastercentos9/openstack-nova-compute:80afe6f3cedeea3f67fa00bebbd52e52', 'net': 'host', 'privileged': True, 'restart': 'always', 'user': 'root', 'volumes': ['/etc/hosts:/etc/hosts:ro', '/etc/localtime:/etc/localtime:ro', '/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '/dev/log:/dev/log', '/etc/puppet:/etc/puppet:ro', '/var/lib/kolla/config_files/nova-migration-target.json:/var/lib/kolla/config_files/config.json:ro', '/var/lib/config-data/puppet-generated/nova_libvirt:/var/lib/kolla/config_files/src:ro', '/etc/ssh:/host-ssh:ro', '/run/libvirt:/run/libvirt:shared,z', '/var/lib/nova:/var/lib/nova:shared']}, description=The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly., vcs-ref=feeb37c0babd21fde366517b0e3d245d1f9c9b44, com.redhat.license_terms=https://www.redhat.com/en/about/red-hat-end-user-license-agreements#UBI, architecture=x86_64, io.openshift.tags=base rhel9, maintainer=OpenStack TripleO team, distribution-scope=public, io.k8s.description=The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly., io.k8s.display-name=Red Hat Universal Base Image 9, managed_by=tripleo_ansible, tcib_managed=true, vcs-type=git, com.redhat.build-host=cpt-1005.osbs.prod.upshift.rdu2.redhat.com, name=ubi9)
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 ansible-file[135831]: Invoked with path=/var/log/extra state=directory mode=511 recurse=False force=False follow=True modification_time_format=%Y%m%d%H%M.%S access_time_format=%Y%m%d%H%M.%S unsafe_writes=False _original_basename=None _diff_peek=None src=None modification_time=None access_time=None owner=None group=None seuser=None serole=None selevel=None setype=None attributes=None
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sudo[135828]: pam_unix(sudo:session): session closed for user root
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135794]: Received disconnect from 10.209.39.69 port 54484:11: disconnected by user
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135794]: Disconnected from user zuul 10.209.39.69 port 54484
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135791]: pam_unix(sshd:session): session closed for user zuul
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd[1]: session-128.scope: Deactivated successfully.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd-logind[704]: Session 128 logged out. Waiting for processes to exit.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd-logind[704]: Removed session 128.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 kernel: IN=enX1 OUT= MAC=bc:76:4e:04:39:71:18:8b:45:d9:71:41:08:00 SRC=10.209.39.69 DST=10.209.128.68 LEN=60 TOS=0x08 PREC=0x40 TTL=61 ID=8666 DF PROTO=TCP SPT=54486 DPT=22 WINDOW=64240 RES=0x00 SYN URGP=0
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135854]: Accepted publickey for zuul from 10.209.39.69 port 54502 ssh2: RSA SHA256:ZF9fnpKktgsD1qskSaOryoa9rBiFv0AhNYxlCxB1S/s
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd-logind[704]: New session 129 of user zuul.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd[1]: Started Session 129 of User zuul.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135854]: pam_unix(sshd:session): session opened for user zuul(uid=1000) by (uid=0)
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135857]: Received disconnect from 10.209.39.69 port 54502:11: disconnected by user
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135857]: Disconnected from user zuul 10.209.39.69 port 54502
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 sshd[135854]: pam_unix(sshd:session): session closed for user zuul
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd[1]: session-129.scope: Deactivated successfully.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd-logind[704]: Session 129 logged out. Waiting for processes to exit.
Aug 18 09:32:46 centos-9-stream-rax-dfw-0030762605 systemd-logind[704]: Removed session 129.

If we take the audit.log.1[1] in parallel, and pass it through `ausearch -if audit.log.1 | grep -A1 ":32:46" | grep -v success' :
time->Thu Aug 18 11:32:46 2022
type=LOGIN msg=audit(1660815166.265:21068): pid=135854 uid=0 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=129 res=1

apparently, there's a tiny issue here. No idea where to look deeper, but that may have something to do with the amount/speed of ssh connection spawing.. ?