Grenade job fails due to timeout waiting for SSH

Bug #2066072 reported by melanie witt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
In Progress
Undecided
Unassigned

Bug Description

Seen in a nova-grenade-multinode job failure today [1]:

In grenade/projects/70_cinder/resources.sh:create, a server is booted from a volume and attaches one non-bootable non-encrypted volume followed by one non-bootable encrypted volume. After the volumes attach successfully, there is an attempt to SSH to the server and it fails due to a timeout if SSH does not connect within 30 seconds.

After a failure, the server console log is dumped and in it the last message is "Starting dropbear sshd: OK" which makes it seem likely that sshd was not up and running yet and that's why SSH failed.

We could increase the timeout here to give the server a bit more time to get sshd running before giving up.

Excerpt from controller/logs/grenade.sh_log.txt:

2024-05-17 21:51:37.799 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:183 : local timeleft=30
2024-05-17 21:51:37.801 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:184 : [[ 30 -gt 0 ]]
2024-05-17 21:51:37.804 | ++ /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:185 : date +%s
2024-05-17 21:51:37.809 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:185 : local start=1715982697
2024-05-17 21:51:37.811 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:186 : timeout 30 ssh -v -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i /opt/stack/save/cinder_key.pem cirros@172.24.5.150 'echo '\''I am a teapot'\'' > verify.txt'
2024-05-17 21:51:37.825 | OpenSSH_8.9p1 Ubuntu-3ubuntu0.7, OpenSSL 3.0.2 15 Mar 2022
2024-05-17 21:51:37.825 | debug1: Reading configuration data /etc/ssh/ssh_config
2024-05-17 21:51:37.826 | debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
2024-05-17 21:51:37.826 | debug1: /etc/ssh/ssh_config line 21: Applying options for *
2024-05-17 21:51:37.827 | debug1: Connecting to 172.24.5.150 [172.24.5.150] port 22.
2024-05-17 21:51:37.836 | debug1: connect to address 172.24.5.150 port 22: Connection refused
2024-05-17 21:51:37.836 | ssh: connect to host 172.24.5.150 port 22: Connection refused
2024-05-17 21:51:37.841 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:188 : local rc=255
2024-05-17 21:51:37.844 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:190 : [[ 255 -ne 0 ]]
2024-05-17 21:51:37.846 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:191 : echo 'SSH not responding yet, trying again...'
2024-05-17 21:51:37.846 | SSH not responding yet, trying again...
2024-05-17 21:51:37.848 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:192 : sleep 1

[...]

2024-05-17 21:52:06.181 | ++ /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:193 : date +%s
2024-05-17 21:52:06.186 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:193 : local end=1715982726
2024-05-17 21:52:06.188 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:194 : local took=1
2024-05-17 21:52:06.190 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:195 : timeleft=1
2024-05-17 21:52:06.192 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:196 : [[ 1 -le 0 ]]
2024-05-17 21:52:06.193 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:184 : [[ 1 -gt 0 ]]
2024-05-17 21:52:06.196 | ++ /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:185 : date +%s
2024-05-17 21:52:06.200 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:185 : local start=1715982726
2024-05-17 21:52:06.203 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:186 : timeout 30 ssh -v -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i /opt/stack/save/cinder_key.pem cirros@172.24.5.150 'echo '\''I am a teapot'\'' > verify.txt'
2024-05-17 21:52:06.210 | OpenSSH_8.9p1 Ubuntu-3ubuntu0.7, OpenSSL 3.0.2 15 Mar 2022
2024-05-17 21:52:06.210 | debug1: Reading configuration data /etc/ssh/ssh_config
2024-05-17 21:52:06.210 | debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
2024-05-17 21:52:06.210 | debug1: /etc/ssh/ssh_config line 21: Applying options for *
2024-05-17 21:52:06.211 | debug1: Connecting to 172.24.5.150 [172.24.5.150] port 22.
2024-05-17 21:52:06.214 | debug1: connect to address 172.24.5.150 port 22: Connection refused
2024-05-17 21:52:06.214 | ssh: connect to host 172.24.5.150 port 22: Connection refused
2024-05-17 21:52:06.217 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:188 : local rc=255
2024-05-17 21:52:06.220 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:190 : [[ 255 -ne 0 ]]
2024-05-17 21:52:06.223 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:191 : echo 'SSH not responding yet, trying again...'
2024-05-17 21:52:06.223 | SSH not responding yet, trying again...
2024-05-17 21:52:06.225 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:192 : sleep 1
2024-05-17 21:52:07.231 | ++ /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:193 : date +%s
2024-05-17 21:52:07.235 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:193 : local end=1715982727
2024-05-17 21:52:07.238 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:194 : local took=1
2024-05-17 21:52:07.241 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:195 : timeleft=0
2024-05-17 21:52:07.243 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:196 : [[ 0 -le 0 ]]
2024-05-17 21:52:07.245 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:198 : openstack console log show cinder_server1
2024-05-17 21:52:09.257 | [ 0.000000] Linux version 5.15.0-71-generic (buildd@lcy02-amd64-044) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 (Ubuntu 5.15.0-71.78-generic 5.15.92)
2024-05-17 21:52:09.258 | [ 0.000000] Command line: LABEL=cirros-rootfs ro console=tty1 console=ttyS0

[...]

2024-05-17 21:52:09.268 | Top of dropbear init script
2024-05-17 21:52:09.268 | Starting dropbear sshd: OK
2024-05-17 21:52:09.472 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:199 : die 199 'SSH to the client did not work, something very wrong'
2024-05-17 21:52:09.474 | + /opt/stack/new/devstack/functions-common:die:285 : local exitcode=0
2024-05-17 21:52:09.476 | [Call Trace]
2024-05-17 21:52:09.476 | /opt/stack/new/grenade/projects/70_cinder/resources.sh:302:create
2024-05-17 21:52:09.476 | /opt/stack/new/grenade/projects/70_cinder/resources.sh:199:die
2024-05-17 21:52:09.480 | [ERROR] /opt/stack/new/grenade/projects/70_cinder/resources.sh:199 SSH to the client did not work, something very wrong

[1] https://zuul.opendev.org/t/openstack/build/7e8fc2371c4e4d7ba3e73b4ec53cd51e/log/controller/logs/grenade.sh_log.txt

Tags: gate-failure
Revision history for this message
melanie witt (melwitt) wrote :
Changed in cinder:
status: New → In Progress
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.