cinder server ping check intermittently fails in the gate with "SSH to the client did not work, something very wrong"

Bug #1840355 reported by Matt Riedemann
This bug affects 1 person
Affects  Status     Importance  Assigned to  Milestone
grenade  Confirmed  Undecided   Unassigned

Bug Description

https://zuul.opendev.org/t/openstack/build/bd182b5bef2143e09ba6737b4f893d2e/log/logs/grenade.sh.txt.gz#45562

2019-08-15 16:59:32.808 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:193 : sleep 1
2019-08-15 16:59:33.814 | ++ /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:194 : date +%s
2019-08-15 16:59:33.819 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:194 : local end=1565888373
2019-08-15 16:59:33.822 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:195 : local took=1
2019-08-15 16:59:33.824 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:196 : timeleft=0
2019-08-15 16:59:33.826 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:197 : [[ 0 -le 0 ]]
2019-08-15 16:59:33.829 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:198 : die 198 'SSH to the client did not work, something very wrong'
2019-08-15 16:59:33.833 | + /opt/stack/new/devstack/functions-common:die:195 : local exitcode=0
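The xtrace above (resources.sh lines 193-198) shows a time-budgeted retry loop. After each failed attempt it sleeps one second, measures how long the iteration took, subtracts that from `timeleft`, and calls devstack's `die` once the budget reaches zero. Below is a minimal sketch of that pattern, reconstructed from the trace rather than copied from grenade. `retry_with_budget` is a hypothetical helper standing in for the inline loop, and it returns non-zero instead of calling `die`:

```shell
#!/usr/bin/env bash
# Hypothetical helper reconstructed from the xtrace; not the grenade source.
# Usage: retry_with_budget SECONDS COMMAND [ARGS...]
retry_with_budget() {
    local timeleft=$1
    shift
    while [[ $timeleft -gt 0 ]]; do
        local start
        start=$(date +%s)
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        sleep 1
        local end
        end=$(date +%s)
        local took=$((end - start))
        # Charge the full wall-clock time of this attempt against the budget.
        timeleft=$((timeleft - took))
        if [[ $timeleft -le 0 ]]; then
            # grenade calls die here; this sketch only reports and fails.
            echo "retry budget exhausted" >&2
            return 1
        fi
    done
    return 1
}
```

Because the budget is charged in wall-clock time, a slow attempt (for example an ssh that waits out its `ConnectTimeout`) uses up the budget much faster than the one-second sleep alone would suggest.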

Related patch here: https://review.opendev.org/#/c/673923/ (for debugging)

This could be a duplicate of bug 1808010, bug 1463631, or bug 1836642, and this change might fix it:

https://review.opendev.org/#/q/I9082be077b59acd3a39910fa64e29147cb5c2dd7

This report exists so that elastic-recheck (e-r) can track the failure in case it turns out not to be one of those other bugs.

Matt Riedemann (mriedem)
Changed in grenade:
status: New → Confirmed
Matt Riedemann (mriedem) wrote:

This might be a duplicate of bug 1463631.

melanie witt (melwitt) wrote:

I just saw another hit of this bug, and the logs show that in this particular case the problem was likely that sshd wasn't ready by the time the SSH connectivity check began:

2021-02-10 10:19:24.111 | OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n 7 Dec 2017
2021-02-10 10:19:24.111 | debug1: Reading configuration data /etc/ssh/ssh_config
2021-02-10 10:19:24.111 | debug1: /etc/ssh/ssh_config line 19: Applying options for *
2021-02-10 10:19:24.112 | debug1: Connecting to 172.24.5.232 [172.24.5.232] port 22.
2021-02-10 10:19:24.113 | debug1: connect to address 172.24.5.232 port 22: Connection refused
2021-02-10 10:19:24.113 | ssh: connect to host 172.24.5.232 port 22: Connection refused

The log says "Connection refused" [1], which typically means the address was reachable but nothing was listening on port 22 yet.

Anecdotally, I've seen VMs take several minutes to become reachable over SSH, but of course we have to limit how long we wait for sshd to come up.
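One way to keep that limit explicit while separating "sshd not up yet" from other SSH failures is to poll the TCP port on a deadline before running ssh at all. The sketch below is only an illustration: `wait_for_port` and its arguments are hypothetical and are not part of grenade or devstack. It relies on bash's `/dev/tcp` redirection, bash's `SECONDS` counter, and coreutils `timeout`. The total wait can overshoot `MAX_WAIT` by up to one 5-second probe.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of grenade or devstack.
# Polls HOST:PORT until a TCP connection succeeds or MAX_WAIT seconds pass.
# Usage: wait_for_port HOST PORT MAX_WAIT
wait_for_port() {
    local host=$1 port=$2 max_wait=$3
    local deadline=$((SECONDS + max_wait))
    while (( SECONDS < deadline )); do
        # bash's /dev/tcp redirection opens a TCP connection. "Connection
        # refused" makes it fail immediately, while an unreachable host is
        # cut off by timeout after 5 seconds.
        if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    echo "nothing listening on ${host}:${port} after ${max_wait}s" >&2
    return 1
}
```

With a helper like this, a long wait for port 22 (for example `wait_for_port 172.24.5.232 22 300`, where 300 is an arbitrary illustrative value) could run before the short ssh retry loop. Slow guest boots would then no longer count against the budget for the actual ssh command.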

The first attempt was at:

2021-02-10 10:18:55.971 | + /opt/stack/new/grenade/projects/70_cinder/resources.sh:create:187 : timeout 30 ssh -v -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i /opt/stack/save/cinder_key.pem cirros@172.24.5.232 'echo '\''I am a teapot'\'' > verify.txt'

That is about 30 seconds before the final attempt, which matches the 30-second budget in the code [2]:

    local timeleft=30
    while [[ $timeleft -gt 0 ]]; do
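A quick check of the "about 30 seconds" figure, using the whole-second timestamps of the first attempt (10:18:55) and the final "Connection refused" lines (10:19:24) quoted above. This sketch assumes GNU `date` for the `-d` parsing. With the sub-second parts included, the actual gap is about 28.1 s.

```shell
#!/usr/bin/env bash
# Whole-second timestamps taken from the two log excerpts above (UTC assumed).
first=$(date -u -d '2021-02-10 10:18:55' +%s)
last=$(date -u -d '2021-02-10 10:19:24' +%s)
# Elapsed seconds between the first and final SSH attempts.
echo "$((last - first))"  # prints 29
```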

[1] https://zuul.opendev.org/t/openstack/build/3c3c82a666884b9c8e5a2f9d74e2e32a/log/controller/logs/grenade.sh_log.txt#1817-1822
[2] https://github.com/openstack/grenade/blob/1cdbb71ffe1f41fbe46694888aa8ba7d2d7917c7/projects/70_cinder/resources.sh#L180-L181

Alexey Stupnikov (astupnikov) wrote: