integration tests: restructure ssh timeout

Bug #1758409 reported by Joshua Powers
Affects: cloud-init
Status: Fix Released

Bug Description

# Summary
During the integration tests, if SSH to an instance times out, testing is held up for over an hour while the connection is retried; note the timestamp jump on:

The _ssh_connect function was originally written for the nocloud_kvm platform and used as a way to determine whether an instance was up and accessible. As such, the function is doing double duty: it is not focused on SSH'ing to an up-and-running instance, and it has a bug that makes it wait far too long.

# Action plan

1. For the nocloud_kvm platform, when starting and before _wait_for_system, the is_running check should verify that the instance is accessible. This could again be done over SSH with a number of retries, but it should be handled inside the nocloud_kvm platform itself and not in the SSH connect function.

2. Update _ssh_connect to time out quickly, reduce the wait on the banner, and retry at most 3 times.
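
The bounded retry described in item 2 could look roughly like the sketch below. It is not the actual _ssh_connect code: `ssh_connect_with_retries`, its parameters, and the `connect` callable are hypothetical names used for illustration. The callable is assumed to open the SSH session with its own short connection and banner timeouts and to raise on failure.

```python
import time


def ssh_connect_with_retries(connect, retries=3, delay=1.0,
                             retry_on=(OSError,), sleep=time.sleep):
    """Call connect() at most `retries` times, then give up.

    `connect` is a hypothetical zero-argument callable that opens the SSH
    session using short timeouts of its own and raises on failure.
    `retry_on` lists the exception types treated as retryable.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except retry_on as err:
            last_error = err
            # Pause only if another attempt will follow.
            if attempt < retries:
                sleep(delay)
    raise ConnectionError(
        "SSH connect failed after %d attempts" % retries
    ) from last_error
```

With a short per-attempt timeout, the worst case is bounded by roughly `retries * (timeout + delay)` seconds rather than an open-ended wait.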

# Noted Files


Joshua Powers (powersj) wrote :

#1 is handled already by the retry logic in, so I am only going to update _ssh_connect to not wait as long.

Running through full set of tests on all platforms on local system and test system before submitting merge proposal.

Joshua Powers (powersj) wrote :

Example run on a slower machine shows how nocloud-kvm does require more time:

So it seems I do need to add additional wait time to _wait_for_system for nocloud-kvm.

ec2 uses the boto3 library to "wait for system" to start, and lxd launches so fast that it doesn't hit this condition.
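
As an illustration only, a platform-side wait of the kind discussed for nocloud-kvm could poll an accessibility check against an overall deadline, keeping that waiting out of the SSH connect path. The names `wait_until_accessible` and `probe`, and the default timings, are assumptions and not taken from the cloud-init code.

```python
import time


def wait_until_accessible(probe, timeout=300.0, interval=5.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or `timeout` seconds elapse.

    `probe` is a hypothetical zero-argument callable that returns True once
    the instance accepts connections. Returns True on success and False if
    the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Using a monotonic clock for the deadline keeps the wait bounded even if the system clock is adjusted during the run.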

Ryan Harper (raharper)
Changed in cloud-init:
status: New → Fix Released