running many `cluster join` in parallel fails randomly

Bug #2071943 reported by Marian Gasparovic
This bug affects 1 person
Affects: OpenStack Snap
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We are now deploying Sunbeam with nine nodes, which means running eight `cluster join` commands, all in parallel. Usually one of them fails immediately. When we switched to running only three joins in parallel, the problem disappeared.
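For anyone who wants to try the same workaround, here is a minimal sketch of capping the number of concurrent joins at three. It is not the code our test harness uses. The helper names (`join_command`, `run_joins`), the `runner` parameter, and the assumption that each node has its own token are illustrative only; the ssh options and the `--token` / `--role` flags are copied from the failing command in this report.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def join_command(host, token, role):
    """Build the ssh invocation for one node (hypothetical helper)."""
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        host, "--",
        "sunbeam", "cluster", "join", "--token", token, "--role", role,
    ]


def run_joins(nodes, max_parallel=3, runner=subprocess.run):
    """Run one join per (host, token, role) tuple, at most max_parallel at once.

    Returns a dict mapping host -> (returncode, stderr).
    `runner` is injectable so the function can be exercised without ssh.
    """
    def join_one(node):
        host, token, role = node
        proc = runner(join_command(host, token, role),
                      capture_output=True, text=True)
        return host, (proc.returncode, proc.stderr)

    # The pool size is the concurrency cap; the remaining joins wait in a queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(join_one, nodes))
```

Calling `run_joins(nodes, max_parallel=8)` would reproduce the all-at-once behaviour, so the same function can be used to compare the two settings and see whether the failure rate really depends on concurrency.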

2024-07-02-08:40:14 root ERROR [localhost] Command failed: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null solqa-lab1-server-16.nosilo.lab1.solutionsqa -- sunbeam cluster join --token eyJzZWNyZXQiOiI4YzkwYTAxOTMxM2U4MjE4NjNhYzM4MGQ5YmZiYTYwYjg5MWI0NTNhN2RmNzc5ZWY1MzY5OGExYjJkYzkxZjAxIiwiZmluZ2VycHJpbnQiOiIzMDNhYmMyNmU3M2NkYzZiNzdhYjNkYzQ3ZGI3YWViZmU1MWU1MDRmM2UwYTQwMTc0MTVmNTVkNWNmNGY0YTk3Iiwiam9pbl9hZGRyZXNzZXMiOlsiMTAuMjQ2LjE2NC4xNDg6NzAwMCJdfQ== --role storage
2024-07-02-08:40:14 root ERROR 1[localhost] STDOUT follows:
b''
2024-07-02-08:40:14 root ERROR 2[localhost] STDERR follows:
Warning: Permanently added 'solqa-lab1-server-16.nosilo.lab1.solutionsqa' (ED25519) to the list of known hosts.
Error: Juju not detected: please install snap

https://solutions.qa.canonical.com/testruns/f0cec44c-77fb-46ca-9411-7402a7472c7c

I am not sure that running many joins at once is really the cause, since the command fails immediately.
Maybe something to keep an eye on.

Tags: cdo-qa