juju should wait until a node's status is 'deployed' to attempt ssh'ing into it

Bug #1394680 reported by Larry Michel
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
juju-core  Fix Released  High        Ian Booth
1.22       Fix Released  High        Ian Booth

Bug Description

While bootstrapping a server, the installation seemed to be going OK and the system was rebooting after the installation finished. But juju bootstrap failed because it was kicked out of its ssh session, and it then destroyed the environment. As a result, a server that was still deploying was stopped through MAAS and its state transitioned from Deploying to Ready.

This is from console output:
==============================================================================================
Launching instance
WARNING picked arbitrary tools &{1.20.11-precise-amd64 https://streams.canonical.com/juju/tools/releases/juju-1.20.11-precise-amd64.tgz 196a1348755f3ce869ce1319995d2d6b672809e165d87987dc5c12828c228de8 8112417}
 - /MAAS/api/1.0/nodes/node-94afcdc8-aea0-11e3-9074-00163efc5068/
Waiting for address
Attempting to connect to bakhtak.oil:22
Attempting to connect to bakhtak.oil:22
Attempting to connect to 10.245.0.216:22
Warning: Permanently added 'bakhtak.oil' (ECDSA) to the list of known hosts.
Logging to /var/log/cloud-init-output.log on remote host
Installing add-apt-repository
Adding apt repository: deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/cloud-tools main
Running apt-get update
Connection to bakhtak.oil closed by remote host.
ERROR bootstrap failed: subprocess encountered error code 255
Stopping instance...
Bootstrap failed, destroying environment
ERROR subprocess encountered error code 255
==============================================================================================

Is juju bootstrap polling the MAAS server and attempting ssh only after the node has switched to the "Deployed" state, or is it trying to ssh into the system and waiting for another condition? From the MAAS logs, there were no issues with the installation, and the system was in the process of rebooting following the curtin installation when the juju environment got destroyed due to this error.
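
The behaviour the summary asks for can be sketched as a polling loop that gates the SSH attempt on the node's reported status. This is a minimal Python sketch under stated assumptions, not juju's actual fix: `get_status` is a hypothetical callable standing in for whatever query returns the MAAS node status, `wait_until_deployed` and `DeploymentError` are illustrative names, and the status strings are the ones mentioned in this report.

```python
import time


class DeploymentError(Exception):
    """Raised when the node cannot reach the 'Deployed' state."""


def wait_until_deployed(get_status, timeout=1800.0, interval=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Block until get_status() returns 'Deployed'.

    get_status is a hypothetical callable returning the node's status
    string (e.g. 'Deploying', 'Deployed', 'Ready'); it is an assumption
    of this sketch, not a MAAS or juju API.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "Deployed":
            return status
        if status != "Deploying":
            # Assumption: 'Deploying' is the only transitional state.
            # Anything else (e.g. 'Ready' after a release) means the
            # deployment was abandoned, so waiting longer cannot help.
            raise DeploymentError(f"unexpected node status: {status!r}")
        if clock() >= deadline:
            raise DeploymentError("timed out waiting for 'Deployed'")
        sleep(interval)
```

Only after this function returns would the caller start its SSH attempts, so a session opened inside the installation environment can no longer be mistaken for the deployed system.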

Tags: bootstrap oil

Larry Michel (lmic)
tags: added: oil
Curtis Hovey (sinzui)
tags: added: bootstrap
Changed in juju-core:
status: New → Triaged
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Undecided → Medium
Jason Hobbs (jason-hobbs) wrote :

I think fixing this would help alleviate what we're seeing in bug 1355782 - can the priority on this be raised?

Changed in juju-core:
importance: Medium → High
milestone: none → 1.22
summary: - juju bootstrap fails due to remote connection closed by remote host
- while it's deploying
+ juju should wait until a node's status is 'deployed' to attempt ssh'ing
+ into it
Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju-core:
milestone: 1.22-alpha1 → 1.23
Raphaël Badin (rvb) wrote :

Hum, it's weird that Juju was able to SSH into the machine while it was deploying. AFAIK, Juju's SSH key is part of the user data that gets pulled by the machine *after* it's installed. It goes like this:
- Client deploys a machine
- MAAS starts machine
- Machine gets into the ephemeral environment and runs the Curtin install (only the SSH keys registered inside MAAS for the user are actually installed on the machine at this stage)
- Once the install is done, the machine reboots
- The Juju user data is pulled from MAAS by cloud-init
- Juju can now connect to the machine

Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Larry Michel (lmic) wrote :

Raphaël, we have observed that the user's public key gets installed by curtin, so the user can ssh into the installation environment. To recreate, deploy a node from MAAS, then try to SSH into the system as soon as it is pingable and keep retrying until it lets you in; ssh returns "connection refused" before finally succeeding. You can then check the mounted filesystems and cloud-init logs to confirm you are in the installation environment. Lastly, keep the SSH session open to observe it getting killed when the system initiates the post-install reboot.
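
The retry step of this reproduction can be sketched as a small probe that keeps trying the SSH port until it stops refusing connections. This is an illustrative Python sketch, not a juju or MAAS tool: `wait_for_ssh_port` is a hypothetical helper, and it only checks that the TCP port accepts a connection, which (as this report shows) says nothing about whether the node has finished deploying.

```python
import socket
import time


def wait_for_ssh_port(host, port=22, attempts=60, interval=5.0,
                      connect=socket.create_connection, sleep=time.sleep):
    """Return how many attempts it took for host:port to accept a TCP connection.

    Raises TimeoutError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout=interval)
        except OSError:
            # Connection refused or host unreachable: sshd is not up yet.
            sleep(interval)
            continue
        conn.close()
        return attempt
    raise TimeoutError(f"{host}:{port} never accepted a connection")
```

Once the probe returns, an interactive `ssh` session can be opened by hand to inspect the mounted filesystems and cloud-init logs as described above.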

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23 → 1.23-beta1