juju bootstrap does not wait for MAAS nodes to change state to "deployed"

Bug #1821565 reported by Trent Lloyd
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Low         Unassigned

Bug Description

Currently, when bootstrapping a controller, juju waits only until MAAS provides it with the machine's IP address; it then enters a loop waiting for connectivity to that host. It does not wait until MAAS marks the host as 'Deployed'.

This can be a problem if for some reason the installation fails, and an old installation is booted. In this case the machine may have the same IP, and even have the juju ssh keys configured. That would result in an old machine with other existing data on it being used as a controller.

This arose for me in testing because maas-dhcpd was down, so my MAAS machine tried to PXE boot, failed, and then booted the old installation from the HDD.

I did not verify if this also applies to machines during normal deployment (outside of bootstrap) - we should also ensure it doesn't happen there.

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Mark Maglana (mmaglana) wrote :

I'm able to replicate the issue on 2.6.5-bionic-amd64. Bootstrap hangs at "Running machine configuration script..." until I hit Ctrl-C:

$ juju bootstrap node-amontons
Creating Juju controller "node-amontons" on node-amontons
Looking for packaged Juju agent version 2.6.5 for amd64
Launching controller instance(s) on node-amontons...
 - srt88n (arch=amd64 mem=3.5G cores=1)
Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.14.0
Waiting for address
Attempting to connect to 172.16.99.2:22
Connected to 172.16.99.2
Running machine configuration script...
^CInterrupt signalled: waiting for bootstrap to exit
Bootstrap agent now started
Contacting Juju controller at 172.16.99.2 to verify accessibility...
ERROR unable to contact api server after 1 attempts: unable to connect to API: dial tcp 172.16.99.2:17070: connect: connection refused

MAAS version: 2.6.0 (7802-g59416a869-0ubuntu1~18.04.1)

Mark Maglana (mmaglana) wrote :

Just to add, the line where it says "Installing Juju agent on bootstrap instance" happened while the instance was still PXE booting.

Mark Maglana (mmaglana) wrote :

More debug info:

02:17:42 INFO cmd bootstrap.go:509 Installing Juju agent on bootstrap instance
02:17:43 INFO cmd bootstrap.go:626 Fetching Juju GUI 2.14.0
02:21:44 DEBUG juju.cloudconfig.instancecfg instancecfg.go:887 Setting numa ctl preference to false
Waiting for address
02:21:44 DEBUG juju.provider.maas maas2instance.go:87 "tight-midge" has addresses ["local-cloud:172.16.99.5@undefined(id:-1)"]
Attempting to connect to 172.16.99.5:22
02:21:44 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: ssh: connect to host 172.16.99.5 port 22: Connection refused
02:21:50 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:21:54 DEBUG juju.provider.maas maas2instance.go:87 "tight-midge" has addresses ["local-cloud:172.16.99.5@undefined(id:-1)"]
02:21:55 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:01 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:04 DEBUG juju.provider.maas maas2instance.go:87 "tight-midge" has addresses ["local-cloud:172.16.99.5@undefined(id:-1)"]
02:22:06 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:11 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:14 DEBUG juju.provider.maas maas2instance.go:87 "tight-midge" has addresses ["local-cloud:172.16.99.5@undefined(id:-1)"]
02:22:17 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:22 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:24 DEBUG juju.provider.maas maas2instance.go:87 "tight-midge" has addresses ["local-cloud:172.16.99.5@undefined(id:-1)"]
02:22:28 DEBUG juju.provider.common bootstrap.go:576 connection attempt for 172.16.99.5 failed: /var/lib/juju/nonce.txt does not exist
02:22:33 INFO cmd bootstrap.go:345 Connected to 172.16.99.5
02:22:33 INFO juju.cloudconfig userdatacfg_unix.go:529 Fetching agent: curl -sSfw 'agent binaries from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz <[https://streams.canonical.com/juju/tools/agent/2.6.5/juju-2.6.5-ubuntu-amd64.tgz]>
02:22:33 INFO cmd bootstrap.go:415 Running machine configuration script...
^C03:01:36 INFO cmd bootstrap.go:719 Interrupt signalled: waiting for bootstrap to exit
03:01:36 INFO cmd bootstrap.go:564 Bootstrap agent now started
03:01:36 DEBUG juju.provider.maas maas2instance.go:87 "tight-midge" has addresses ["local-cloud:172.16.99.5@undefined(id:-1)"]
03:01:36 INFO juju.juju api.go:303 API endpoints changed from [] to [172.16.99.5:17070]
03:01:36 INFO cmd controller.go:89 Contacting Juju controller at 172.16.99.5 to verify accessibility...
03:01:36 INFO juju.juju api.go:67 connecting...


Mark Maglana (mmaglana) wrote :

Here's another log. At least with 2.6.5, it doesn't hang but times out waiting for the controller instance to be ready.

Mark Maglana (mmaglana) wrote :

Apologies for the false positives above. It turns out the problem was that MAAS was misconfigured and was assigning the wrong gateway IP to machines. The 2.6 series just hangs (which is a different bug), but juju 2.7-beta1-bionic-amd64 at least times out and reports an error showing that the machine was unable to reach https://streams.canonical.com.

Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot