MAAS provider bootstrap: Timeout, server <server> not responding.

Bug #1365035 reported by David Britton
This bug affects 1 person
Affects  Status  Importance  Assigned to  Milestone
MAAS     New     Undecided   Unassigned

Bug Description

I've hit this error a few times now with 1.20.6, and I don't think it reflects an actual network failure. This is all on a private network that was working fine before 1.20.6. Short paste here, longer paste attached:

Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
Installing package: git
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz 'https://streams.canonical.com/juju/tools/releases/juju-1.20.6-trusty-amd64.tgz'
Bootstrapping Juju machine agent
Timeout, server tesla.beretstack not responding.
2014-09-02 20:51:07 ERROR juju.provider.common bootstrap.go:119 bootstrap failed: subprocess encountered error code 255
Stopping instance...
2014-09-02 20:51:07 INFO juju.cmd cmd.go:113 Bootstrap failed, destroying environment
2014-09-02 20:51:07 INFO juju.provider.common destroy.go:15 destroying environment "beretstack-0"
2014-09-02 20:51:07 ERROR juju.cmd supercommand.go:323 subprocess encountered error code 255

An option to leave the machine up (coming in 1.20.7+?) would help, since I can't reproduce this every time.

David Britton (dpb) wrote :
description: updated
Curtis Hovey (sinzui)
tags: added: bootstrap
Curtis Hovey (sinzui)
tags: added: timeout tools
tags: added: maas-provider
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: none → 1.21-alpha1
Mark Ramm (mark-ramm)
Changed in juju-core:
importance: High → Critical
David Britton (dpb) wrote :

Got the bootstrap error again, but --keep-broken didn't work; it seems the machine was deallocated in MAAS.

David Britton (dpb)
no longer affects: juju-core
David Britton (dpb) wrote :

Sooo... I finally caught this bugger in action. It's as I was discussing with bigjools in #maas.

I bootstrap soon after a previous destroy-environment has run. That gets juju thinking the machine is ready to use (since it hasn't been shut down yet). Juju then waits for the machine to finish starting jujud so it can contact mongo, but then the power off/power on comes along.

*Then*, the machine is ripped out from underneath it.
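
The sequence above can be sketched as a toy timeline. This is only an illustration of the suspected race, not Juju or MAAS code: the function name, the outcome labels, and all the timings are hypothetical, and it assumes the delayed power off/power on from the earlier destroy-environment lands at a single point in time.

```python
def bootstrap_outcome(bootstrap_start: float,
                      bootstrap_duration: float,
                      power_cycle_at: float) -> str:
    """Classify one bootstrap attempt on a toy timeline (seconds, hypothetical).

    power_cycle_at is when the power off/power on left over from the
    previous destroy-environment finally hits the node.
    """
    bootstrap_end = bootstrap_start + bootstrap_duration
    if power_cycle_at < bootstrap_start:
        # The old power cycle already finished; the node is really free.
        return "ok"
    if power_cycle_at < bootstrap_end:
        # The node is pulled away while juju waits for jujud to come up.
        return "timeout during bootstrap"
    # Bootstrap completed first, but the pending power cycle still hits later.
    return "node lost after bootstrap"


# Same delayed power cycle, two different bootstrap start times.
for start in (10, 200):
    print(start, bootstrap_outcome(start, 300, power_cycle_at=120))
```

With these made-up numbers, a bootstrap started 10 seconds in collides with the power cycle, while one started at 200 seconds does not, which would fit the failure only showing up some of the time.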

Questions

1) Why doesn't maas allocate the machine after power on is run?

2) Why is this power off/power on taking so long?

My theory on #2: I'm seeing a ton of DHCP refreshes that each take 6+ seconds, and I think they are clogging the rabbit queue. Perhaps that warrants a separate bug; I'm not sure and will need some guidance there.
