node marked "Ready" before poweroff complete
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | High | Julian Edwards | 1.7.0
Bug Description
After bootstrapping and running one deployment, we start the next with the following sequence:
juju destroy-environment -y -e maas
juju bootstrap -e maas
In this case, we're handed back one of the machines that the previous maas environment used, but the system hasn't been powered off (i.e. it didn't re-install from scratch). This results in a bootstrap failure. I noticed oddities in the bootstrap log:
WARNING picked arbitrary tools &{"1.18.
- /MAAS/api/
Waiting for address
Attempting to connect to node-1.maas:22
Attempting to connect to 10.245.0.158:22
Warning: Permanently added 'node-1.maas' (ECDSA) to the list of known hosts.
Logging to /var/log/
Installing add-apt-repository
Adding apt repository: deb http://
Running apt-get update
Running apt-get upgrade
Installing package: git
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Installing package: --target-release 'precise-
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' -o $bin/tools.tar.gz 'https:/
Starting MongoDB server (juju-db)
2014-06-02 13:43:10,360 - __init_
user already exists?
Cloud-init v. 0.7 running 'modules:config' at Mon, 02 Jun 2014 13:43:11 +0000. Up 43.60 seconds.
Going from 13:41:08 (bootstrap start) to now is roughly 2 minutes. That's very fast for a full cycle of power down, power off, power on, PXE boot, and kernel load.
It appears the host had already been dist-upgraded, and then finally:
start: Job is already running: juju-db
Attempting to start juju-db fails, and so does the bootstrap.
Related branches
- Jeroen T. Vermeulen (community): Approve
Diff: 392 lines (+107/-82), 6 files modified:
src/maasserver/api/tests/test_node.py (+8/-59)
src/maasserver/enum.py (+3/-0)
src/maasserver/models/node.py (+29/-2)
src/maasserver/models/tests/test_node.py (+57/-13)
src/maasserver/node_action.py (+0/-1)
src/maasserver/tests/test_node_action.py (+10/-7)
Changed in maas:
importance: Undecided → Critical
Changed in maas:
status: New → Triaged
tags: added: node-lifecycle
tags: added: robustness
tags: removed: node-lifecycle
Changed in maas:
milestone: none → 1.7.0
Changed in maas:
assignee: Raphaël Badin (rvb) → nobody
status: In Progress → Triaged
tags: added: cloud-installer landscape
Changed in maas:
assignee: nobody → Julian Edwards (julian-edwards)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
description: updated
I think this issue happens because when the node is deallocated, the celery task for poweroff is started, but MAAS doesn't check that the poweroff has actually completed before the node is marked Ready. The same node therefore gets allocated again, and juju connects to it before it has powered down.
Note: the poweroff command does work; after a moment the node finally turns off.
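The race described above can be sketched in a few lines. This is a hypothetical illustration, not actual MAAS code: `Node`, `power_off`, and the `release_*` methods are made-up stand-ins for the real state machine and celery task. The buggy path flips the node to READY as soon as the asynchronous poweroff is dispatched; the fixed path only transitions once power-off is confirmed.

```python
import threading
import time

class Node:
    """Toy stand-in for a MAAS node (hypothetical, for illustration only)."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.status = "ALLOCATED"
        self.powered_on = True

    def power_off(self, delay=0.2):
        """Asynchronous power-off, standing in for the celery task."""
        def worker():
            time.sleep(delay)           # the BMC takes a while to act
            self.powered_on = False
        threading.Thread(target=worker).start()

    def release_buggy(self):
        self.power_off()
        self.status = "READY"           # bug: READY while still powered on

    def release_fixed(self, timeout=5.0):
        self.power_off()
        deadline = time.time() + timeout
        while self.powered_on and time.time() < deadline:
            time.sleep(0.05)            # poll until power-off completes
        if not self.powered_on:
            self.status = "READY"       # only READY once actually off

node = Node("node-1.maas")
node.release_buggy()
print(node.status, node.powered_on)     # READY True  -> the race window

node2 = Node("node-1.maas")
node2.release_fixed()
print(node2.status, node2.powered_on)   # READY False -> safe to reallocate
```

In the buggy path there is a window where the node is READY but still powered on, which is exactly when a fresh `juju bootstrap` can grab it and SSH into the old deployment. The fix that landed (see the diff touching src/maasserver/models/node.py and enum.py) presumably defers the READY transition in a similar spirit, though the real implementation is event-driven rather than a polling loop.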