commissioning fails silently if a node can't reach the region controller
Bug #1303925 reported by James Troup
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Fix Released | High | Julian Edwards | |
| cloud-init | Expired | Low | Unassigned | |
Bug Description
We recently had a node that completely refused to commission in MAAS.
After (literally) several man-days of debugging, we figured out that
it was because the node couldn't talk to the region controller over
HTTP.
Obviously, that's ultimately our mistake, but MAAS could have
been a lot better at helping us to help ourselves; currently, there's
absolutely no indication from the boot process that the HTTP
connection to the region controller is the problem.
Attached is the serial console output (from the point of boot) for the
node that was failing to commission. 91.189.94.35 is the MAAS region
controller and 91.189.88.20 is the MAAS cluster controller.
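For anyone debugging a similar failure, a quick way to separate "no route or firewall" from "HTTP server answering with an error" is to test the TCP connection and the HTTP request as two distinct steps. The following is a minimal standalone sketch, not part of MAAS or cloud-init; the function name `check_region_reachable` is hypothetical, and the URL you pass it (for example, the region controller's metadata URL) is an assumption you must fill in for your own deployment.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse


def check_region_reachable(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Hypothetical helper: return (ok, message) for an HTTP reachability check."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == "https" else 80)

    # Step 1: plain TCP connect, so routing/firewall problems are reported
    # separately from anything that happens at the HTTP layer.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return False, f"cannot open TCP connection to {host}:{port}: {exc}"

    # Step 2: a real HTTP GET. Any HTTP status (even 4xx/5xx) proves the
    # server is reachable; only transport-level failures count as "down".
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status} from {url}"
    except urllib.error.HTTPError as exc:  # must precede URLError (subclass)
        return True, f"HTTP {exc.code} from {url} (reachable, but returned an error)"
    except (urllib.error.URLError, OSError) as exc:
        return False, f"TCP connect to {host}:{port} worked but HTTP request failed: {exc}"
```

Run from the failing node (or a host on the same network segment), for example with `check_region_reachable("http://91.189.94.35/")`, a failure in step 1 points at the network path, while a failure in step 2 points at the HTTP service or an intermediate proxy.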
Related branches
lp:~julian-edwards/maas/commission-monitor-bug-1303925
- Jeroen T. Vermeulen (community): Approve
- Diff: 317 lines (+74/-3), 7 files modified:
  - src/maasserver/api/tests/test_enlistment.py (+4/-1)
  - src/maasserver/api/tests/test_nodes.py (+3/-0)
  - src/maasserver/models/node.py (+14/-2)
  - src/maasserver/models/tests/test_node.py (+29/-0)
  - src/maasserver/tests/test_node_action.py (+4/-0)
  - src/metadataserver/api.py (+1/-0)
  - src/metadataserver/tests/test_api.py (+19/-0)
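One generic way for a server to turn this silent hang into a visible failure is a commissioning watchdog: any node that has sat in a commissioning state past a deadline is marked failed, with a message that distinguishes "never contacted the metadata server" from "contacted it, then stalled". The sketch below illustrates only that idea; it is not the code from the branch above, and every name in it (`Status`, `Node`, `check_commissioning_timeouts`, the 20-minute default) is a hypothetical stand-in rather than a real MAAS model or setting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):  # hypothetical subset of node states
    COMMISSIONING = "commissioning"
    READY = "ready"
    FAILED_COMMISSIONING = "failed commissioning"


@dataclass
class Node:  # hypothetical stand-in for a node record
    hostname: str
    status: Status
    commissioning_started: datetime
    # Time of the last request this node made to the metadata server, if any.
    last_metadata_contact: Optional[datetime] = None
    error: str = ""


def check_commissioning_timeouts(
    nodes: list[Node], now: datetime, timeout: timedelta = timedelta(minutes=20)
) -> list[Node]:
    """Mark nodes stuck in COMMISSIONING longer than `timeout` as failed."""
    failed = []
    for node in nodes:
        if node.status is not Status.COMMISSIONING:
            continue
        if now - node.commissioning_started <= timeout:
            continue  # still within the allowed window
        node.status = Status.FAILED_COMMISSIONING
        if node.last_metadata_contact is None:
            # Never heard from the node: one likely cause is that it cannot
            # reach the region controller over HTTP.
            node.error = ("node never contacted the metadata server; check that "
                          "it can reach the region controller over HTTP")
        else:
            node.error = (f"commissioning timed out after {timeout}; last metadata "
                          f"contact at {node.last_metadata_contact.isoformat()}")
        failed.append(node)
    return failed
```

The function would be called periodically (the scheduling mechanism is left out here); recording the last metadata contact is what lets the error message point the operator at connectivity instead of leaving the node silently stuck.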
| Changed in | Field | Change |
|---|---|---|
| maas | importance | Critical → High |
| maas | tags | added: node-lifecycle |
| maas | tags | added: robustness |
| maas | tags | removed: node-lifecycle |
| maas | milestone | 1.6.0 → none |
| maas | milestone | none → next |
| maas | milestone | next → 1.7.1 |
| maas | status | Triaged → In Progress |
| maas | assignee | nobody → Julian Edwards (julian-edwards) |
| maas | status | In Progress → Fix Committed |
| maas | status | Fix Committed → Fix Released |
Calling this critical since it's a costly failure state to get into, and targeting it for 14.10.