nodes failed deployment; they are powered off

Bug #1743333 reported by Jason Hobbs
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
CDO QA System Tests | New | Undecided | Unassigned |

Bug Description

Two nodes failed deployment; upon inspection, they were powered off.

They commissioned successfully. When installation began, they PXE booted and downloaded the kernel and initrd, but there were no rsyslog entries from them booting into Linux during the deploy. When I checked their power status a few hours after the failure, they were powered down.

I can't find any clues in the MAAS logs or the IPMI event logs as to why they were powered down.
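
For reference, a power-state check like the one described above can be scripted. This is a minimal sketch, assuming the BMCs are reachable with ipmitool over the lanplus interface; the report does not say which tool was actually used, and the function names and the host, user, and password parameters are hypothetical.

```python
import subprocess


def parse_power_status(output):
    """Map the text printed by `ipmitool chassis power status` to 'on', 'off', or 'unknown'."""
    text = output.strip().lower()
    if text.endswith("is on"):
        return "on"
    if text.endswith("is off"):
        return "off"
    return "unknown"


def query_power_status(host, user, password):
    """Ask a BMC for its chassis power state (assumes ipmitool is installed)."""
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
         "chassis", "power", "status"],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,  # raise if ipmitool exits non-zero
    )
    return parse_power_status(result.stdout)
```

Polling this during the deploy, rather than a few hours later, would narrow down when the nodes actually powered off.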

https://solutions.qa.canonical.com/#/qa/testRun/fc665066-08b6-4e66-990d-9eeeb3d7aa2b

description: updated
Jason Hobbs (jason-hobbs) wrote :
summary: - elgyem and spearow failed deployment; they are powered off
+ nodes failed deployment; they are powered off
Jason Hobbs (jason-hobbs) wrote :

It seems possible that MAAS is telling these nodes to power off. That's one scenario in which they would boot into Linux and then power off.

Jason Hobbs (jason-hobbs) wrote :

I figured out what is happening.

This is a variant of bug 1743249. Instead of timing out and going into a stuck state, nodes sometimes start trying to retrieve the generic grub.cfg-default-amd64. When they retrieve it successfully, they get treated as a new enlisting node: they talk back to the MAAS API and then power off. Because of this, their rsyslog logs show up under maas-enlisting-node instead of under the named node.
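
A quick way to spot this case is to check whether a failed node has any logs under its own name and, if not, whether anything was logged under maas-enlisting-node. This is a rough sketch that assumes rsyslog output is stored in one directory per hostname under a common root; that layout, the function names, and the result labels are assumptions for illustration, not something taken from the MAAS code.

```python
from pathlib import Path

# Hostname that enlisting nodes log under, as described in this bug.
ENLISTING_HOST = "maas-enlisting-node"


def hosts_with_logs(rsyslog_root):
    """Return names of per-host directories that contain at least one non-empty file."""
    root = Path(rsyslog_root)
    found = set()
    if not root.is_dir():
        return found
    for host_dir in root.iterdir():
        if not host_dir.is_dir():
            continue
        if any(p.is_file() and p.stat().st_size > 0 for p in host_dir.rglob("*")):
            found.add(host_dir.name)
    return found


def classify_node(rsyslog_root, node_name):
    """Say where a failed node's deploy logs most likely ended up."""
    hosts = hosts_with_logs(rsyslog_root)
    if node_name in hosts:
        return "logged-under-own-name"
    if ENLISTING_HOST in hosts:
        # Logs exist for an enlisting node; they may belong to this node.
        return "possibly-logged-as-enlisting"
    return "no-logs"
```

A result of "possibly-logged-as-enlisting" is only a hint: the enlisting-node logs still have to be matched to the failed node by timestamp or IP address.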

http://paste.ubuntu.com/26485470/
