NUC does not boot after power off/power on

Bug #1366172 reported by Andres Rodriguez
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Julian Edwards

Bug Description

1. I first deployed a node. The node installed successfully.
2. Then I went to the web UI and stopped the node (Stop button).
3. Then I went to the web UI again and started the node (Start button).
4. The node failed to boot with error:

"Booting local disk..."
Cannot get disk parameters
boot: _

This was on an Intel NUC.

Tags: amt nuc

tags: added: robustness
Julian Edwards (julian-edwards) wrote :

You know what I am going to ask for next don't you ...

Changed in maas:
status: New → Incomplete
Raphaël Badin (rvb) wrote :

I'm seeing this too. It happens very consistently with my NUCs.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Critical
Raphaël Badin (rvb) wrote :

It seems it only happens when AMT is used to turn the node on: in the very same situation (node deployed and then turned off), if I power up the node manually, the node boots fine.

tags: added: nuc
removed: robustness
Raphaël Badin (rvb)
tags: added: amt
Raphaël Badin (rvb)
summary: - Node does not boot after power off/power on
+ NUC does not boot after power off/power on
Christian Reis (kiko)
Changed in maas:
milestone: none → 1.7.0
Julian Edwards (julian-edwards) wrote :

Until this bug gets some logs, it's impossible to diagnose as this scenario doesn't happen for me.

Changed in maas:
status: Triaged → Incomplete
Raphaël Badin (rvb) wrote :

> Until this bug gets some logs, it's impossible to diagnose as this scenario doesn't happen for me.

Really? I managed to reproduce this problem reliably (by following the steps from the bug's description). I'll try again when I'm back home.

Raphaël Badin (rvb) wrote :

We actually got this yesterday using the orange box so I'm going to mark this bug as triaged.

Changed in maas:
status: Incomplete → Triaged
Julian Edwards (julian-edwards) wrote : Re: [Bug 1366172] Re: NUC does not boot after power off/power on

On Thursday 09 Oct 2014 08:45:22 you wrote:
> We actually got this yesterday using the orange box so I'm going to mark
> this bug as triaged.
>
> ** Changed in: maas
> Status: Incomplete => Triaged

I marked it incomplete because it's awaiting log attachments.

Julian Edwards (julian-edwards) wrote :

Ok I took a look and figured out what's wrong once I saw the log file:

ERROR 2014-10-10 15:59:28,829 maasserver Unable to identify boot image for (ubuntu/amd64/hwe-t/trusty/local): cluster 'master' does not have matching boot image.

So the boot purpose is wrong, basically, it should be "release" not "local" in my case.

Raphaël Badin (rvb) wrote :

> So the boot purpose is wrong, basically, it should be "release" not "local" in my case.

There is no such purpose. In my case, the node has been deployed and an OS installed on the disks. I've stopped the machine and I'm re-starting it so the boot purpose *should* be "local" (i.e. boot from the local disk that got installed in the previous step).

Julian Edwards (julian-edwards) wrote :

On Friday 10 Oct 2014 06:30:38 Raphaël Badin wrote:
> > So the boot purpose is wrong, basically, it should be "release" not
> > "local" in my case.
>
> There is no such purpose. In my case, the node has been deployed and an
> OS installed on the disks. I've stopped the machine and I'm re-starting
> it so the boot purpose *should* be "local" (i.e. boot from the local
> disk that got installed in the previous step).

It's looking for an image ending in the path "local" ... that's clearly wrong,
it should be "release".

Raphaël Badin (rvb) wrote :

> It's looking for an image ending in the path "local" ... that's clearly wrong, it should be "release".

True, the wrong image gets requested (i.e. an image that doesn't exist); but when booting locally, which is what a deployed node should do when being powered up, there is no image used. This error doesn't affect (we're not exactly sure why) the PXE config that the node gets (the "boot locally" PXE config).
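
To illustrate the point about boot purposes, here is a minimal sketch, not MAAS code. It takes the image tuple from the log line above, extracts the trailing purpose, and treats "local" as needing no boot image. The function names `image_purpose` and `needs_boot_image` are hypothetical.

```shell
# Hypothetical helper: print the purpose, i.e. the last path component
# of an image tuple such as "ubuntu/amd64/hwe-t/trusty/local".
image_purpose() {
    echo "${1##*/}"
}

# Hypothetical helper: succeed only if the purpose requires a boot image.
# A node booting from its own disk ("local") uses no image at all.
needs_boot_image() {
    [ "$1" != "local" ]
}

purpose=$(image_purpose "ubuntu/amd64/hwe-t/trusty/local")
if needs_boot_image "$purpose"; then
    echo "look up boot image for purpose: $purpose"
else
    echo "purpose is $purpose: skip image lookup, boot from disk"
fi
```

Under these assumptions the lookup that produced the logged error would simply be skipped for a deployed node.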

Julian Edwards (julian-edwards) wrote :

I have a suspicion I know what this is. It only happens on the NUCs, not on the microservers, and you said that if you manually power up the NUC it boots OK.

So that leads me to conclude that the power template is doing the wrong thing, and my finger is pointing at this line:

        yes | issue_amt_command powerup pxe

i.e. it always says to PXE boot even if we want a local boot. I'll fix this now.

Julian Edwards (julian-edwards) wrote :

After some experimentation I have surprising results.

"amttool powerup hd" didn't work; I had the same outcome.
"amttool powerup" (with no special arg) did work.

Oh well!
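
A minimal sketch of what these findings suggest for the power-on step, not the change that was actually committed. The helper `amt_powerup_args` and its `boot_mode` parameter ("pxe" or "local") are hypothetical names and are not taken from the MAAS power template.

```shell
# Hypothetical helper: print the amttool arguments for powering a node on.
# boot_mode is an assumed parameter; the real template may differ.
amt_powerup_args() {
    boot_mode="$1"
    case "$boot_mode" in
        pxe)
            # Force a network boot.
            echo "powerup pxe"
            ;;
        local)
            # "powerup hd" gave the same failure in the experiment above,
            # so pass no boot-device argument for a local boot.
            echo "powerup"
            ;;
        *)
            echo "unknown boot mode: $boot_mode" >&2
            return 1
            ;;
    esac
}

# Possible usage ($host is a placeholder for the node's AMT address):
#   yes | amttool "$host" $(amt_powerup_args local)
```

The sketch only builds the argument string, so it can be checked without an AMT-capable machine.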

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Julian Edwards (julian-edwards)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Michael (michaelblankenship-ftw) wrote :

Just noting that this seems to affect only two of my nodes (out of seven). The MAAS commissioning portion worked. At the command "juju add-unit nova-compute" it began another commissioning of the node in question. I watched on a monitor as it installed an OS to the hard drive. It then did a remote reboot; the PXE portion tried to hand things over to the hard drive and failed at the "Booting from disk..." prompt, never progressing past that point. Typically, after a 20 minute timeout Juju would recognize this as a failure and the node would usually be marked with "Commission Failed" or similar.

If at this point, however, you manually power cycle the node, PXE *will* successfully hand over control to the installed OS and continue from there to finish the OpenStack commissioning of the service.

Dell Vostro 200 with 160GB SATA drive and 6GB RAM
