After commissioning NUC reboots instead of shutting down

Bug #1368685 reported by Gavin Panella
This bug affects 2 people
Affects: MAAS
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Furthermore, the console hangs after the reboot:

  Trying to load: pxelinux.cfg/01-ec-a8-6b-fd-af-45 ok
  APM not present.
  boot: _

The node did commission successfully, is marked as Ready in the UI, and power control is working.

Gavin Panella (allenap) wrote :

In case it's useful, I've attached a photo of the console when it reboots after commissioning.

Raphaël Badin (rvb) wrote :

I've just seen this happen on one of my NUCs as well.

I suspect this is the conjunction of two problems:
- after commissioning, instead of being shut down, the node got rebooted. It might be a problem in MAAS itself or the NUC acting up.
- since the node was 'Ready' the PXE config told the node to poweroff (because it doesn't make sense for a node to be powered up in that state) using APM and the NUC doesn't support this.

tags: added: orange-box
tags: added: orangebox
removed: orange-box
Nicolas Thomas (thomnico) wrote :

I might have come to the root cause of this:

When a node shows this "APM not present" message, it generally means that the node is trying to enter one state while MAAS expects another.

I noticed that the nodes with this problem show the following symptom, for example:
amttool 10.14.4.12 powerdown
host node2amt.maas, powerdown [y/N] ? y
execute: powerdown
result: pt_status: not permitted

(the node is started)

The only permitted command is reset. This is consistent with what I see at the AMT URL, http://10.14.4.12:16992/index.htm: under "Remote control" only reset is available, whereas the non-problematic nodes have all options available when the OS is running.

Checking the MAAS logs, I see the same "not permitted" message consistently. Shutting down from the OS on the host helps put the node back in the expected state so that I can move forward (commissioning, for example).
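
The symptom described above could be detected mechanically by looking for the "pt_status: not permitted" line in captured amttool output. A minimal sketch follows; amt_not_permitted is a hypothetical helper name (not part of amttool or MAAS), and the sample text is copied from the output quoted in this comment.

```shell
# Sketch only: amt_not_permitted is a hypothetical helper, not an amttool
# or MAAS function. It succeeds (exit 0) when the captured amttool output
# contains the refusal line quoted in this bug.
amt_not_permitted() {
    # $1: text captured from an amttool invocation
    printf '%s\n' "$1" | grep -q 'pt_status: not permitted'
}

# Example using the output quoted above.
output='execute: powerdown
result: pt_status: not permitted'
if amt_not_permitted "$output"; then
    echo "AMT refused the power command"
fi
```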

Hope this helps,

Christian Reis (kiko) wrote :

We need a way to reproduce this, or an indication that it is still happening on the 1.7 tip.

Changed in maas:
milestone: none → 1.7.0
status: Triaged → Incomplete
Nicolas Thomas (thomnico) wrote :

If you're stuck with the "APM not present" message:

Workaround:

Mark node as broken
Commission node (again).

amttool IP reset

should put things back in order.

Julian Edwards (julian-edwards) wrote :

It is still happening, I've seen this recently. The only way I could fix it was to physically switch the power on the NUC because when it gets in this state:

result: pt_status: not permitted

it doesn't seem to accept any commands. However, if the "reset" command is let through (I can't remember whether I tried it), this would be a possible fix: we could make the power script detect this response and issue a reset followed by a powerdown.

Changed in maas:
status: Incomplete → Triaged
Julian Edwards (julian-edwards) wrote :

(To reproduce, just turn it on manually when it's in the READY state IIRC)

Christian Reis (kiko) wrote :

Nicolas has found that if you detect the "not permitted" and then reboot into the acpioff.c32 image that I provided in https://bugs.launchpad.net/maas/+bug/1376716/comments/12 you get a workable node again.

I've asked him if he can upload a patch here, but I'm attaching his template for reference.

Nicolas Thomas (thomnico) wrote :

Here is the patch to AMT Template.

In short, I check whether powerdown or powercycle returns "not permitted";
 if it does, I force a reset to PXE.
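
The check described above can be sketched roughly as follows. This is not the attached patch: run_amt and power_change_with_fallback are hypothetical names, AMT_HOST is an assumed variable holding the node's AMT address, and piping `yes` into amttool to answer its confirmation prompt is an assumption about how the tool is driven non-interactively.

```shell
# Hypothetical wrapper: run one amttool command against $AMT_HOST and
# capture its output. Assumes amttool reads its [y/N] confirmation from stdin.
run_amt() {
    yes | amttool "$AMT_HOST" "$@" 2>&1
}

# Try the requested power change; if AMT answers "not permitted",
# fall back to a reset that boots from PXE.
power_change_with_fallback() {
    change=$1    # "powerdown" or "powercycle"
    result=$(run_amt "$change")
    case $result in
        *'not permitted'*)
            # AMT refused the command: force a reset into PXE instead.
            run_amt reset pxe
            ;;
        *)
            # Command accepted: pass the amttool output through.
            printf '%s\n' "$result"
            ;;
    esac
}
```

With AMT_HOST set to the node's AMT address, `power_change_with_fallback powerdown` would be the entry point. Whether the node then powers off depends on what the PXE configuration serves it, as discussed in this bug.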

Combined with the .c32 fix from Kiko you end up with a working MAAS again on all nodes.

Releasing an installed node ALWAYS fails on the affected NUCs (which are most likely the ones with the latest BIOS).

Hope this helps,

Julian Edwards (julian-edwards) wrote :

I wonder if you can do this instead:

 issue_amt_command reset pxe
 issue_amt_command powerdown

It would avoid the need to push the acpioff.c32. I'll try it next time I get in this state, unless someone beats me to it.

Julian Edwards (julian-edwards) wrote :

I think this is now basically a dupe of bug 1376716 isn't it? I've got a solution there, which is basically:

issue_amt_command reset cd
issue_amt_command powerdown

Which successfully turns off my NUC stuck at the boot prompt.

Nicolas Thomas (thomnico) wrote :

Julian, the .c32 and AMT template are working and tested in the full cycle.

We need a reliable solution by yesterday :) ...

In both cases they are workarounds, and the root cause is the change in behavior of the AMT, so I will go with the one tested in the field, i.e. kiko+me and hours/days of tests...

Julian Edwards (julian-edwards) wrote : Re: [Bug 1368685] Re: After commissioning NUC reboots instead of shutting down

On Thursday 23 Oct 2014 09:46:28 you wrote:
> Julian the .c32 and amt template is working and tested in the full
> cycle.
>
> We need a reliable solution by yesterday :) ...

Understood!

> In both cases they are workarounds and the root cause is in the change
> in behavior of the AMT.. so I will go with the one tested in the field
> i.e. kiko+me and hours/days of tests...

It's fine as a workaround, the problem is that it's not a file that's present
in the Ubuntu archive. While we could include it with MAAS, that's a
maintenance burden and if the change I suggested can be reliably tested by a
few people, it's a better long term option.

Julian Edwards (julian-edwards) wrote :

Nicolas, I've worked out that you get the "result: pt_status: not permitted" only if you have a VNC session active. Kill VNC and power control suddenly becomes much more reliable.

Christian Reis (kiko) wrote :

Just to summarize where we are on this bug:

- We do not clearly understand in what scenario the end of commissioning leads to a reboot instead of a shutdown. That is the main subject of this bug.
- When we boot a machine in any state other than "deployed", we tell it to shut down via the PXE poweroff image.
- That image does not work on machines without APM (that is bug 1376716)

With the workaround in that latter bug, this bug becomes in practice harmless. In its absence, it's painful.

Christian Reis (kiko)
Changed in maas:
status: Triaged → Incomplete
Christian Reis (kiko)
Changed in maas:
milestone: 1.7.0 → 1.7.1
Changed in maas:
milestone: 1.7.1 → 1.7.2
Christian Reis (kiko) wrote :

This issue is likely to be caused by problems in ACPI or power management, as we do seem to correctly issue a powerdown. See: http://askubuntu.com/questions/132882/why-do-i-get-a-reboot-instead-of-a-shutdown

Changed in maas:
milestone: 1.7.2 → next
Gavin Panella (allenap) wrote :

The consensus is that there is a defect. This also still happens with my NUCs. We can't pinpoint what's broken, but that's not a reason to keep it Incomplete.

Changed in maas:
status: Incomplete → Triaged
tags: added: amt power
Andres Rodriguez (andreserl) wrote :

Hi!

**This is an automated message**

We believe this may no longer be an issue in the latest MAAS release. Due to the original date of the bug report, we are currently marking it as Invalid. If you believe this bug report is still valid against the latest release of MAAS, or if you are still interested in this, please re-open this bug report.

Thanks

Changed in maas:
status: Triaged → Invalid
milestone: next → none