Rescue Mode fails and can't exit

Bug #1700161 reported by Scott Pendleton
This bug affects 3 people

Affects: MAAS
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

MAAS Version 2.1.5+bzr5596-0ubuntu1 (16.04.1)

A machine failed deployment, so I tried to enter Rescue Mode, which immediately fails with the generic message "failed to enter rescue mode". The only action then allowed is "Exit Rescue Mode"; no other node option works except that and, I assume, Delete. I can't abort, release, mark broken, mark fixed, commission, or anything else. After selecting Exit Rescue Mode the status hangs there and nothing happens: the node is quiet and doesn't reboot or do anything else. Manually rebooting it does nothing. I left it at "Exiting Rescue Mode" for 3 hours and nothing happened. The only option is to remove the node with Delete, re-discover it, commission it, and reconfigure all its disks and network settings again.

The logs don't show any errors.

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=================================================
ii maas 2.1.5+bzr5596-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.5+bzr5596-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.5+bzr5596-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.5+bzr5596-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.5+bzr5596-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)
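
When triaging a report like this, one quick sanity check is whether every installed MAAS package is at the same version. The following is a minimal sketch, assuming the `dpkg -l` listing has been saved as text; the function name `installed_versions` is hypothetical and the sample reuses three lines from the listing above.

```python
def installed_versions(dpkg_listing):
    """Map package name -> version for rows whose status column is 'ii'."""
    versions = {}
    for line in dpkg_listing.splitlines():
        fields = line.split()
        # Installed rows look like: ii <name> <version> <arch> <description...>
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


# Three rows copied from the listing in this report.
sample = """\
ii maas 2.1.5+bzr5596-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
un maas-cluster-controller <none> <none> (no description available)
ii maas-rack-controller 2.1.5+bzr5596-0ubuntu1~16.04.1 all Rack Controller for MAAS
"""

versions = installed_versions(sample)
# A single distinct version suggests no partially upgraded packages.
print(sorted(set(versions.values())))
```

In the listing above, every `ii` row carries the same version string, so a version mismatch between packages does not appear to be a factor here.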

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Scott,

Could you please attach the logs from when you saw these issues? I suspect your rack controller was disconnected when this happened.

Also, to unblock, go to the machine listing page, select it, and try to change the status to broken and then mark fixed.

Thanks!
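
The mark-broken / mark-fixed workaround suggested above can presumably also be attempted from the MAAS CLI rather than the web UI. The sketch below only builds the two command lines and does not run them; the profile name `admin` and system ID `abc123` are placeholders, and the helper `unblock_commands` is hypothetical. Nothing in this thread confirms that the API accepts these actions for a node stuck after a failed rescue-mode attempt (the reporter later notes the UI rejected them), so treat this as something to try, not a known fix.

```python
import shlex


def unblock_commands(profile, system_id):
    """Return the CLI invocations for mark-broken followed by mark-fixed."""
    return [
        ["maas", profile, "machine", "mark-broken", system_id],
        ["maas", profile, "machine", "mark-fixed", system_id],
    ]


# Placeholder profile and system ID; substitute real values before running.
for cmd in unblock_commands("admin", "abc123"):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell on a host where the `maas` CLI is already logged in under that profile.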

Changed in maas:
status: New → Incomplete
milestone: none → 2.3.0
Revision history for this message
Scott Pendleton (scott.pendleton) wrote :

I can do other actions with other nodes, so I don't think it was disconnected. I was trying to get into rescue mode to figure out why a 16.04 deployment fails on this one node, but 14.04 deploys just fine.

Revision history for this message
Scott Pendleton (scott.pendleton) wrote :

As stated in the original post, once I attempt to enter rescue mode I can't do anything but delete the node. Marking it as broken, or any other action, returns "1 node can't be <action>, please update selection".

Revision history for this message
Scott Pendleton (scott.pendleton) wrote :

After some experimenting, I found that this issue only occurs with nodes that have power control set to manual. I plugged a server with IPMI power control into the environment, and it enters and exits rescue mode without issue. Nodes with manual power control fail to enter rescue mode within seconds of the attempt, so it's not a matter of the node timing out.
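
To see which other nodes might hit the same problem, the machine list can be filtered by power type. This is a minimal sketch, assuming the JSON comes from something like `maas $PROFILE machines read` and that each machine object carries `hostname` and `power_type` fields with the value `"manual"` for manually powered nodes; the helper name and the sample hostnames are made up for illustration.

```python
import json


def manual_power_machines(machines_json):
    """Return hostnames of machines whose power_type is 'manual'."""
    machines = json.loads(machines_json)
    return [m["hostname"] for m in machines if m.get("power_type") == "manual"]


# Hypothetical sample data standing in for real API output.
sample = json.dumps([
    {"hostname": "node-a", "power_type": "manual"},
    {"hostname": "node-b", "power_type": "ipmi"},
])

print(manual_power_machines(sample))
```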

no longer affects: maas/2.2
Changed in maas:
milestone: 2.3.0 → 2.3.x
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi!

**This is an automated message**

We believe this may no longer be an issue in the latest MAAS release. Due to the original date of the bug report, we are currently marking it as Invalid. If you believe this bug report is still valid against the latest release of MAAS, or if you are still interested in this, please re-open this bug report.

Thanks

Changed in maas:
status: Incomplete → Invalid