Node stops commissioning and powers down before commissioning is complete

Bug #1523766 reported by Paul Gear
This bug affects 1 person
Affects  Status     Importance  Assigned to  Milestone
MAAS     Won't Fix  Undecided   Unassigned

Bug Description

After upgrading from MAAS 1.8.3 (+bzr4053), the default version on wily, to 1.9.0 (rc3+bzr4525) from ppa:maas/next, I'm unable to commission an existing node: it powers off while the node list still shows it as commissioning. The attached /var/log/maas/regiond.log shows HTTP 500 errors.

Deleting the node also fails, with the error message: "Node failed to be deleted, because of the following error: DHCPv4 server is disabled."

root@maas:/var/log/maas# dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============================-================================-============-======================================
ii maas 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS server all-in-one metapackage
ii maas-cli 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS command line API tool
ii maas-cluster-controller 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS server cluster controller
ii maas-common 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS server common files
ii maas-dhcp 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS DHCP server
ii maas-dns 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS DNS server
ii maas-proxy 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS Caching Proxy
ii maas-region-controller 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS server complete region controller
ii maas-region-controller-min 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS Server minimum region controller
ii python-django-maas 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS server Django web framework
ii python-maas-client 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS python API client
ii python-maas-provisioningserver 1.9.0~rc3+bzr4525-0ubuntu1~wily1 all MAAS server provisioning libraries
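
To narrow down which requests produce the 500s mentioned above, the log can be summarized per request. The following is only a minimal sketch: it assumes the relevant lines in regiond.log follow an access-log shape in which the quoted request is followed by the HTTP status code, which is not confirmed by this report, and the function name `count_server_errors` is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (not confirmed for regiond.log): an access-log style
# entry where the quoted request is followed by the status code, e.g.
#   ... "GET /some/path HTTP/1.1" 500 ...
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})\b'
)


def count_server_errors(lines):
    """Return a Counter mapping (method, path) to the number of 5xx responses."""
    errors = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        # Lines that do not look like a request entry are skipped silently.
        if match and match.group("status").startswith("5"):
            errors[(match.group("method"), match.group("path"))] += 1
    return errors
```

Passing an open file object for /var/log/maas/regiond.log to `count_server_errors` and iterating over `.most_common()` on the result would list the failing requests, most frequent first. If the log uses a different layout, only the regular expression needs adjusting.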

Paul Gear (paulgear) wrote :
Andres Rodriguez (andreserl) wrote :

Does it even boot into the commissioning environment? Can you please ssh into the commissioning environment and get the /var/log/cloud-init-*.log files?

Thanks!

Mike Pontillo (mpontillo) wrote :

OK, I've got this reproduced, to some extent.

First I installed MAAS 1.8.3 and commissioned two nodes. The first time I tried, commissioning failed for both nodes, with the "node powers off during commissioning" bug you described. I tried again, and it worked.

Then I upgraded to MAAS 1.9. I took the first node and recommissioned it. The second node I immediately tried to deploy, and got an error saying that the node must be configured to be on a network. (Apparently we can't smoothly upgrade completely unmanaged nodes, probably because we aren't sure which subnet they exist on.) After configuring the node for my unmanaged subnet in the node details page and setting it to "Auto assign" for the IP address (and ensuring I had a static range defined), I saw the following error upon trying to deploy it:

Node failed to be deployed, because of the following error: DHCPv4 server is disabled.

So it appears that as of MAAS 1.9, we refuse to do anything with nodes if we cannot contact the MAAS DHCP server.

I think this is the bug we should address here. If there are other issues (such as commissioning failures) let's file a separate bug for those.

Mike Pontillo (mpontillo) wrote :

On second thought, this bug already seems focused on the commissioning problem, so I'll move the DHCP issue to another bug.

Mike Pontillo (mpontillo) wrote :

Filed bug #1524091 for the DHCP issue.

Paul Gear (paulgear) wrote :

Hi Mike,

The system boots just fine into the commissioning environment; it appears to shut down only because of the errors returned from the MAAS server. Would you still like me to grab the cloud-init logs?

Thanks,
Paul

Mike Pontillo (mpontillo) wrote :

This was just brought to my attention again. (Apologies; looks like it dropped off our radar before.)

Paul, is this still an issue for you?

Mike Pontillo (mpontillo) wrote :

We're really focused on MAAS 2.x and owning the entire network, rather than supporting unmanaged DHCP. So I'll make this a Won't Fix for now. Let me know if there is still an urgent need to address this; if so, we'll reconsider.

Changed in maas:
status: New → Won't Fix
milestone: none → 1.9.4
Paul Gear (paulgear) wrote :

Hi Mike,

Thanks for following up. The issue with not supporting non-MAAS DHCP services is that it makes it more difficult (although not impossible) to mix MAAS and non-MAAS devices on the same segment, which is probably moderately common in enterprise MAN/LAN setups. For example, I have a Ruckus wireless access point (which uses DHCP and needs to PXE boot from its local non-x86 wireless LAN controller) on the same network as my amd64 MAAS client.

From memory, this bug was found in my home test lab, and so is not important from that perspective. I'm happy to leave it as Won't Fix, although it is worth noting that making it more difficult to integrate with legacy networks will limit MAAS's penetration into such environments. It would be great if there were a clearly documented method for installing MAAS into such environments. This may exist, but I couldn't find it at the time of logging this bug.

Regards,
Paul
