Commissioning fails on NUCs previously loaded with CoreOS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Incomplete | Undecided | Unassigned |
Bug Description
We have a bunch of NUCs in our lab. Some of them always fail to commission, despite otherwise identical BIOS, settings, and AMT configuration, all in the same rack, on the same controller, on the same switch.
A resolution that works 100% of the time is to live-boot 16.04 from USB and then wipe the start of the SSD with "dd if=/dev/zero of=/dev/sda bs=1M count=1", although I'm pretty sure a much smaller block size would get the job done.
The only common thread among these NUCs is that they previously had CoreOS installed on them.
Once the SSD is wiped, commissioning works from that point onward.
Seems like this is something the initial MAAS boot image should or could do by way of hygiene.
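For anyone who wants to script the workaround, here is a minimal sketch of it as two shell functions. The names `wipe_head` and `head_is_blank` are made up for illustration and are not part of MAAS. The sketch assumes GNU dd (for the `1M` suffix) and only touches the target passed in.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of MAAS.

# wipe_head TARGET [MIB]
# Overwrite the first MIB mebibytes (default 1) of TARGET with zeros.
wipe_head() {
    # conv=notrunc keeps the rest of a regular file in place;
    # block devices are never truncated anyway.
    dd if=/dev/zero of="$1" bs=1M count="${2:-1}" conv=notrunc 2>/dev/null
}

# head_is_blank TARGET [MIB]
# Exit 0 if the first MIB mebibytes of TARGET contain only zero bytes.
head_is_blank() {
    bytes=$(( ${2:-1} * 1048576 ))
    # Count the bytes that remain after deleting every NUL byte.
    nonzero=$(head -c "$bytes" "$1" | tr -d '\000' | wc -c | tr -d ' ')
    [ "$nonzero" -eq 0 ]
}
```

From a live session, as root, something like `wipe_head /dev/sda && head_is_blank /dev/sda && echo wiped` would apply and check the wipe. This destroys the partition table on the target, so double-check the device name first.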
The MAAS UI provides very little debug information about the actual commissioning failure.
=================
/var/log during the failed commissioning (nuc-17 is our "coreos was installed last" test case):
Nov 29 13:36:52 nuc-20 maas.node: [info] nuc-17: Status transition from FAILED_
Nov 29 13:36:52 nuc-20 maas.power: [info] Changing power state (on) of node: nuc-17 (6nhkyc)
Nov 29 13:36:52 nuc-20 maas.node: [info] nuc-17: Commissioning started
Nov 29 13:38:06 nuc-20 maas.power: [info] Changed power state (on) of node: nuc-17 (6nhkyc)
Nov 29 13:39:12 nuc-20 maas.power: [info] nuc-17: Power state has changed from on to off.
Nov 29 13:50:10 nuc-20 maas.import-images: [info] Started importing boot images.
Nov 29 13:50:10 nuc-20 maas.import-images: [info] Downloading image descriptions from http://
Nov 29 13:50:11 nuc-20 maas.import-images: [info] Updating boot image iSCSI targets.
Nov 29 13:50:11 nuc-20 maas.import-images: [info] Finished importing boot images, the region does not have any new images.
Nov 29 13:57:55 nuc-20 maas.node: [error] nuc-17: Marking node failed: Machine operation 'Commissioning' timed out after 20 minutes.
Nov 29 13:57:55 nuc-20 maas.node: [info] nuc-17: Status transition from COMMISSIONING to FAILED_
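To pull just one machine's lines out of a log like the one above, a small filter along these lines may help. `node_events` is a hypothetical helper name, and the log file is passed in as an argument because the exact file under /var/log is not stated here.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of MAAS.

# node_events NODE LOGFILE
# Print the maas.node and maas.power lines in LOGFILE that mention NODE
# as a whole word (so "nuc-17" does not also match "nuc-170").
node_events() {
    grep -E 'maas\.(node|power): ' "$2" | grep -Fw -- "$1"
}
```

Run as `node_events nuc-17 LOGFILE`, this would drop the maas.import-images lines and keep the power and status lines for nuc-17.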
dpkg output...
------------
ii maas 2.1.1+bzr5544-
ii maas-cli 2.1.1+bzr5544-
un maas-cluster-
ii maas-common 2.1.1+bzr5544-
ii maas-dhcp 2.1.1+bzr5544-
ii maas-dns 2.1.1+bzr5544-
ii maas-proxy 2.1.1+bzr5544-
ii maas-rack-
ii maas-region-api 2.1.1+bzr5544-
ii maas-region-
un maas-region-
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-
ii python3-django-maas 2.1.1+bzr5544-
ii python3-maas-client 2.1.1+bzr5544-
ii python3-
Hi Bob,
Could you please attempt a commissioning and attach:
1. /var/log/maas/rsyslog/<machine-name>/<date>/messages # for the specific commissioning action you started.
2. Enable SSH action during commissioning and SSH into the machine and grab: /var/log/cloud-init{-output}.log
3. Provide the machine's node event log.
Go to the MAAS WebUI, go to the machine's details page, go to "Latest node events", click on "View full history".
Thanks.