UEFI deployment broken in MAAS 1.9

Bug #1510120 reported by Rod Smith on 2015-10-26
This bug affects 6 people
Affects: MAAS
Importance: Critical
Assigned to: Unassigned

Bug Description

UEFI-mode deployment seems to be broken in MAAS 1.9. Enlistment and commissioning work fine, as does most of deployment; however, there seems to be a problem with the GRUB configuration. Instead of the normal boot at the end of deployment, I get a GRUB menu with one option. Selecting that option fails and, after a prompt to press a key, brings up the same menu again. The GRUB on the hard disk seems to work, though: I'm able to boot the system by launching rEFInd from a USB drive and using it to start the disk's GRUB. Thus, the problem seems to be that the GRUB sent via PXE from the MAAS server is unable to hand off to GRUB on the hard disk (or to use the disk-based GRUB files).

Here's the version information:

$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                           Version                               Architecture Description
+++-==============================-=====================================-============-==================================================
ii  maas                           1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS server all-in-one metapackage
ii  maas-cert-server               0.2.6-0~34~ubuntu14.04.1              all          Ubuntu certification support files for MAAS server
ii  maas-cli                       1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS command line API tool
ii  maas-cluster-controller        1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS server cluster controller
ii  maas-common                    1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS server common files
ii  maas-dhcp                      1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS DHCP server
ii  maas-dns                       1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS DNS server
ii  maas-proxy                     1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS Caching Proxy
ii  maas-region-controller         1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS server complete region controller
ii  maas-region-controller-min     1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS Server minimum region controller
ii  python-django-maas             1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS server Django web framework
ii  python-maas-client             1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS python API client
ii  python-maas-provisioningserver 1.9.0~alpha5+bzr4383-0ubuntu1~trusty1 all          MAAS server provisioning libraries

I'm attaching the /var/log/maas directory tree. Note that I've reconfigured my two UEFI-based nodes to boot in BIOS/CSM/legacy mode to work around this issue for the time being.


Rod Smith (rodsmith) wrote :

This works fine with MAAS 1.8 on the same hardware (MAAS server and nodes).

Andres Rodriguez (andreserl) wrote :

Hi Rod,

Can you please attach the install log from the node details web UI?

Thanks

Changed in maas:
importance: Undecided → Critical
milestone: none → 1.9.0
Rod Smith (rodsmith) wrote :

I'm attaching the installation output from the web UI of a failed deployment. (This was taken while the system still showed up as "deploying" in the MAAS web UI, but it had reached its failure point on the screen.) Also, here's a more precise description of what shows up on the screen at the point of failure:

1. Normal firmware displays
2. "Press any key to continue..." prompt.
3. If I press a key, "Failed to boot both default and fallback entries" appears, followed by "Press any key to continue..." again.
4. If I press a key, GRUB 2.02~beta2-9 menu appears, featuring one entry called "Local".
5. If I select the "Local" GRUB entry, "Press any key to continue..." appears again.
6. GOTO 4.

Gavin Panella (allenap) on 2015-10-27
Changed in maas:
status: New → Triaged
Chris Gregan (cgregan) wrote :

I have a system affected by this as well, with the same resulting GRUB menu on the bootstrapped node: https://chinstrap.canonical.com/~cgregan/grub_screenshot.png

tags: added: cdo-qa
Rod Smith (rodsmith) wrote :

I've also now run into this bug under MAAS 1.8 (version information below).

$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                           Version                        Architecture Description
+++-==============================-==============================-============-==================================================
ii  maas                           1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS server all-in-one metapackage
ii  maas-cert-server               0.2.6-0~33~ubuntu14.04.1       all          Ubuntu certification support files for MAAS server
ii  maas-cli                       1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS command line API tool
ii  maas-cluster-controller        1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS server cluster controller
ii  maas-common                    1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS server common files
ii  maas-dhcp                      1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS DHCP server
ii  maas-dns                       1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS DNS server
ii  maas-proxy                     1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS Caching Proxy
ii  maas-region-controller         1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS server complete region controller
ii  maas-region-controller-min     1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS Server minimum region controller
ii  python-django-maas             1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS server Django web framework
ii  python-maas-client             1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS python API client
ii  python-maas-provisioningserver 1.8.2+bzr4041-0ubuntu1~trusty1 all          MAAS server provisioning libraries

tags: added: blocks-hwcert-server
Rod Smith (rodsmith) wrote :

Please disregard my previous comment (#6); the MAAS 1.8 problem was actually bug #1437024, which has similar symptoms but is a different bug from this one.

Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
status: Triaged → In Progress
Blake Rouse (blake-rouse) wrote :

The issue here is that the grubnetx64.efi.signed file in the grub-efi-amd64-signed package in trusty does not contain the LVM module. So if you deploy with the flat storage layout, everything works; if you deploy with LVM, the node fails to boot from the local disk.

I built a custom grub that includes the LVM module and the system was able to boot correctly.

The issue is that grubnetx64.efi.signed is not being built with the lvm module (lvm.mod).
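The diagnosis above reduces to a set difference: the GRUB modules the boot path needs for a given storage layout, minus the modules built into the network GRUB image. The sketch below models that check. The module names `part_gpt`, `ext2`, `lvm`, `chain` and `search` are real GRUB module names, but the `REQUIRED_MODULES` mapping, the `missing_modules` helper and the assumed contents of the image are hypothetical illustrations, not MAAS or GRUB code and not the actual contents of the trusty package.

```python
from __future__ import annotations

# Hypothetical mapping: which GRUB modules the network GRUB image must contain
# to read /boot from the local disk for each MAAS storage layout. The module
# names are real GRUB modules; the mapping itself is an illustrative assumption.
REQUIRED_MODULES: dict[str, frozenset[str]] = {
    "flat": frozenset({"part_gpt", "ext2"}),
    "lvm": frozenset({"part_gpt", "ext2", "lvm"}),
}


def missing_modules(layout: str, embedded: set[str]) -> set[str]:
    """Return the modules the image lacks for booting the given layout."""
    return set(REQUIRED_MODULES[layout] - embedded)


# Assumed contents of an image built without LVM support (illustrative only).
image_without_lvm = {"part_gpt", "ext2", "chain", "search"}

print(missing_modules("flat", image_without_lvm))  # set()
print(missing_modules("lvm", image_without_lvm))   # {'lvm'}
```

With `lvm` added to the image, as in the custom build described above, the LVM case also comes back empty, which matches the observation that the rebuilt GRUB booted correctly.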

Changed in maas:
status: In Progress → Triaged
assignee: Blake Rouse (blake-rouse) → nobody
Changed in grub2-signed (Ubuntu):
status: New → Confirmed
Blake Rouse (blake-rouse) wrote :

Bug 1511437 is related but not the full issue.

Changed in maas:
status: Triaged → In Progress
no longer affects: grub2-signed (Ubuntu)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Pieter (pieter-koorts) wrote :

I'm not sure whether this comment belongs on this bug or in a new bug report, but with the MAAS 1.9 release we are unable to boot via UEFI, although legacy boot still works. The service starts the PXE process and then ends up at the GRUB prompt without continuing.

#############
MAAS clusterd.log
#############
2016-01-06 14:29:06+0000 [TFTP (UDP)] Datagram received from ('10.40.2.33', 1595): <RRQDatagram(filename=bootx64.efi, mode=octet, options={'tsize': '0', 'blksize': '1468'})>
2016-01-06 14:29:06+0000 [-] RemoteOriginReadSession starting on 48264
2016-01-06 14:29:06+0000 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f02255b31b8>
2016-01-06 14:29:06+0000 [RemoteOriginReadSession (UDP)] Got error: <tftp.datagram.ERRORDatagram object at 0x7f022bf9f810>
2016-01-06 14:29:06+0000 [-] (UDP Port 48264 Closed)
2016-01-06 14:29:06+0000 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f02255b31b8>
2016-01-06 14:29:07+0000 [TFTP (UDP)] Datagram received from ('10.40.2.33', 1596): <RRQDatagram(filename=bootx64.efi, mode=octet, options={'blksize': '1468'})>
2016-01-06 14:29:07+0000 [-] RemoteOriginReadSession starting on 44794
2016-01-06 14:29:07+0000 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f0225615f80>
2016-01-06 14:29:07+0000 [RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
2016-01-06 14:29:07+0000 [-] (UDP Port 44794 Closed)
2016-01-06 14:29:07+0000 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f0225615f80>
2016-01-06 14:29:07+0000 [TFTP (UDP)] Datagram received from ('10.40.2.33', 1597): <RRQDatagram(filename=grubx64.efi, mode=octet, options={'blksize': '512'})>
2016-01-06 14:29:07+0000 [-] RemoteOriginReadSession starting on 43350
2016-01-06 14:29:07+0000 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f0225615368>
2016-01-06 14:29:07+0000 [RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
2016-01-06 14:29:07+0000 [-] (UDP Port 43350 Closed)
2016-01-06 14:29:07+0000 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f0225615368>
#############
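The clusterd.log excerpt follows a TFTP read-request pattern that is easier to follow when each request is paired with its outcome. The first `bootx64.efi` request carries a `tsize` option and ends in an error; PXE firmware commonly queries the file size this way and then aborts the transfer, so that error is probably not the fault. A minimal sketch of such a summary follows, assuming sessions run one at a time as they do in this excerpt; `summarize_tftp` and `RRQ_RE` are hypothetical helpers, not part of MAAS or python-tx-tftp.

```python
from __future__ import annotations

import re

# Captures the requested filename from a TFTP read-request (RRQ) log line.
RRQ_RE = re.compile(r"<RRQDatagram\(filename=([^,]+),")


def summarize_tftp(log_text: str) -> list[tuple[str, str]]:
    """Pair each RRQ with the outcome of the session that served it.

    Assumes sessions do not interleave, as in the clusterd.log excerpt.
    """
    results: list[tuple[str, str]] = []
    pending: str | None = None
    for line in log_text.splitlines():
        match = RRQ_RE.search(line)
        if match:
            pending = match.group(1)
        elif pending is not None and "Got error" in line:
            results.append((pending, "error"))
            pending = None
        elif pending is not None and "transfer successful" in line:
            results.append((pending, "ok"))
            pending = None
    return results


# Lines from the clusterd.log excerpt above, timestamps trimmed.
SAMPLE = """\
[TFTP (UDP)] Datagram received from ('10.40.2.33', 1595): <RRQDatagram(filename=bootx64.efi, mode=octet, options={'tsize': '0', 'blksize': '1468'})>
[RemoteOriginReadSession (UDP)] Got error: <tftp.datagram.ERRORDatagram object at 0x7f022bf9f810>
[TFTP (UDP)] Datagram received from ('10.40.2.33', 1596): <RRQDatagram(filename=bootx64.efi, mode=octet, options={'blksize': '1468'})>
[RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
[TFTP (UDP)] Datagram received from ('10.40.2.33', 1597): <RRQDatagram(filename=grubx64.efi, mode=octet, options={'blksize': '512'})>
[RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
"""

print(summarize_tftp(SAMPLE))
# [('bootx64.efi', 'error'), ('bootx64.efi', 'ok'), ('grubx64.efi', 'ok')]
```

In this excerpt, no further request follows the successful `grubx64.efi` transfer.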

I have attached screenshots of the PXE boot process and of the GRUB prompt where it stops. We are using Ubuntu 14.04 as the commissioning image; however, because the hardware is new, the hwe-v kernel is being used. The machine is configured for UEFI booting only, and legacy booting is currently disabled. The server is a SuperMicro 6028R-E1CR12L.

I have exactly the same issue as Pieter; I keep getting just the GRUB rescue prompt.
MAAS version 1.9.1, installed from the MAAS/stable PPA. regiond is running on 14.04 and clusterd on 15.10.

Rod Smith (rodsmith) wrote :

Pieter and Domonkos: I think you may have a new bug and so should file a new bug report.
