Pick up wrong grub.cfg if another filesystem exists

Bug #1582070 reported by Ike Panhc on 2016-05-16
This bug affects 3 people
Affects      Importance   Assigned to
MAAS         Medium       Newell Jensen
  1.9        Medium       Newell Jensen
  2.0        Medium       Newell Jensen

Bug Description

Because the BMC cannot switch the boot order from PXE to SATA, MAAS serves a grub.cfg that searches the local disks for another grub.cfg:

set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    search --set=root --file /boot/grub/grub.cfg
    configfile /boot/grub/grub.cfg
}

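However, if another filesystem is present and it also happens to contain /boot/grub/grub.cfg (for example, an Ubuntu installer CD-ROM or USB stick), GRUB might pick up the wrong grub.cfg, and the deployment will fail. With --set, the search command assigns the first matching device to root, so whichever filesystem GRUB finds first wins.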
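For illustration only: the ambiguity comes from matching on a file name that any bootable Ubuntu medium may also carry. A minimal sketch of a narrower lookup, assuming the generated config could be told the filesystem UUID of the deployed system's /boot filesystem (the UUID below is a hypothetical placeholder, and this is not MAAS's actual fix), would match on filesystem identity instead of on a path:

```
set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    # Hypothetical placeholder: the UUID of the filesystem holding the
    # deployed system's /boot/grub/grub.cfg would have to be filled in
    # by whatever generates this file.
    search --set=root --fs-uuid 01234567-89ab-cdef-0123-456789abcdef
    configfile /boot/grub/grub.cfg
}
```

Because a filesystem UUID identifies one specific filesystem, an installer CD-ROM or USB stick carrying its own /boot/grub/grub.cfg would no longer match. The trade-off is that the config could no longer be a single static template shared by every node.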


Andrew Cloke (andrew-cloke) wrote:

We hit this issue because a USB stick that happened to contain an Ubuntu image had been inserted into a server managed by MAAS. The server then started misbehaving whenever MAAS deployed to it. The issue took a *long* time to root-cause, and we believe this behaviour makes MAAS significantly more fragile.

In a real-world environment, we don't believe this would be a particularly unusual sequence of events.

Rod Smith (rodsmith) wrote:

Be aware that Ike's bug report originates on ARM64/EFI systems. This is likely to be important.

I'm not sure where MAAS stores all of its grub.cfg files (I could find only one, on one of the certification servers), so I haven't checked the details in the code. However, I believe the behavior is different under AMD64/EFI: on that platform, MAAS seems to deliver a GRUB that launches any GRUB it discovers on the hard disk, rather than trying to find a local grub.cfg file and use it. If used on ARM, this approach would alleviate the problem, since the USB drive would be unlikely to carry an ARM/EFI version of GRUB. (As Andrew says, the bug was encountered because an AMD64 USB drive was inserted into the node.)

That said, the AMD64/EFI approach has its own problems, as noted in bug #1578837; however, that issue may have buggy firmware as its root cause. Also, it's a Secure Boot issue, and Secure Boot has yet to become a major factor on ARM64 (AFAIK).

In any event, even if I'm right, I don't know the reasons for the differences between ARM64/EFI and AMD64/EFI in how MAAS hands off from the PXE boot to the local GRUB or GRUB configuration. If there's a technical reason for this difference, then of course switching to the AMD64/EFI approach may be impractical on ARM. If not, though, switching ARM to the AMD64/EFI approach may be worth considering.
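A rough sketch of the AMD64/EFI-style handoff Rod describes, applied to ARM64, might look like the following. It is an illustration under stated assumptions, not MAAS's real template: the path /efi/ubuntu/grubaa64.efi is assumed to be where the deployed Ubuntu system installed its ARM64 EFI GRUB on the EFI system partition.

```
set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    # Assumed path: the ARM64 EFI GRUB binary installed by the deployed system.
    # An AMD64 installer stick carries x86-64 EFI binaries instead, so it
    # should not match this search.
    search --set=root --file /efi/ubuntu/grubaa64.efi
    # Hand off to the local GRUB binary, which then loads its own grub.cfg.
    chainloader /efi/ubuntu/grubaa64.efi
}
```

The search still takes the first device containing the file, so any medium that happens to have a file at the same path could still be picked up; this narrows the problem rather than eliminating it.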

Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Newell Jensen (newell-jensen) wrote:

Andrew/Ike,

Can I get access to the hardware setup that is currently producing this issue for you guys?

Thanks,

Newell

Andrew Cloke (andrew-cloke) wrote:

Yes, that shouldn't be a problem. I'll reply off-bug with the lab-specific access information.

Raghuram Kota (rkota) on 2016-06-06
tags: added: hs-arm64
tags: added: arm64
Newell Jensen (newell-jensen) wrote:

I have been working on getting access to the hardware to reproduce this issue. The hardware was recently flashed with new firmware that is apparently causing problems for Ike. I'm currently waiting on Ike to resolve this so we can move forward.

Changed in maas:
status: New → In Progress
Newell Jensen (newell-jensen) wrote:

I haven't been able to get the information I requested. Setting this to Incomplete for now.

Changed in maas:
status: In Progress → Incomplete
Changed in maas:
importance: Undecided → Medium
summary: - Pick up wrong grub.cfg if another filesystem exist
+ Pick up wrong grub.cfg if another filesystem exists
Newell Jensen (newell-jensen) wrote:

I was finally given access to the hardware and was able to reproduce the issue. Working on the bug now.

Changed in maas:
status: Incomplete → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: none → next
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
milestone: next → none