Network boot from MAAS sometimes fails at "grub>" prompt

Bug #1904588 reported by Rod Smith
This bug report is a duplicate of: Bug #1900668: MAAS PXE Boot stalls with grub 2.02.
This bug affects 1 person
Affects: grub2 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

On some (but not all) boots via MAAS, GRUB hangs at the "grub>" prompt. This happens on roughly 10% to 20% of boots on affected servers (at least ostwald and meitner, two Supermicro servers).

The MAAS rackd.log file shows that the node has requested and received GRUB:

2020-11-17 10:08:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.1.10.82
2020-11-17 10:08:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.1.10.82
2020-11-17 10:08:50 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.1.10.82

On a failed boot, the process then stops; the node does not request grub.cfg, as happens normally. Watching the console, I see several notices that read "error: Couldn't send network packet." (See attached screen shot.)
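The symptom described above (grubx64.efi is fetched, but no grub.cfg request follows) can be searched for in rackd.log with a short script. This is only a minimal sketch: the line format is taken from the log excerpt above, while the function name `find_stalled_clients` and the assumption that a configuration request contains the substring "grub.cfg" are illustrative, not part of MAAS.

```python
import re

# Matches TFTP request lines in the format shown in the rackd.log excerpt.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<file>\S+) requested by (?P<ip>\S+)$"
)


def find_stalled_clients(lines):
    """Map each client IP to the timestamp of a grubx64.efi request
    that was never followed by a grub.cfg request (hypothetical helper)."""
    pending = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a TFTP request line
        ip, name = match["ip"], match["file"]
        if name.endswith("grubx64.efi"):
            # GRUB was delivered; a config request should follow.
            pending[ip] = match["ts"]
        elif "grub.cfg" in name:
            # Assumption: any path containing "grub.cfg" is the config fetch.
            pending.pop(ip, None)
    return pending


if __name__ == "__main__":
    sample = [
        "2020-11-17 10:08:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.1.10.82",
        "2020-11-17 10:08:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.1.10.82",
        "2020-11-17 10:08:50 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.1.10.82",
    ]
    print(find_stalled_clients(sample))
```

A boot that is still in progress when the log ends would also be reported, so results near the end of the file need a manual check.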

At the grub> prompt, net_ls_addr shows the expected IP address, and net_ls_routes shows a routing table; however, net_bootps results in an error message stating "can't find command `net_bootps`" (see second attached screen shot).

Once the system is hung, typing "exit" at the "grub>" prompt causes the server to try the next boot option, which usually works (booting via another network interface, in the case of our servers).

As noted, this problem occurs on a minority of boots. It can affect reboots after deployment, and if it occurs during deployment, it can prevent deployment because the server will hang at the "grub>" prompt.

Rod Smith (rodsmith) wrote :

Rod Smith (rodsmith) wrote :

Julian Andres Klode (juliank) wrote :

Hello!

This seems to be a duplicate of https://launchpad.net/bugs/1900668 - could you check that? I have a list of things in there that I'm looking for, mostly running with debug=all and trying the current SRU in proposed, which has a TFTP fix (probably unrelated).

Thanks!

Julian Andres Klode (juliank) wrote :

The proper command name is net_bootp, not net_bootps.

Julian Andres Klode (juliank) wrote :

Sorry for the many comments, but net_dhcp is a more modern replacement for net_bootp too AFAIUI.

Rod Smith (rodsmith) wrote :

Ah, I thought the fact that "net_bootps" returned a different error meant this was a different bug; but if that's a typo, then perhaps not. Also, typing "normal" (as in comment #5 there) doesn't work. Still, it does look like it's likely a duplicate.

Rod Smith (rodsmith) wrote :

It turns out that "normal" does work, although I could swear it didn't work earlier. Perhaps I mis-typed the command. In any event, I'm marking this bug report as a duplicate now.
