Network boot from MAAS sometimes fails at "grub>" prompt
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
grub2 (Ubuntu) | New | Undecided | Unassigned |
Bug Description
On some (but not all) boots via MAAS, GRUB hangs at the "grub>" prompt. This happens on roughly 10% to 20% of boots on affected servers (at least ostwald and meitner, two Supermicro servers).
The MAAS rackd.log file shows that the node has requested and received GRUB:
2020-11-17 10:08:47 provisioningser
2020-11-17 10:08:47 provisioningser
2020-11-17 10:08:50 provisioningser
On a failed boot, the process then stops; the node never requests grub.cfg, as it does on a normal boot. Watching the console, I see several notices that read "error: Couldn't send network packet." (See attached screenshot.)
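For anyone trying to measure how often this happens, here is a minimal sketch of the check described above: count boot attempts where a node fetched the GRUB binary but never went on to request grub.cfg. It assumes the TFTP requests have already been extracted from rackd.log as (client, filename) pairs in log order. The file names `grubx64.efi` and `grub/grub.cfg`, and the function name `count_stalled_boots`, are assumptions for illustration and are not taken from the log excerpt above; adjust them to whatever your log actually records.

```python
from collections import defaultdict

# Assumed file names; change these to match the requests in your rackd.log.
GRUB_BINARIES = ("grubx64.efi",)
GRUB_CONFIG_PREFIX = "grub/grub.cfg"


def count_stalled_boots(requests):
    """Count boot attempts that fetched GRUB but never requested grub.cfg.

    `requests` is an iterable of (client, filename) pairs in log order.
    Returns {client: (stalled_attempts, total_attempts)}.
    """
    # client -> one flag per boot attempt: True once grub.cfg was requested
    attempts = defaultdict(list)
    for client, filename in requests:
        if filename.endswith(GRUB_BINARIES):
            # Each GRUB binary download is treated as the start of a new attempt.
            attempts[client].append(False)
        elif filename.startswith(GRUB_CONFIG_PREFIX) and attempts.get(client):
            # A grub.cfg request marks the most recent attempt as having progressed.
            attempts[client][-1] = True
    return {c: (flags.count(False), len(flags)) for c, flags in attempts.items()}


if __name__ == "__main__":
    sample = [
        ("ostwald", "grubx64.efi"),
        ("ostwald", "grub/grub.cfg-example"),
        ("ostwald", "grubx64.efi"),  # no grub.cfg request follows: stalled
        ("meitner", "grubx64.efi"),
        ("meitner", "grub/grub.cfg"),
    ]
    print(count_stalled_boots(sample))
```

This treats every GRUB binary request as a new boot attempt, so a node that retries the download within a single boot would be over-counted; it is only meant as a rough estimate.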
At the grub> prompt, net_ls_addr shows the expected IP address, and net_ls_routes shows a routing table; however, net_bootps results in an error message stating "can't find command `net_bootps`" (see second attached screenshot).
Once the system is hung, typing "exit" at the "grub>" prompt causes the server to try the next boot option, which usually works (booting via another network interface, in the case of our servers).
As noted, this problem occurs on a minority of boots. It can affect reboots after deployment, and if it occurs during deployment, it can prevent deployment because the server will hang at the "grub>" prompt.
Hello!
This seems to be a duplicate of https://launchpad.net/bugs/1900668 - could you check that? I have a list of things in there that I'm looking for, mostly running with debug=all and trying the current SRU in proposed, which has a TFTP fix (probably unrelated).
Thanks!