PXE load Focal initrd.img-5.4.0-42-generic always timeout

Bug #1892290 reported by xinliang
This bug affects 6 people
Affects                   Status        Importance  Assigned to  Milestone
grub2 (Ubuntu)            Fix Released  Undecided   Unassigned
initramfs-tools (Ubuntu)  Confirmed     Undecided   Unassigned

Bug Description

PXE booting the Focal initrd always times out, while PXE booting the Bionic initrd works. GRUB reports:
error: timeout reading `initrd.img-5.4.0-42-generic'.

grub.cfg
--------
menuentry "boot_iscsi" {
    linux vmlinuz-5.4.0-42-generic ...
    initrd initrd.img-5.4.0-42-generic
}

The GRUB image, kernel and initrd are all copied from a Focal installation:
$ cp /usr/lib/grub/arm64-efi-signed/grubnetaa64.efi.signed ~/tftproot/grubaa64.efi
$ cp /boot/vmlinuz-5.4.0-42-generic ~/tftproot/
$ cp /boot/initrd.img-5.4.0-42-generic ~/tftproot/

hardware
--------
Real aarch64 server or qemu-system-aarch64 machine

software
--------
$ apt search grub-efi-arm64
Sorting... Done
Full Text Search... Done
grub-efi-arm64/focal-updates,focal-security,now 2.04-1ubuntu26.2 arm64 [installed]
  GRand Unified Bootloader, version 2 (ARM64 UEFI version)

grub-efi-arm64-bin/focal-updates,focal-security,now 2.04-1ubuntu26.2 arm64 [installed,automatic]
  GRand Unified Bootloader, version 2 (ARM64 UEFI modules)

grub-efi-arm64-dbg/focal-updates,focal-security 2.04-1ubuntu26.2 arm64
  GRand Unified Bootloader, version 2 (ARM64 UEFI debug files)

grub-efi-arm64-signed/focal-updates,focal-security,now 1.142.4+2.04-1ubuntu26.2 arm64 [installed]
  GRand Unified Bootloader, version 2 (EFI-ARM64 version, signed)

grub-efi-arm64-signed-template/focal-updates,focal-security 2.04-1ubuntu26.2 arm64
  GRand Unified Bootloader, version 2 (ARM64 UEFI signing template)

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal
$ uname -a
Linux j12-d05-07 5.4.0-42-generic #1 SMP Tue Jul 7 02:48:00 GMT 2020 aarch64 aarch64 aarch64 GNU/Linux

xinliang (xin3liang) wrote :

Note that PXE booting the Bionic initrd works.

Repacking the Focal initrd with gzip compression does not help; it still times out:
$ unmkinitramfs initrd.img-5.4.0-42-generic ubuntu-focal-initrd/
$ cd ubuntu-focal-initrd/; find . | cpio -H newc -o | gzip > ../ubuntu-focal-new.initrd

Julian Andres Klode (juliank) wrote :

Is one significantly larger than the other?

xinliang (xin3liang) wrote :

Yes, the Focal initrd is somewhat larger than the Bionic one.
$ ls -lh /boot/initrd.img-5.4.0-42-generic
-rw-r--r-- 1 root root 81M Jul 22 06:35 /boot/initrd.img-5.4.0-42-generic
$ ls -lh bm-ubuntu-bionic.initrd
-rw-r--r-- 1 root root 52M Aug 10 03:49 bm-ubuntu-bionic.initrd

But I'm not sure the problem is related to size: I have tried a much larger initrd, and that one works too.
$ ls -lh ipa-ubuntu-bionic.initramfs
-rw-rw-r-- 1 stack stack 470M Jul 27 10:00 ipa-ubuntu-bionic.initramfs

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed
Changed in initramfs-tools (Ubuntu):
status: New → Confirmed
xinliang (xin3liang) wrote :

Any progress on this bug? @janitor

xinliang (xin3liang) wrote :

Similar bug: https://bugzilla.redhat.com/show_bug.cgi?id=1869987
I tried booting from the GRUB command line and hit what looks like the same issue:
grub> linux vmlinuz-5.4.0-47-generic
grub> echo $?
0
grub> initrd initrd.img-5.4.0-47-generic
error: timeout reading `initrd.img-5.4.0-47-generic'.
grub> echo $?
28

It looks like this is caused by commit 781b3e5efc3 (tftp: Do not use priority queue).
The upstream commit below fixes it.

commit a6838bbc6726ad624bd2b94991f690b8e9d23c69
Author: Javier Martinez Canillas <email address hidden>
Date: Thu Sep 10 17:17:57 2020 +0200

    tftp: Roll-over block counter to prevent data packets timeouts

    Commit 781b3e5efc3 (tftp: Do not use priority queue) caused a regression
    when fetching files over TFTP whose size is bigger than 65535 * block size.

      grub> linux /images/pxeboot/vmlinuz
      grub> echo $?
      0
      grub> initrd /images/pxeboot/initrd.img
      error: timeout reading '/images/pxeboot/initrd.img'.
      grub> echo $?
      28

    It is caused by the block number counter being a 16-bit field, which leads
    to a maximum file size of ((1 << 16) - 1) * block size. Because GRUB sets
    the block size to 1024 octets (by using the TFTP Blocksize Option from RFC
    2348 [0]), the maximum file size that can be transferred is 67107840 bytes.

    The TFTP PROTOCOL (REVISION 2) RFC 1350 [1] does not mention what a client
    should do when a file size is bigger than the maximum, but most TFTP hosts
    support the block number counter to be rolled over. That is, acking a data
    packet with a block number of 0 is taken as if the 65536th block was acked.

    It was working before because the block counter roll-over was happening due to
    an overflow. But that got fixed by the mentioned commit, which led to the
    regression when attempting to fetch files larger than the maximum size.

    To allow TFTP file transfers of unlimited size again, re-introduce a block
    counter roll-over so the data packets are acked preventing the timeouts.

    [0]: https://tools.ietf.org/html/rfc2348
    [1]: https://tools.ietf.org/html/rfc1350

    Fixes: 781b3e5efc3 (tftp: Do not use priority queue)

    Suggested-by: Peter Jones <email address hidden>
    Signed-off-by: Javier Martinez Canillas <email address hidden>
    Reviewed-by: Daniel Kiper <email address hidden>
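
The size limit described in the commit message can be checked with a little shell arithmetic. The sketch below is illustrative only: the function names (tftp_max_bytes, exceeds_tftp_limit, next_block) are made up for this note and are not part of GRUB, and it assumes the 1024-octet block size quoted in the commit message.

```shell
#!/bin/sh
# Illustrative helpers only; these names are hypothetical, not GRUB code.

# Largest transfer size before the 16-bit block counter runs out,
# using the formula from the commit message: 65535 * block size.
tftp_max_bytes() {
    blksize="${1:-1024}"   # assumption: 1024 octets, as stated in the commit message
    echo $(( 65535 * $blksize ))
}

# Exit status 0 when the given size (in bytes) is above that limit.
exceeds_tftp_limit() {
    size="$1"
    limit="$(tftp_max_bytes "${2:-1024}")"
    [ "$size" -gt "$limit" ]
}

# Next block number with 16-bit wrap-around: 65535 is followed by 0.
next_block() {
    echo $(( ($1 + 1) & 65535 ))
}
```

For example, `exceeds_tftp_limit "$(stat -c %s /boot/initrd.img-5.4.0-42-generic)"` should succeed for the 81M Focal initrd reported above and fail for the 52M Bionic one, which would be consistent with the symptoms in this report.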

xinliang (xin3liang) wrote :

Verified that Focal grub2 with the above commit works:
grub> linux vmlinuz-5.4.0-47-generic
grub> echo $?
0
grub> initrd initrd.img-5.4.0-47-generic
grub> echo $?
0

To rebuild the grub2 .deb packages with the above commit and install them:
$ git clone -b applied/ubuntu/focal-updates https://git.launchpad.net/ubuntu/+source/grub2
$ cd grub2; git cherry-pick a6838bbc6726ad624bd2b94991f690b8e9d23c69  # the commit must first be fetched from https://git.savannah.gnu.org/git/grub.git
$ sudo apt-get build-dep grub2
$ dpkg-buildpackage -rfakeroot -b
$ cd ../; sudo apt install ./*.deb

xinliang (xin3liang) wrote :

Note that this issue also exists in Debian 10 and Ubuntu 18.04, because they both include commit 781b3e5efc3 (tftp: Do not use priority queue).

cleary (bernard-gray) wrote :

hi @xinliang - thanks for raising this bug, and for the info.
I've made my way through to a grub build (I'm going to append some extra notes to your instructions), but I'm unclear about where the:

$ sudo apt install ./*.deb

is supposed to be applied - are you doing this on your pxe/tftp server? Or does this need to be in the image that is being netbooted? (or both?)

If it's going on the pxe/tftp server, what files is it providing/replacing to resolve the bug (ie is it updating a bootx64.efi image)?

As promised - a little extra detail on the build notes:

$ git clone -b applied/ubuntu/focal-updates https://git.launchpad.net/ubuntu/+source/grub2
$ cd grub2
$ git cherry-pick a6838bbc6726ad624bd2b94991f690b8e9d23c69 # this fails until the upstream repository has been fetched
$ git remote add upstream https://git.savannah.gnu.org/git/grub.git
$ git fetch upstream
$ git cherry-pick a6838bbc6726ad624bd2b94991f690b8e9d23c69
$ dpkg-buildpackage -rfakeroot -b

cleary (bernard-gray) wrote :

Oh I forgot to mention, I had some issues getting past the build-tests, so I used this switch:

$ DEB_BUILD_OPTIONS=nocheck dpkg-buildpackage -rfakeroot -b

xinliang (xin3liang) wrote :

Hi @cleary, you only need to update the GRUB EFI image on your TFTP server. I use the grub-mknetdir command to do it, as in https://github.com/openstack/ironic/blob/master/devstack/lib/ironic#L2673-L2674
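
As a rough illustration of that step, the sketch below wraps grub-mknetdir in a small function. The function name refresh_netboot_grub and its dry-run argument are hypothetical, and the exact options used in the linked ironic script may differ; only the --net-directory option of grub-mknetdir is assumed here.

```shell
#!/bin/sh
# Hypothetical helper: regenerate the GRUB netboot files under a TFTP root.
# Usage: refresh_netboot_grub NET_DIR [RUNNER]
# RUNNER is an illustrative dry-run hook: pass "echo" to print the command
# instead of executing it.
refresh_netboot_grub() {
    net_dir="$1"
    runner="${2:-}"
    # With an empty RUNNER this runs grub-mknetdir directly.
    $runner grub-mknetdir --net-directory="$net_dir"
}
```

Calling `refresh_netboot_grub ~/tftproot echo` only prints the command. Dropping the second argument runs it, which needs the rebuilt grub2 packages installed on the TFTP server and write access to the directory.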

cleary (bernard-gray) wrote :

Thanks! and also for the tip on `mknetdir` - I will be checking it out!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.06-2ubuntu3

---------------
grub2 (2.06-2ubuntu3) jammy; urgency=medium

  * Cherry-pick the missing hunk back that changes parameter loading
    in grub-core/loader/i386/linux.c, this should fix booting on
    BIOS systems.
  * Fix the fallback for kernel addresses on amd64 EFI, if the kernel
    could not be allocated at the preferred address, reset errno such
    that if the 2nd allocation succeeds, we do not fail erroneously.

 -- Julian Andres Klode <email address hidden> Mon, 13 Dec 2021 14:27:53 +0100

Changed in grub2 (Ubuntu):
status: Confirmed → Fix Released