grub-efi-riscv64-bin 2.06-2ubuntu7.1 makes D1 unbootable

Bug #2011744 reported by Denis Ovsienko
This bug affects 1 person
Affects: grub2 (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Heinrich Schuchardt

Bug Description

I have been running the Ubuntu 22.04 image on a Nezha D1 board for a few months, and until recently installation of updates did not cause problems. However, after a recent dist-upgrade the board failed to boot. I reinstalled the system from a new image and then installed updates again batch by batch doing a successful reboot after each batch, until only the following packages remained:

grub-common
grub-efi-riscv64-bin
grub-efi-riscv64
grub2-common
u-boot-nezha

As it turned out after the next step, it is the grub2 packages that break the boot (it makes no difference whether you upgrade all 4 or just the 2 packages below). This seems consistent with the fact that both FAT filesystems on the card become damaged after the upgrade, which you can see by moving the SD card to a working Linux host and running fsck. As a result, GRUB never starts (note the "Invalid FAT entry" message in the log) and the previous bootloader falls back to a network boot, which does not work.
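The fsck check mentioned above could look roughly like the sketch below. It only prints the commands, so nothing touches the card until they are reviewed. The helper name fat_check_cmds and the device /dev/sdX are placeholders, and the partition numbers 12 and 15 are taken from the maintainer's comment further down in this report; fsck.fat comes from dosfstools and its -n option checks without writing.

```shell
#!/bin/sh
# Hypothetical helper: print read-only FAT checks for the two FAT partitions
# named later in this report (12 = cloud-init seed, 15 = ESP).
fat_check_cmds() {
    dev="$1"
    # Devices whose name ends in a digit (mmcblk0, nvme0n1) use a "p" separator.
    case "$dev" in
        *[0-9]) sep=p ;;
        *)      sep= ;;
    esac
    for part in 12 15; do
        # -n: report problems without modifying the filesystem
        printf 'fsck.fat -n %s%s%s\n' "$dev" "$sep" "$part"
    done
}

# /dev/sdX is a placeholder for the card reader on the working host.
fat_check_cmds /dev/sdX
```

Once the device name is confirmed (for example with lsblk), the printed commands can be run as root on the working host.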

Steps to reproduce:
1. Flash ubuntu-22.04.1-preinstalled-server-riscv64+nezha.img; GRUB packages are version 2.06-2ubuntu7, everything works.
2. apt-get install grub-efi-riscv64 grub-efi-riscv64-bin
3. reboot

root@ubuntu:~# xargs apt-get -y install
grub-common
grub-efi-riscv64-bin
grub-efi-riscv64
grub2-common
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libflashrom1 libftdi1-2
Use 'apt autoremove' to remove them.
Suggested packages:
  multiboot-doc mtools xorriso desktop-base
The following packages will be upgraded:
  grub-common grub-efi-riscv64 grub-efi-riscv64-bin grub2-common
4 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
Need to get 3905 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub2-common riscv64 2.06-2ubuntu7.1 [674 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub-common riscv64 2.06-2ubuntu7.1 [2095 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub-efi-riscv64 riscv64 2.06-2ubuntu7.1 [57.4 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub-efi-riscv64-bin riscv64 2.06-2ubuntu7.1 [1078 kB]
Fetched 3905 kB in 3s (1498 kB/s)
Preconfiguring packages ...
plymouth-reboot.service
         Stopping User Runtime Directory /run/user/1000...
[ OK ] Unmounted /run/user/1000.
[ OK ] Stopped User Runtime Directory /run/user/1000.
[ OK ] Removed slice User Slice of UID 1000.
         Stopping Permit User Sessions...
[ OK ] Stopped Permit User Sessions.
[ OK ] Stopped target Network.
[ OK ] Stopped target Remote File Systems.
[ OK ] Stopped target Preparation for Remote File Systems.
         Stopping Network Name Resolution...
         Stopping WPA supplicant...
[ OK ] Stopped WPA supplicant.
[ OK ] Stopped target Basic System.
[ OK ] Stopped Forward Password R…s to Plymouth Directory Watch.
[ OK ] Stopped target Path Units.
[ OK ] Stopped target Slice Units.
[ OK ] Removed slice User and Session Slice.
[ OK ] Stopped target Socket Units.
[ OK ] Closed cloud-init hotplug hook socket.
[ OK ] Closed Open-iSCSI iscsid Socket.
[ OK ] Closed Syslog Socket.
[ OK ] Closed UUID daemon activation socket.
[ OK ] Stopped target System Initialization.
[ OK ] Unset automount Arbitrary …s File System Automount Point.
[ OK ] Stopped target Local Encrypted Volumes.
[ OK ] Stopped Forward Password R…uests to Wall Directory Watch.
[ OK ] Stopped target Swaps.
[ OK ] Stopped target Local Verity Protected Volumes.
[ OK ] Stopped Initial cloud-init…ob (metadata service crawler).
[ OK ] Stopped Wait for Network to be Configured.
         Stopping Network Time Synchronization...
         Stopping Record System Boot/Shutdown in UTMP...
[ OK ] Stopped Network Time Synchronization.
[ OK ] Stopped Network Name Resolution.
[ OK ] Stopped Record System Boot/Shutdown in UTMP.
         Stopping Network Configuration...
[ OK ] Stopped Create Volatile Files and Directories.
[ OK ] Stopped Network Configuration.
[ OK ] Stopped target Preparation for Network.
[ OK ] Closed Network Service Netlink Socket.
[ OK ] Stopped Initial cloud-init job (pre-networking).
[ OK ] Stopped Apply Kernel Variables.
[ OK ] Stopped Load Kernel Modules.
[ OK ] Stopped Create final runtime dir for shutdown pivot root.
[ OK ] Stopped target Local File Systems.
         Unmounting /boot/efi...
         Unmounting /run/credentials/systemd-sysusers.service...
[ OK ] Unmounted /boot/efi.
[ OK ] Unmounted /run/credentials/systemd-sysusers.service.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped File System Check on /dev/disk/by-label/UEFI.
[ OK ] Removed slice Slice /system/systemd-fsck.
[ OK ] Stopped target Preparation for Local File Systems.
         Stopping Monitoring of LVM…meventd or progress polling...
         Stopping Device-Mapper Multipath Device Controller...
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Create System Users.
[ OK ] Stopped Device-Mapper Multipath Device Controller.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Stopped Monitoring of LVM2… dmeventd or progress polling.
[ OK ] Reached target System Shutdown.
[ OK ] Reached target Late Shutdown Services.
[ OK ] Finished System Reboot.
[ OK ] Reached target System Reboot.
[ 602.551100] reboot: Restarting system
[27]HELLO! BOOT0 is starting!
[30]BOOT0 commit : 20220228+g0ad88b
[33]set pll start
[35]periph0 has been enabled
[38]set pll end
[39]board init ok
[41]DRAM only have internal ZQ!!
[44]get_pmu_exist() = -1
[46]ddr_efuse_type: 0x0
[49][AUTO DEBUG] two rank and full DQ!
[53]ddr_efuse_type: 0x0
[56][AUTO DEBUG] rank 0 row = 15
[59][AUTO DEBUG] rank 0 bank = 8
[62][AUTO DEBUG] rank 0 page size = 2 KB
[66][AUTO DEBUG] rank 1 row = 15
[69][AUTO DEBUG] rank 1 bank = 8
[72][AUTO DEBUG] rank 1 page size = 2 KB
[75]rank1 config same as rank0
[78]DRAM BOOT DRIVE INFO: V0.24
[81]DRAM CLK = 792 MHz
[83]DRAM Type = 3 (2:DDR2,3:DDR3)
[86]DRAMC ZQ value: 0x7b7bfb
[89]DRAM ODT value: 0x42.
[91]ddr_efuse_type: 0x0
[94]DRAM SIZE =1024 M
[98]DRAM simple test OK.
[100]dram size =1024
[102]card no is 0
[104]sdcard 0 line count 4
[106][mmc]: mmc driver ver 2021-04-2 16:45
[115][mmc]: Wrong media type 0x0
[118][mmc]: ***Try SD card 0***
[127][mmc]: HSSDR52/SDR25 4 bit
[130][mmc]: 50000000 Hz
[132][mmc]: 59392 MB
[134][mmc]: ***SD/MMC 0 init OK!!!***
[181]Loading boot-pkg Succeed(index=1).
[185]Entry_name = opensbi
[188]Entry_name = dtb
[190]Entry_name = u-boot
[194]Adding DRAM info to DTB.
[199]Jump to second Boot.

OpenSBI v1.0
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name : Allwinner D1 Nezha
Platform Features : medeleg
Platform HART Count : 1
Platform IPI Device : aclint-mswi
Platform Timer Device : aclint-mtimer @ 24000000Hz
Platform Console Device : uart8250
Platform HSM Device : ---
Platform Reboot Device : sunxi-wdt-reset
Platform Shutdown Device : ---
Firmware Base : 0x40000000
Firmware Size : 252 KB
Runtime SBI Version : 0.3

Domain0 Name : root
Domain0 Boot HART : 0
Domain0 HARTs : 0*
Domain0 Region00 : 0x0000000014008000-0x000000001400bfff (I)
Domain0 Region01 : 0x0000000014000000-0x0000000014007fff (I)
Domain0 Region02 : 0x0000000040000000-0x000000004003ffff ()
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff (R,W,X)
Domain0 Next Address : 0x000000004a000000
Domain0 Next Arg1 : 0x0000000044000000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes

Boot HART ID : 0
Boot HART Domain : root
Boot HART ISA : rv64imafdcvsux
Boot HART Features : scounteren,mcounteren,mcountinhibit,time
Boot HART PMP Count : 16
Boot HART PMP Granularity : 2048
Boot HART PMP Address Bits: 38
Boot HART MHPM Count : 0
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109

U-Boot 2022.04 (Jun 16 2022 - 10:37:21 +0000)

CPU: rv64imafdc
Model: Allwinner D1 Nezha
DRAM: 1 GiB
sunxi_set_gate: (CLK#24) unhandled
Core: 56 devices, 20 uclasses, devicetree: board
MMC: mmc@4020000: 0, mmc@4021000: 1
Loading Environment from nowhere... OK
In: serial@2500000
Out: serial@2500000
Err: serial@2500000
Net:
Warning: ethernet@4500000 (eth0) using random MAC address - 4a:e4:a4:d3:38:6e
eth0: ethernet@4500000
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
Scanning disk mmc@4020000.blk...
Scanning disk mmc@4021000.blk...
Disk mmc@4021000.blk not ready
Found 7 disks
** Unable to read file ubootefi.var **
Failed to load EFI variables
BootOrder not defined
EFI boot manager: Cannot load any image
Scanning mmc 0:f...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
BootOrder not defined
EFI boot manager: Cannot load any image
Found EFI removable media binary efi/boot/bootriscv64.efi
Invalid FAT entry
** Unable to read file efi/boot/bootriscv64.efi **
Failed to load 'efi/boot/bootriscv64.efi'
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
No UEFI binary known at 0x40080000
EFI LOAD FAILED: continuing...
starting USB...
Bus usb@4101000: USB EHCI 1.00
Bus usb@4101400: USB OHCI 1.0
Bus usb@4200000: USB EHCI 1.00
Bus usb@4200400: USB OHCI 1.0
scanning bus usb@4101000 for devices... 1 USB Device(s) found
scanning bus usb@4101400 for devices... 1 USB Device(s) found
scanning bus usb@4200000 for devices... 1 USB Device(s) found
scanning bus usb@4200400 for devices... 1 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found

Device 0: unknown device
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
BOOTP broadcast 4
DHCP client bound to address 10.80.0.219 (3022 ms)
*** Warning: no boot file name; using '0A5000DB.img'
Using ethernet@4500000 device
TFTP from server 10.80.1.254; our IP address is 10.80.0.219
Filename '0A5000DB.img'.
Load address: 0x4a000000
Loading: T T T T T T T

Heinrich Schuchardt (xypron) wrote :

Thanks for reporting the issue. I will look into it once back home from a business trip.

Best regards

Heinrich

Changed in grub (Ubuntu):
assignee: nobody → Heinrich Schuchardt (xypron)
Heinrich Schuchardt (xypron) wrote :

Hello Denis,

I could not find any ubuntu-22.04.1-preinstalled-server-riscv64+nezha.img image on our download pages anymore. https://cdimage.ubuntu.com/releases/22.04.1/release/ only has 22.04.2 images.
Here the installed GRUB version is 2.06-2ubuntu7.1.

"Invalid FAT entry" is a message written by U-Boot. Writing valid FAT entries is under the control of the kernel, not the GRUB package. The FAT file system is not log-based and can easily be corrupted, e.g. by switching off a device before the kernel has flushed its write cache to disk.

>> both FAT filesystems on the card become damaged after the upgrade

On the preinstalled image the ESP /dev/mmcblk0p15 and the cloud-init seed /dev/mmcblk0p12 use a FAT file system. /dev/mmcblk0p12 is not written to in the update process. If that partition is corrupted, I would assume a defective SD card.

Best regards

Heinrich

Changed in grub (Ubuntu):
status: New → Incomplete
Heinrich Schuchardt (xypron) wrote :

Hello Denis,

https://old-releases.ubuntu.com/releases/22.04.1/ubuntu-22.04.1-preinstalled-server-riscv64+nezha.img.xz has the old image.

I followed your instructions:

sudo apt-get update
sudo apt-get install grub-efi-riscv64 grub-efi-riscv64-bin
# Unpacking grub-efi-riscv64 (2.06-2ubuntu7.1) over (2.06-2ubuntu7)
# Unpacking grub-efi-riscv64-bin (2.06-2ubuntu7.1) over (2.06-2ubuntu7)
sudo reboot

My Allwinner Nezha D1 just boots fine afterwards.

Best regards

Heinrich

Changed in grub (Ubuntu):
status: Incomplete → Invalid
Denis Ovsienko (dovsienko) wrote :

Thank you for the comments. I understand this bug reproduces on one board but not the other. I spent a day recovering from the fault and minimizing the steps to reproduce, using two different working SD cards. In the process the problem reproduced several times as described.

If you wish to debug this [existing] problem, you can have remote access to the board (SSH and/or serial) and/or a copy of the card image just before the fault.

Denis Ovsienko (dovsienko) wrote :

For posterity, the problem still stands as of now (the latest Ubuntu 22.04 and all latest packages). It in fact exists and in fact steadily reproduces. It is no longer convenient for me to reproduce it in person.

1. Remove the previously activated safety guard.

# dpkg --set-selections <<ENDOFTEXT
> grub-common install
> grub-efi-riscv64-bin install
> grub-efi-riscv64 install
> grub2-common install
> u-boot-nezha install
> ENDOFTEXT

2. Let the broken updates install.

# apt-get update && apt-get dist-upgrade
Hit:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease
Get:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates InRelease [119 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports jammy-backports InRelease [109 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports jammy-security InRelease [110 kB]
Get:5 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 Packages [619 kB]
Get:6 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main Translation-en [245 kB]
Get:7 http://ports.ubuntu.com/ubuntu-ports jammy-updates/restricted Translation-en [179 kB]
Fetched 1380 kB in 16s (88.2 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  grub-common grub-efi-riscv64 grub-efi-riscv64-bin grub2-common
  linux-libc-dev u-boot-nezha
6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4567 kB/5871 kB of archives.
After this operation, 55.3 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub2-common riscv64 2.06-2ubuntu7.2 [674 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub-common riscv64 2.06-2ubuntu7.2 [2090 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub-efi-riscv64 riscv64 2.06-2ubuntu7.2 [57.4 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main riscv64 grub-efi-riscv64-bin riscv64 2.06-2ubuntu7.2 [1078 kB]
Get:5 http://ports.ubuntu.com/ubuntu-ports jammy-updates/universe riscv64 u-boot-nezha all 2022.04+git20220405.7446a472-0ubuntu0.3 [667 kB]
Fetched 4567 kB in 2s (2563 kB/s)
Preconfiguring packages ...
(Reading database ... 123067 files and directories currently installed.)
Preparing to unpack .../0-grub2-common_2.06-2ubuntu7.2_riscv64.deb ...
Unpacking grub2-common (2.06-2ubuntu7.2) over (2.06-2ubuntu7.1) ...
Preparing to unpack .../1-grub-common_2.06-2ubuntu7.2_riscv64.deb ...
Unpacking grub-common (2.06-2ubuntu7.2) over (2.06-2ubuntu7.1) ...
Preparing to unpack .../2-grub-efi-riscv64_2.06-2ubuntu7.2_riscv64.deb ...
Unpacking grub-efi-riscv64 (2.06-2ubuntu7.2) over (2.06-2ubuntu7.1) ...
Preparing to unpack .../3-grub-efi-riscv64-bin_2.06-2ubuntu7.2_riscv64.deb ...
Unpacking grub-efi-riscv64-bin (2.06-2ubuntu7.2) over (2.06-2ubuntu7.1) ...
Preparing to unpack .../4-linux-libc-dev_5.15.0-88.98_riscv64.deb ...
Unpacking linux-libc-dev:riscv64 (5.15.0-88.98) over (5.15.0-87.97) ...
Preparing to unpack .../5-u-boot-nezha_2022.04+git20220405.7446a472-0ubuntu0...

Steve Langasek (vorlon)
affects: grub (Ubuntu) → grub2 (Ubuntu)
Julian Andres Klode (juliank) wrote :

In either case, this is not a bug in grub. grub is not involved in the file system here. There could be a kernel bug, for example, where it powers off the SD card or some bus it's on before the file system is written back.
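One way to look for data that has not been written back yet is to read the Dirty and Writeback counters in /proc/meminfo before rebooting. The following is only a sketch: the helper name pending_writeback_kb is made up for illustration, the counters are system-wide rather than per device, and a value of zero does not prove that the card itself has finished writing.

```shell
#!/bin/sh
# Hypothetical helper: sum the Dirty and Writeback counters (in kB).
# Reads /proc/meminfo by default; a file argument is accepted for testing.
pending_writeback_kb() {
    awk '/^(Dirty|Writeback):/ { total += $2 } END { print total + 0 }' \
        "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    sync    # ask the kernel to flush dirty buffers to storage
    echo "pending after sync: $(pending_writeback_kb) kB"
fi
```

If the number stays well above zero after sync, that would point towards the write-back path rather than towards the packages.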

But in my very limited experience with SD cards, some just fail. My new 128 GB SD card lasted 2 days or so on my Raspberry Pi until it entered read-only mode whereas the existing one is still fine after years. Make sure to try different size and vendor SD cards, and possibly different power supplies if the board allows that.

I don't know which kernel is installed as there is no data attached and I don't know the boards, so I can't reassign but either way, if Heinrich can't reproduce it, there's little chance of finding a fix.

Denis Ovsienko (dovsienko) wrote :

Thank you for the comments. The problem is not specific to the SD card or the power supply, it is specific to the botched bootloader update on this board, as demonstrated earlier and just recently. The board has been running fine since May with regular package updates, none of which caused a problem because the bootloader packages were set on hold, as described earlier. I decided to see whether the bug is fixed, removed the hold and allowed APT to upgrade the bootloader packages, which broke the bootloader, as described. The correlation is straightforward: bootloader update => reboot => bootloader fault.
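For reference, the hold can be expressed in the same "package state" format that dpkg --set-selections reads, which is the format used with "install" earlier in this report. A minimal sketch follows; the helper name selections is made up for illustration, and the lines are only printed here so they can be reviewed before being piped into dpkg as root.

```shell
#!/bin/sh
# Hypothetical helper: emit "package state" lines for dpkg --set-selections.
selections() {
    state="$1"; shift
    for pkg in "$@"; do
        printf '%s %s\n' "$pkg" "$state"
    done
}

# The five bootloader packages named in this report.
BOOT_PKGS="grub-common grub-efi-riscv64-bin grub-efi-riscv64 grub2-common u-boot-nezha"

# Printed for review; to apply: selections hold $BOOT_PKGS | dpkg --set-selections
selections hold $BOOT_PKGS
```

Calling it with "install" instead of "hold" produces the input that removes the guard again.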

Specifically, this looks like the most likely cause of failure in the above dpkg log, which included at least two invocations of dd (the bug tracker is not displaying the complete comment for reasons I do not immediately understand). On this particular hardware dd writes one of the bootloaders (in this case U-Boot) to a fixed offset on the SD card, as documented in detail here: https://fedoraproject.org/wiki/Architectures/RISC-V/Allwinner
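Whether such a dd write landed intact can be checked by comparing the bytes at that offset with the file that was written. The sketch below demonstrates the comparison on a scratch file only. The helper name region_matches is made up, and the offset of 2048 bytes is an arbitrary demo value, not the board's real offset, which should be taken from the page linked above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if BLOB is found byte-for-byte at OFFSET in IMAGE.
# usage: region_matches IMAGE BLOB OFFSET_IN_BYTES
region_matches() {
    image="$1"; blob="$2"; offset="$3"
    size=$(( $(wc -c < "$blob") ))
    # tail -c +N starts at byte N (1-based), hence offset + 1
    tail -c "+$((offset + 1))" "$image" | head -c "$size" | cmp -s - "$blob"
}

# Demo on a scratch image: write a blob at byte 2048, then verify it.
tmp=$(mktemp -d)
printf 'BLOB' > "$tmp/blob"
dd if=/dev/zero of="$tmp/img" bs=1024 count=4 2>/dev/null
dd if="$tmp/blob" of="$tmp/img" bs=1 seek=2048 conv=notrunc 2>/dev/null
if region_matches "$tmp/img" "$tmp/blob" 2048; then
    echo "region matches"
else
    echo "region differs"
fi
rm -rf "$tmp"
```

Against a real card the image argument would be the card device or a copy of it, read as root.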

I suspect the bug has to do either with the U-Boot package or with the one that breaks the FAT filesystem, from which U-Boot chain-loads GRUB. In case the bug tracker has quietly discarded a part of my previous comment, below is the serial console log again:

=> reset
resetting ...
[30]HELLO! BOOT0 is starting!
[32]BOOT0 commit : 20220228+g0ad88b
[36]set pll start
[37]periph0 has been enabled
[40]set pll end
[42]board init ok
[44]DRAM only have internal ZQ!!
[47]get_pmu_exist() = -1
[49]ddr_efuse_type: 0x0
[52][AUTO DEBUG] two rank and full DQ!
[55]ddr_efuse_type: 0x0
[58][AUTO DEBUG] rank 0 row = 15
[61][AUTO DEBUG] rank 0 bank = 8
[64][AUTO DEBUG] rank 0 page size = 2 KB
[68][AUTO DEBUG] rank 1 row = 15
[71][AUTO DEBUG] rank 1 bank = 8
[74][AUTO DEBUG] rank 1 page size = 2 KB
[78]rank1 config same as rank0
[81]DRAM BOOT DRIVE INFO: V0.24
[84]DRAM CLK = 792 MHz
[86]DRAM Type = 3 (2:DDR2,3:DDR3)
[89]DRAMC ZQ value: 0x7b7bfb
[92]DRAM ODT value: 0x42.
[94]ddr_efuse_type: 0x0
[97]DRAM SIZE =1024 M
[100]DRAM simple test OK.
[103]dram size =1024
[105]card no is 0
[106]sdcard 0 line count 4
[109][mmc]: mmc driver ver 2021-04-2 16:45
[118][mmc]: Wrong media type 0x0
[121][mmc]: ***Try SD card 0***
[130][mmc]: HSSDR52/SDR25 4 bit
[133][mmc]: 50000000 Hz
[135][mmc]: 118900 MB
[137][mmc]: ***SD/MMC 0 init OK!!!***
[185]Loading boot-pkg Succeed(index=1).
[189]Entry_name = opensbi
[192]Entry_name = dtb
[194]Entry_name = u-boot
[198]Adding DRAM info to DTB.
[203]Jump to second Boot.

OpenSBI v1.3
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|___/_____|
        | |
        |_|

Platform Name : Allwinner D1 Nezha
Platform Features : medeleg
Platform HART Count : 1
Platform IPI Device : aclint-mswi
Platform Timer Device : aclint-mtimer @ 24000000Hz
Platform Console Device : uart8250
Platform HSM Device : sun20i-d1-ppu
Platform PMU Device :
Platform Reboot ...
