Activity log for bug #1922782

Date Who What changed Old value New value Message
2021-04-06 18:22:00 Antony Messerli bug added bug
2021-04-06 18:23:20 Antony Messerli description

Old value:

Environment:
MAAS version (SNAP): 2.9/stable: 2.9.2-9164-g.ac176b5c4 2021-02-17 (11851) 150MB
Grub package_version=2.04-1ubuntu26.9 from ephermeral-v3 maas images
Servers Dell R7525 configured in UEFI mode with both:
Broadcom Gigabit Ethernet BCM5720

Problem description:
On commissioning of a new node, the server retrieves bootx64.efi from the MAAS server, loads grubx64.efi and then hangs at Booting under MAAS direction...

Because it's getting this far, it's loaded Grub and a configuration at this point. I increased the timeout from 0 to 10 in the MAAS code so that I could crack into Grub to debug. Configuration is getting retrieved from MAAS server so I edited the configuration to do a debug=all and loaded the configuration.

Logs show that its attempting to load the kernel and initrd but fails when it was previously able to contact the MAAS server via PXE (sample of kernel load):

kern/disk.c:196: Opening 'http,10.127.88.10:5248'...
disk/efi/efidisk.c:482: opening http
kern/disk.c:281 Opening 'http,10.127.88.10:5248' failed.
kern/disk.c:295 Closing 'http'.
net/http.c:405: opening path /images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel on host 10.127.88.10 TCP port 5248
commands/verifiers.c:88: file: (http,10.127.88.10:5248)/images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel type:3
....
last debug ends on:
loader/efi/linux.c:96: kernel_addr: 0x10000000 handover_offset: 0x190 params: 0x3d6e1000

If I switch to Intel NICs in the server, this issue does not occur. We are wondering if it may be BCM5720 and PCI-e Gen 4 related as we have the BCM5720 NICs in Dell R720s with PCI-e Gen 3 and they can commission properly.

I have seen mention of some newer versions of Grub that may solve some HTTP boot issues, but they have not made their way into MAAS yet.

If there are good ways to build those bootloaders that would align to how MAAS builds them for their images and test them, I can try and test them in my environment to see if they resolve the issue.

New value:

Environment:
MAAS version (SNAP): 2.9/stable: 2.9.2-9164-g.ac176b5c4 2021-02-17 (11851) 150MB
Grub package_version=2.04-1ubuntu26.9 from ephermeral-v3 maas images
Servers Dell R7525 configured in UEFI mode with both:
Broadcom Gigabit Ethernet BCM5720
Broadcom Adv. Dual 10GBASE-T Ethernet

Problem description:
On commissioning of a new node, the server retrieves bootx64.efi from the MAAS server, loads grubx64.efi and then hangs at Booting under MAAS direction...

Because it's getting this far, it's loaded Grub and a configuration at this point. I increased the timeout from 0 to 10 in the MAAS code so that I could crack into Grub to debug. Configuration is getting retrieved from MAAS server so I edited the configuration to do a debug=all and loaded the configuration.

Logs show that its attempting to load the kernel and initrd but fails when it was previously able to contact the MAAS server via PXE (sample of kernel load):

kern/disk.c:196: Opening 'http,10.127.88.10:5248'...
disk/efi/efidisk.c:482: opening http
kern/disk.c:281 Opening 'http,10.127.88.10:5248' failed.
kern/disk.c:295 Closing 'http'.
net/http.c:405: opening path /images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel on host 10.127.88.10 TCP port 5248
commands/verifiers.c:88: file: (http,10.127.88.10:5248)/images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel type:3
....
last debug ends on:
loader/efi/linux.c:96: kernel_addr: 0x10000000 handover_offset: 0x190 params: 0x3d6e1000

If I switch to Intel NICs in the server, this issue does not occur. We are wondering if it may be BCM5720 and PCI-e Gen 4 related as we have the BCM5720 NICs in Dell R720s with PCI-e Gen 3 and they can commission properly.

I have seen mention of some newer versions of Grub that may solve some HTTP boot issues, but they have not made their way into MAAS yet. If there are good ways to build those bootloaders that would align to how MAAS builds them for their images and test them, I can try and test them in my environment to see if they resolve the issue.
2021-04-06 18:38:06 Antony Messerli description

Old value:

Environment:
MAAS version (SNAP): 2.9/stable: 2.9.2-9164-g.ac176b5c4 2021-02-17 (11851) 150MB
Grub package_version=2.04-1ubuntu26.9 from ephermeral-v3 maas images
Servers Dell R7525 configured in UEFI mode with both:
Broadcom Gigabit Ethernet BCM5720
Broadcom Adv. Dual 10GBASE-T Ethernet

Problem description:
On commissioning of a new node, the server retrieves bootx64.efi from the MAAS server, loads grubx64.efi and then hangs at Booting under MAAS direction...

Because it's getting this far, it's loaded Grub and a configuration at this point. I increased the timeout from 0 to 10 in the MAAS code so that I could crack into Grub to debug. Configuration is getting retrieved from MAAS server so I edited the configuration to do a debug=all and loaded the configuration.

Logs show that its attempting to load the kernel and initrd but fails when it was previously able to contact the MAAS server via PXE (sample of kernel load):

kern/disk.c:196: Opening 'http,10.127.88.10:5248'...
disk/efi/efidisk.c:482: opening http
kern/disk.c:281 Opening 'http,10.127.88.10:5248' failed.
kern/disk.c:295 Closing 'http'.
net/http.c:405: opening path /images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel on host 10.127.88.10 TCP port 5248
commands/verifiers.c:88: file: (http,10.127.88.10:5248)/images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel type:3
....
last debug ends on:
loader/efi/linux.c:96: kernel_addr: 0x10000000 handover_offset: 0x190 params: 0x3d6e1000

If I switch to Intel NICs in the server, this issue does not occur. We are wondering if it may be BCM5720 and PCI-e Gen 4 related as we have the BCM5720 NICs in Dell R720s with PCI-e Gen 3 and they can commission properly.

I have seen mention of some newer versions of Grub that may solve some HTTP boot issues, but they have not made their way into MAAS yet.

If there are good ways to build those bootloaders that would align to how MAAS builds them for their images and test them, I can try and test them in my environment to see if they resolve the issue.

New value:

Environment:
MAAS version (SNAP): 2.9/stable: 2.9.2-9164-g.ac176b5c4 2021-02-17 (11851) 150MB
Grub package_version=2.04-1ubuntu26.9 from ephermeral-v3 maas images
Servers Dell R7525 configured in UEFI mode with both:
Broadcom Gigabit Ethernet BCM5720
Broadcom Adv. Dual 10GBASE-T Ethernet BCM57416

Problem description:
On commissioning of a new node, the server retrieves bootx64.efi from the MAAS server, loads grubx64.efi and then hangs at Booting under MAAS direction...

Because it's getting this far, it's loaded Grub and a configuration at this point. I increased the timeout from 0 to 10 in the MAAS code so that I could crack into Grub to debug. Configuration is getting retrieved from MAAS server so I edited the configuration to do a debug=all and loaded the configuration.

Logs show that its attempting to load the kernel and initrd but fails when it was previously able to contact the MAAS server via PXE (sample of kernel load):

kern/disk.c:196: Opening 'http,10.127.88.10:5248'...
disk/efi/efidisk.c:482: opening http
kern/disk.c:281 Opening 'http,10.127.88.10:5248' failed.
kern/disk.c:295 Closing 'http'.
net/http.c:405: opening path /images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel on host 10.127.88.10 TCP port 5248
commands/verifiers.c:88: file: (http,10.127.88.10:5248)/images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel type:3
....
last debug ends on:
loader/efi/linux.c:96: kernel_addr: 0x10000000 handover_offset: 0x190 params: 0x3d6e1000

If I switch to Intel NICs in the server, this issue does not occur. We are wondering if it may be BCM5720 and PCI-e Gen 4 related as we have the BCM5720 NICs in Dell R720s with PCI-e Gen 3 and they can commission properly.

I have seen mention of some newer versions of Grub that may solve some HTTP boot issues, but they have not made their way into MAAS yet. If there are good ways to build those bootloaders that would align to how MAAS builds them for their images and test them, I can try and test them in my environment to see if they resolve the issue.