MAAS fails to enlist HPE DL380 Gen10 in PXE-HTTP mode

Bug #1899581 reported by Michał Ajduk
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Alberto Donato

Bug Description

# Environment

MAAS version (SNAP):
maas 2.8.2-8577-g.a3e674063 8980 2.8/stable canonical✓ -

MAAS was cleanly installed. KVM POD setup works.

MAAS status:
bind9 RUNNING pid 9258, uptime 15:13:02
dhcpd RUNNING pid 26173, uptime 15:09:30
dhcpd6 STOPPED Not started
http RUNNING pid 19526, uptime 15:10:49
ntp RUNNING pid 27147, uptime 14:02:18
proxy RUNNING pid 25909, uptime 15:09:33
rackd RUNNING pid 7219, uptime 15:13:20
regiond RUNNING pid 7221, uptime 15:13:20
syslog RUNNING pid 19634, uptime 15:10:48

# Problem
Enlisting HPE DL380 Gen10 servers (latest firmware, U30 v2.34 (04/08/2020)) fails. The node fails to PXE boot because it cannot obtain the GRUB config via TFTP from MAAS. The node is correctly assigned an IP address, but it then sends its TFTP requests to UDP port 5248, where MAAS does not listen. It uses 10.216.240.1:5248 as the PXE location because that is what the DHCP ACK provides (Boot file name: http://10.216.240.1:5248/bootx64.efi).

The default GRUB config file is available via TFTP, but on UDP port 69:
root@inf1az1cz202904rz:~# curl tftp://10.216.240.1/grub/grub.cfg-default-amd64
* Trying 10.216.240.1...
* Connected to 10.216.240.1 () port 69 (#0)
set default="0"
set timeout=0

menuentry 'Commission' {
    echo 'Booting under MAAS direction...'
    linuxefi (http,10.216.240.1:5248)/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel nomodeset ro root=squash:http://10.216.240.1:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs ip=::::maas-enlist:BOOTIF ip6=off overlayroot=tmpfs overlayroot_cfgdisk=disabled cc:\{'datasource_list': ['MAAS']\}end_cc cloud-config-url=http://10-216-240-0--23.maas-internal:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed apparmor=0 log_host=10.216.240.1 log_port=5247 --- console=tty0 console=ttyS0,115200n8 nvme_core.multipath=0 BOOTIF=01-${net_default_mac}
    initrdefi (http,10.216.240.1:5248)/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-initrd
}

However, the node tries to obtain the GRUB config over TFTP on UDP port 5248, because the DHCP boot file name points to 10.216.240.1:5248.

# Analysis

1) Node obtains DHCP IP address from MAAS (10.216.240.1)
2778 2020-10-13 07:22:39,796888 10.216.240.1 255.255.255.255 DHCP 389 DHCP ACK - Transaction ID 0x328a9583
Your (client) IP address: 10.216.240.51
Next server IP address: 10.216.240.1
Boot file name: http://10.216.240.1:5248/bootx64.efi

2) Node obtains bootloader via HTTP
2825 2020-10-13 07:22:40,836910 10.216.240.51 10.216.240.1 HTTP 146 GET /bootx64.efi HTTP/1.1
3382 2020-10-13 07:22:40,857714 10.216.240.1 10.216.240.51 HTTP 2921 HTTP/1.1 200 OK (text/html)
3404 2020-10-13 07:22:40,865559 10.216.240.51 10.216.240.1 TCP 60 1995 → 5248 [RST] Seq=186 Win=65535 Len=0
3410 2020-10-13 07:22:40,910505 10.216.240.51 10.216.240.1 HTTP 152 GET //grubx64.efi HTTP/1.1
3897 2020-10-13 07:22:40,929345 10.216.240.1 10.216.240.51 HTTP 15018 HTTP/1.1 200 OK (text/html)
3919 2020-10-13 07:22:40,930069 10.216.240.51 10.216.240.1 TCP 60 1996 → 5248 [RST] Seq=99 Win=65535 Len=0

3) Node obtains GRUB config location via HTTP
3947 2020-10-13 07:22:41,401169 10.216.240.51 10.216.240.1 HTTP 154 GET /grub/x86_64-efi/command.lst HTTP/1.1
3951 2020-10-13 07:22:41,406033 10.216.240.1 10.216.240.51 HTTP 59 HTTP/1.1 404 Not Found (text/html) (text/html)
3959 2020-10-13 07:22:41,406285 10.216.240.51 10.216.240.1 HTTP 149 GET /grub/x86_64-efi/fs.lst HTTP/1.1
3963 2020-10-13 07:22:41,413525 10.216.240.1 10.216.240.51 HTTP 59 HTTP/1.1 404 Not Found (text/html) (text/html)
3971 2020-10-13 07:22:41,413750 10.216.240.51 10.216.240.1 HTTP 153 GET /grub/x86_64-efi/crypto.lst HTTP/1.1
3975 2020-10-13 07:22:41,415447 10.216.240.1 10.216.240.51 HTTP 59 HTTP/1.1 404 Not Found (text/html) (text/html)
3983 2020-10-13 07:22:41,415647 10.216.240.51 10.216.240.1 HTTP 155 GET /grub/x86_64-efi/terminal.lst HTTP/1.1
3987 2020-10-13 07:22:41,418531 10.216.240.1 10.216.240.51 HTTP 59 HTTP/1.1 404 Not Found (text/html) (text/html)
3995 2020-10-13 07:22:41,418890 10.216.240.51 10.216.240.1 HTTP 140 GET /grub/grub.cfg HTTP/1.1
3997 2020-10-13 07:22:41,421687 10.216.240.1 10.216.240.51 HTTP 481 HTTP/1.1 200 OK (text/html)
--- /grub/grub.cfg ---
# MAAS GRUB2 pre-loader configuration file

# Load based on MAC address first.
configfile (pxe)/grub/grub.cfg-${net_default_mac}

# Failed to load based on MAC address.
# Load amd64 by default, UEFI only supported by 64-bit
configfile (pxe)/grub/grub.cfg-default-amd64
--- /grub/grub.cfg ---

4) Node tries to obtain the GRUB config via TFTP, based on the location given by MAAS (configfile (pxe)/grub/grub.cfg-${net_default_mac})
3999 2020-10-13 07:22:41,439578 10.216.240.51 10.216.240.1 TFTP 104 Read Request, File: /grub/grub.cfg-d4:f5:ef:02:3d:e8, Transfer type: octet, blksize=1024, tsize=0
 Internet Protocol Version 4, Src: 10.216.240.51, Dst: 10.216.240.1
 User Datagram Protocol, Src Port: 25300, Dst Port: 5248
4000 2020-10-13 07:22:41,439613 10.216.240.1 10.216.240.51 ICMP 132 Destination unreachable (Port unreachable)
4847 2020-10-13 07:23:13,085446 10.216.240.51 10.216.240.1 TFTP 100 Read Request, File: /grub/grub.cfg-default-amd64, Transfer type: octet, blksize=1024, tsize=0
4848 2020-10-13 07:23:13,085496 10.216.240.1 10.216.240.51 ICMP 128 Destination unreachable (Port unreachable)
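
The read requests in step 4 can be reproduced from another machine to check which UDP port actually answers. The following is a minimal sketch, not a MAAS or GRUB tool: `build_rrq` and `probe_tftp` are hypothetical helper names, and the packet layout assumes a plain RFC 1350 read request with the `blksize` and `tsize` options shown in the capture.

```python
import socket
import struct

TFTP_OPCODE_RRQ = 1  # read request opcode, per RFC 1350


def build_rrq(filename: str, blksize: int = 1024, tsize: int = 0) -> bytes:
    """Build a TFTP read request carrying the options seen in the capture."""
    fields = [filename, "octet", "blksize", str(blksize), "tsize", str(tsize)]
    # Every field is a NUL-terminated ASCII string following the 2-byte opcode.
    body = b"".join(field.encode("ascii") + b"\x00" for field in fields)
    return struct.pack("!H", TFTP_OPCODE_RRQ) + body


def probe_tftp(host: str, port: int, filename: str, timeout: float = 2.0) -> bool:
    """Return True if any UDP reply arrives for the read request (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # A connected UDP socket lets the kernel surface an ICMP
            # "port unreachable" as an error on the next socket call.
            sock.connect((host, port))
            sock.send(build_rrq(filename))
            sock.recv(2048)
            return True
        except OSError:
            # Covers timeouts as well as ConnectionRefusedError.
            return False
```

Going by the report, `probe_tftp("10.216.240.1", 69, "/grub/grub.cfg-default-amd64")` should get a reply, while the same call with port 5248 should not. The 62-byte payload that `build_rrq` produces for the MAC-specific file name is consistent with the 104-byte frame in the capture, assuming 42 bytes of Ethernet, IPv4 and UDP headers.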

<DROPS to GRUB prompt>

In GRUB prompt:
grub> net_ls_addr
efinet d4:f5:ef:02:3d:e8 10.216.240.51
grub> configfile (http,10.216.240.1:5248)/grub/grub.cfg-default-amd64
<drops to the prompt again>
grub> configfile (tftp,10.216.240.1)/grub/grub.cfg-default-amd64
<starts bootup process>

Alberto Donato (ack) wrote :

This looks like a duplicate of https://bugs.launchpad.net/maas/+bug/1879012, where grub doesn't bring up networking when loaded over http.

Michał Ajduk (majduk) wrote :

This is not a duplicate. In https://bugs.launchpad.net/maas/+bug/1879012:
grub> net_ls_addr
grub>

In this case:
grub> net_ls_addr
efinet d4:f5:ef:02:3d:e8 10.216.240.51

Grub brings up networking correctly in this case.

Michał Ajduk (majduk) wrote :

The template for grub.cfg (http://10.216.240.1:5248//grub/grub.cfg) is located at:
/var/snap/maas/common/maas/boot-resources/current/grub/grub.cfg

The workaround for this bug is to state the TFTP server explicitly in grub.cfg:
--- /grub/grub.cfg ---
# MAAS GRUB2 pre-loader configuration file

# Load based on MAC address first.
configfile (tftp,10.216.240.1)/grub/grub.cfg-${net_default_mac}

# Failed to load based on MAC address.
# Load amd64 by default, UEFI only supported by 64-bit
configfile (tftp,10.216.240.1)/grub/grub.cfg-default-amd64
--- /grub/grub.cfg ---
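
The manual edit above can also be expressed as a small text transformation. This is only a sketch of the workaround, assuming the template contains nothing but `configfile (pxe)...` lines and comments as shown; `pin_tftp_server` is a hypothetical helper, not part of MAAS.

```python
import re

# Matches the bare "(pxe)" device that follows "configfile" at the start of a line.
PXE_DEVICE = re.compile(r"^(\s*configfile\s+)\(pxe\)", re.MULTILINE)


def pin_tftp_server(grub_cfg: str, server: str) -> str:
    """Replace "(pxe)" with "(tftp,<server>)" on configfile lines (hypothetical helper)."""
    # A function replacement avoids any backslash handling in the server string.
    return PXE_DEVICE.sub(lambda m: f"{m.group(1)}(tftp,{server})", grub_cfg)
```

Calling `pin_tftp_server(text, "10.216.240.1")` on the original template text yields the two `configfile (tftp,10.216.240.1)/grub/...` lines shown in the workaround and leaves the comment lines untouched. Writing the result back to the template path is left out on purpose, since it is not confirmed here whether MAAS regenerates that file.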

Alberto Donato (ack) wrote :

Is net_default_server set in the grub env when it fails to boot?

Michał Ajduk (majduk) wrote :

grub> net_ls_addr
efinet3 d4:f5:ef:02:3d:e8 10.216.240
grub> net_default_server
error: can't find command `net_default_server`

It does not seem to have it

Alberto Donato (ack) wrote :

It's an env variable, running "set" would print them all.

Michał Ajduk (majduk) wrote :

net_default_server - set correctly

Alberto Donato (ack) wrote :

What happens if you manually run "configfile (pxe)/grub/grub.cfg"?

That should be equivalent to using (tftp), and they both should use the net_default_server address when one is not specified.

The (http) case is different, as it would need the port set as well.

Michał Ajduk (majduk) wrote :

I get dropped straight back to the GRUB prompt (after 1-2 seconds).

Changed in maas:
assignee: nobody → Alberto Donato (ack)
Michał Ajduk (majduk) wrote :

I've run 2 more tests, confirming that the bug is limited to HTTP boot.

1. Boot in PXE mode over HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4). This works fine with:
configfile (pxe)/grub/grub.cfg-default-amd64

2. Boot in HTTP mode over HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (HTTP(S) IPv4). This does not work, as described in this bug.

I can see:
2020-10-16 13:12:26 provisioningserver.rackdservices.http: [info] grub/grub.cfg requested by 10.216.240.37

Then the server drops to grub prompt.

For the GRUB environment, see the attached screenshot.

I can see that:
- there is no (pxe) variable
- net_default_server is set to 10.216.240.1:5248
- prefix is set to (http,10.216.240.1:5248)/grub
- pxe_default_server is set to 10.216.240.1:5248
- root is set to http,10.216.240.1:5248

That explains why, in this case, the server attempts to load from (tftp,10.216.240.1:5248): it derives the address from net_default_server.

This cannot work, as TFTP is not listening on port 5248.
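
The derivation described above can be written down as a toy model. This is not GRUB's actual code; it only mirrors the behaviour observed in this report, under the assumptions that a `(pxe)` or `(tftp)` device without an explicit server falls back to `net_default_server`, that a `host:port` value is split on the colon (IPv4 only), and that TFTP defaults to UDP port 69. All function names are hypothetical.

```python
from typing import Tuple

TFTP_DEFAULT_PORT = 69


def split_host_port(server: str, default_port: int) -> Tuple[str, int]:
    """Split "host[:port]" (IPv4 or hostname only) into its two parts."""
    host, sep, port = server.partition(":")
    return host, (int(port) if sep else default_port)


def resolve_tftp_target(device: str, net_default_server: str) -> Tuple[str, int]:
    """Model where a "(pxe)" or "(tftp[,server])" device sends its read request."""
    proto, _, explicit = device.strip("()").partition(",")
    if proto not in ("pxe", "tftp"):
        raise ValueError(f"not a TFTP-backed device: {device}")
    # Without an explicit server, the environment variable decides the target.
    return split_host_port(explicit or net_default_server, TFTP_DEFAULT_PORT)


# HTTP boot leaves the HTTP port in net_default_server, so "(pxe)" inherits it.
print(resolve_tftp_target("(pxe)", "10.216.240.1:5248"))
# An explicit server without a port falls back to the TFTP default.
print(resolve_tftp_target("(tftp,10.216.240.1)", "10.216.240.1:5248"))
```

Under these assumptions the first call targets UDP 5248, matching the "Port unreachable" replies in the capture, and the second targets UDP 69, matching the working manual `configfile (tftp,10.216.240.1)/...` command.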

summary: - MAAS fails to enlist HPE DL380 Gen10
+ MAAS fails to enlist HPE DL380 Gen10 in PXE-HTTP mode
Alberto Donato (ack)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.9.0b6
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b6 → 2.9.0b7
Alberto Donato (ack)
Changed in maas:
milestone: 2.9.0b7 → 2.9.x
Alberto Donato (ack) wrote :

Marking this as fixed for 2.9, since 66d2a5c89 disabled HTTP boot, which has various issues.

Changed in maas:
milestone: 2.9.x → 2.9.0b7
status: Triaged → Fix Committed
Lee Trager (ltrager)
Changed in maas:
status: Fix Committed → Fix Released
no longer affects: maas/2.8