Cannot PXE boot arch 0f due to protocol mismatch

Bug #1838943 reported by Garagoth
This bug affects 1 person
Affects   Status    Importance  Assigned to  Milestone
MAAS      Expired   Medium      Unassigned

Bug Description

Hi.

I am having trouble commissioning HP Synergy Gen10 blades.
Boot mode: UEFI.

The scenario goes like this:
1. The blade sends a DHCP request with arch 15 (00:0f) and gets a response with an IP and the filename "http://10.1.28.240:5248/bootx64.efi".
2. The blade fetches the file (which is grub) and requests "http://10.1.28.240:5248/grub/grub.cfg" from MAAS, which it receives.
3. grub.cfg contains "configfile (pxe)/grub/grub.cfg-default-amd64", and my blade tries to get that file from the proper MAAS IP and the proper port, but over UDP instead of TCP+HTTP.
4. Since MAAS does not listen on UDP on that port, grub on the blade times out and drops to the command line.
5. Now I can manually enter "configfile https://10.1.28.240:5248/grub/grub.cfg-default-amd64" and the boot process continues just fine.

The only relevant lines I found in rackd.log are:
2019-08-02 15:32:47 provisioningserver.rackdservices.http: [info] bootx64.efi requested by 10.1.28.254
2019-08-02 15:32:47 provisioningserver.rackdservices.http: [info] grubx64.efi requested by 10.1.28.254
2019-08-02 15:32:48 provisioningserver.rackdservices.http: [info] grub/x86_64-efi/command.lst requested by 10.1.28.254
2019-08-02 15:32:48 provisioningserver.rackdservices.http: [info] grub/x86_64-efi/fs.lst requested by 10.1.28.254
2019-08-02 15:32:48 provisioningserver.rackdservices.http: [info] grub/x86_64-efi/crypto.lst requested by 10.1.28.254
2019-08-02 15:32:48 provisioningserver.rackdservices.http: [info] grub/x86_64-efi/terminal.lst requested by 10.1.28.254
2019-08-02 15:32:48 provisioningserver.rackdservices.http: [info] grub/grub.cfg requested by 10.1.28.254

... and it stops here; the only further trace of activity is in tcpdump, where the blade sends its requests over UDP.
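
For anyone trying to reproduce this, a small script can show the last file each client fetched over HTTP, which is where the boot sequence stalls. This is only a sketch: it assumes the log lines look exactly like the rackd.log excerpt above, and the names `LINE_RE`, `requests_by_client` and `SAMPLE` are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the rackd.log HTTP lines quoted in this report:
#   <date> <time> provisioningserver.rackdservices.http: [info] <path> requested by <ip>
# Assumption: other MAAS versions may format these lines differently.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.http: \[info\] "
    r"(?P<path>\S+) requested by (?P<client>\S+)$"
)


def requests_by_client(lines):
    """Group requested paths by client IP, in the order they were logged."""
    result = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("client")].append(match.group("path"))
    return dict(result)


# Three lines copied from the excerpt above.
SAMPLE = """\
2019-08-02 15:32:47 provisioningserver.rackdservices.http: [info] bootx64.efi requested by 10.1.28.254
2019-08-02 15:32:47 provisioningserver.rackdservices.http: [info] grubx64.efi requested by 10.1.28.254
2019-08-02 15:32:48 provisioningserver.rackdservices.http: [info] grub/grub.cfg requested by 10.1.28.254
"""

if __name__ == "__main__":
    for client, paths in requests_by_client(SAMPLE.splitlines()).items():
        # The last path is the final file served over HTTP before the stall.
        print(client, "last HTTP request:", paths[-1])
```

If the last entry for a blade is grub/grub.cfg and no grub.cfg-default-amd64 request follows, that matches the behaviour described in this report.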

Regards,
Marcin Pikulski.

dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-======================================-============-=================================================
ii maas 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS server common files
ii maas-dhcp 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS DHCP server
un maas-dns <none> <none> (no description available)
ii maas-proxy 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all Rack Controller for MAAS
ii maas-region-api 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.6.0-7802-g59416a869-0ubuntu1~18.04.1 all MAAS server provisioning libraries (Python 3)

Revision history for this message
Garagoth (garagoth) wrote :

Changing the blade BIOS setting to disable HTTP support for PXE causes the DHCP arch to change from 0f to 07, and the whole process then works fine, just a bit slower. It is possible to boot, commission and deploy a blade this way.
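
As a quick reference for the two arch values mentioned here, the hex octets can be converted to decimal as below. This is a minimal sketch; the labels in `OBSERVED` only restate what was seen on this blade in this report, not an official registry, and both names are made up for illustration.

```python
def arch_code(octets: str) -> int:
    """Convert a colon-separated hex pair such as '00:0f' to an integer."""
    return int(octets.replace(":", ""), 16)


# Behaviour observed in this report only (assumption: other hardware
# or firmware versions may behave differently).
OBSERVED = {
    "00:0f": "HTTP support for PXE enabled: grub is fetched over HTTP, then stalls",
    "00:07": "HTTP support for PXE disabled: boot, commission and deploy work",
}

if __name__ == "__main__":
    for octets, note in OBSERVED.items():
        print(f"{octets} = {arch_code(octets)} decimal: {note}")
```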

Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Seems like a firmware issue for it to ask using UDP. This path uses the EFI subsystem to perform network functions and that seems to be correct. Do other types of machines have the same issue?

Changed in maas:
assignee: Blake Rouse (blake-rouse) → nobody
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Eric Desrochers (slashd)
Changed in maas:
status: Expired → Confirmed
Revision history for this message
Eric Desrochers (slashd) wrote :

A user brought to my attention that they are seeing a similar situation on their side.

They observed that when restarting a server deployed with MAAS (2.4.2), the boot fails the second time.

tcpdump revealed:
TFTP 100 Read Request, File: /grub/x86_64-efi/command.lst, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found

TFTP 95 Read Request, File: /grub/x86_64-efi/fs.lst, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found

TFTP 99 Read Request, File: /grub/x86_64-efi/crypto.lst, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found

TFTP 101 Read Request, File: /grub/x86_64-efi/terminal.lst, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found

TFTP 86 Read Request, File: /grub/grub.cfg, Transfer type: octet, blksize=1024, tsize=0

TFTP 104 Read Request, File: /grub/grub.cfg-XX:XX:XX:XX:XX:XX, Transfer type: octet, blksize=1024, tsize=0

TFTP 95 Read Request, File: /efi/ubuntu/grubx64.efi, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found
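
To see at a glance which TFTP requests failed, the summary lines above can be paired up with a short script. This is a rough sketch that assumes the text format shown in this comment and that an error line always belongs to the read request directly before it; `REQ_RE`, `ERR_RE`, `summarize` and `SAMPLE` are illustrative names.

```python
import re

# Patterns for the two kinds of summary lines pasted above.
REQ_RE = re.compile(r"TFTP \d+ Read Request, File: (?P<file>[^,]+),")
ERR_RE = re.compile(r"TFTP \d+ Error Code, Code: (?P<code>[^,]+),")


def summarize(lines):
    """Return (file, error) pairs; error is None when no error line followed."""
    results = []
    for line in lines:
        line = line.strip()
        req = REQ_RE.match(line)
        if req:
            results.append([req.group("file"), None])
            continue
        err = ERR_RE.match(line)
        # Attach the error to the most recent request that has none yet.
        if err and results and results[-1][1] is None:
            results[-1][1] = err.group("code")
    return [tuple(r) for r in results]


# A subset of the lines from the capture above.
SAMPLE = """\
TFTP 100 Read Request, File: /grub/x86_64-efi/command.lst, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found

TFTP 86 Read Request, File: /grub/grub.cfg, Transfer type: octet, blksize=1024, tsize=0

TFTP 95 Read Request, File: /efi/ubuntu/grubx64.efi, Transfer type: octet, blksize=1024, tsize=0
TFTP 61 Error Code, Code: File not found, Message: File not found
"""

if __name__ == "__main__":
    for name, error in summarize(SAMPLE.splitlines()):
        print(name, "->", error or "no error seen")
```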

Revision history for this message
Eric Desrochers (slashd) wrote :

I have suggested that the user try MAAS 2.6.

Changed in maas:
status: Confirmed → Triaged
importance: Undecided → Medium
Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Is this still reproducible on MAAS 3.2 or later? The way of handling PXE booting has changed significantly since this issue was submitted.

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired