Unable to pxeboot machines (2.6.0-7802 and 2.8.2-8577)

Bug #1836089 reported by Phil Merricks
This bug affects 5 people
Affects   Status    Importance   Assigned to   Milestone
MAAS      Expired   High         Unassigned

Bug Description

I am unable to boot machines in 2.6.0-7802 that I was previously able to pxeboot - verified by pxeboot attempt using 2.4.2-7034. This is a serious regression for me and prevents buildout of my environment, as it depends on Virsh Pod deployment during commission/deploy.

I have added some output in the comments below. Note that I am testing with two LXD-deployed MAAS (all config) containers, one for each version.

Phil Merricks (seffyroff) wrote :

On IRC, Roaksoax suggested this may be due to an issue in Syslinux. I can't easily determine which version of Syslinux my MAAS installs are using, but assuming they use the version installed by the syslinux-common or pxelinux packages, both are on the same version: 6.03. I tried netbooting the same machine on the same network against 2.4.2 and 2.6.0 in turn, and grabbed a screencap from each. Here are the results:

2.4.2: https://imgur.com/dv2ZSLs

2.6.0: https://imgur.com/rrPSMKi

description: updated
Phil Merricks (seffyroff) wrote :

rackd.log from 2.4.2 during pxeboot:

2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.0 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.0 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] ldlinux.c32 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/44454c4c-3300-104a-8058-c4c04f5a4631 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/01-00-1b-21-63-c3-97 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A000AC1 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A000AC requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A000A requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A000 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A00 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A0 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:02 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0A requested by 00:1b:21:63:c3:97
2019-07-11 14:41:03 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/0 requested by 00:1b:21:63:c3:97
2019-07-11 14:41:03 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/default requested by 00:1b:21:63:c3:97
2019-07-11 14:41:03 provisioningserver.rackdservices.tftp: [info] ubuntu/amd64/hwe-18.04/bionic/daily/boot-kernel requested by 00:1b:21:63:c3:97
2019-07-11 14:41:05 provisioningserver.rackdservices.tftp: [info] ubuntu/amd64/hwe-18.04/bionic/daily/boot-initrd requested by 00:1b:21:63:c3:97
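
For reference, the request sequence in the log above matches the way PXELINUX looks for its configuration file: first the client UUID, then the hardware type plus MAC address, then the client IPv4 address in uppercase hex with one digit dropped per attempt, and finally default. A minimal Python sketch of that order follows; the function name is made up for illustration, and the values in the usage note are taken from the log lines above.

```python
def pxelinux_config_candidates(client_uuid, mac, ipv4):
    """Return pxelinux.cfg paths in the order PXELINUX requests them."""
    candidates = []
    if client_uuid:
        # Client UUID, lowercase, as reported by the PXE firmware.
        candidates.append("pxelinux.cfg/" + client_uuid.lower())
    # ARP hardware type 01 (Ethernet), then the MAC with dashes.
    candidates.append("pxelinux.cfg/01-" + mac.lower().replace(":", "-"))
    # IPv4 address as 8 uppercase hex digits, shortened one digit at a time.
    hex_ip = "".join(f"{int(octet):02X}" for octet in ipv4.split("."))
    for length in range(len(hex_ip), 0, -1):
        candidates.append("pxelinux.cfg/" + hex_ip[:length])
    candidates.append("pxelinux.cfg/default")
    return candidates
```

Calling it with "44454c4c-3300-104a-8058-c4c04f5a4631", "00:1b:21:63:c3:97" and "10.0.10.193" yields the same eleven pxelinux.cfg paths that appear in the 2.4.2 log, ending at pxelinux.cfg/default.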

Phil Merricks (seffyroff) wrote :

rackd.log from 2.6.0 during pxeboot:

2019-07-11 14:45:58 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10.0.10.193
2019-07-11 14:45:58 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10.0.10.193
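
To compare the two rackd.log excerpts side by side, the TFTP lines can be reduced to an ordered list of requested files per client. This is only a sketch: the regular expression is written against the line format shown in this report, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2019-07-11 14:45:58 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10.0.10.193
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<path>\S+) requested by (?P<client>\S+)$"
)

def requested_files(log_text):
    """Map each client (MAC or IP, as logged) to the files it requested, in order."""
    files = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            files[match["client"]].append(match["path"])
    return dict(files)
```

Run against the 2.6.0 excerpt, this gives only lpxelinux.0 (twice) for 10.0.10.193, whereas the 2.4.2 excerpt continues through ldlinux.c32, the pxelinux.cfg lookups, and the kernel and initrd. The TFTP log alone cannot tell whether the loader stalled or whether later files were requested over another protocol.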

Blake Rouse (blake-rouse) wrote :

Are you still having this issue? Were you able to diagnose the issue any further?

Did you ensure that the machine can communicate with the rack controller over HTTP? 2.6 switched to using HTTP to serve the boot files.
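
One rough way to test the HTTP path is a plain TCP connection check, run from another host on the same network segment as the machine that fails to boot. The sketch below makes assumptions that are not stated in this report: the helper name is hypothetical, and the port to test (5248 is commonly cited for the rack controller's HTTP service) should be confirmed against your own installation.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False

# Example call (address and port are placeholders, not values from this report):
# can_connect("192.0.2.10", 5248)
```

A True result only shows that the port is reachable from that host. It does not prove that the PXE firmware or lpxelinux on the affected NIC can complete an HTTP transfer.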

Changed in maas:
status: New → Incomplete
Phil Merricks (seffyroff) wrote :

I haven't attempted it on the latest version, but I never found a resolution to this, and have ultimately remained on an older version of MAAS (2.4.x).

I'll give it another go this week and post my results here.

Alex Zero (citadelcore) wrote :

This happens for me as well on a BL460c Gen8 with MAAS 2.5.x and 2.6.x. No issue with 2.4.x.

Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Francesco Santagata (phrancesco) wrote :

Hi,

I'm facing the same problem with MAAS 2.6 and MAAS 2.8.2 on the old bl460G6.

I'm really worried, because we have tons of G6 and loads of G7 and G8 machines.

It looks like the bug affects three generations of HPE blades and their built-in NICs. For us that means close to 100 servers, so this looks like it will compromise MAAS adoption in our company.

Does anyone have a solution?

The dhcp snippet doesn't work for me.

BR

Ian Marsh (drulgaard) wrote :

Hi,

I'm also experiencing this issue. Our lab has a mixture of HP kit: an HP DL370 G6 with an NC365T quad 1G NIC works fine, but two identical HP DL370 G6s with NC375i quad 1G NICs do not, even with one of them upgraded to the 'latest' firmware. The boot usually hangs after the "downloading initrd" message, but sometimes hangs earlier, after the "downloading kernel" one.

Tried a DHCP snippet to switch to pxelinux, but it seems that's a symlink to lpxelinux anyway.
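
A quick way to confirm whether a boot file is a symlink, and what it points to, is shown below. This is a sketch only: the helper name is hypothetical, and the directory holding the boot files depends on how MAAS was installed, so no path is assumed here.

```python
import os

def describe_boot_file(path):
    """Report whether path is a symlink and which file it finally resolves to."""
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        # realpath follows every symlink in the chain.
        "resolves_to": os.path.realpath(path),
    }
```

If the result for pxelinux.0 shows is_symlink as True with resolves_to ending in lpxelinux.0, then a DHCP snippet that switches the boot filename to pxelinux.0 still serves the same loader.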

Worked fine on an earlier MAAS (2.5.x, I think), so this is definitely a regression.

Changed in maas:
status: Expired → Confirmed
Ian Marsh (drulgaard) wrote :

Apologies, meant to include MAAS version: 2.8.2-8577

summary: - 2.6.0-7802 unable to pxeboot machines
+ Unable to pxeboot machines (2.6.0-7802 and 2.8.2-8577)
Changed in maas:
status: Confirmed → Invalid
status: Invalid → Triaged
importance: Undecided → High
Vlad A. (elf128) wrote :

The issue is still happening in 2.9.1 (9153-g.66318f531).

Adam Collard (adam-collard) wrote :

In MAAS 3.0 we added support to disable certain boot architectures on a per-subnet basis.

See https://maas.io/docs/what-is-new-with-maas-3-0#heading--disabling-boot-methods for details.

Can you confirm that if you disable e.g. uefi_amd64_http then you can boot these machines again?

Changed in maas:
status: Triaged → Incomplete
Alberto Donato (ack)
Changed in maas:
status: Incomplete → New
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired