PXE doesn't work for HP DL380 with Broadcom cards

Bug #1410280 reported by Roman Sokolkov
This bug affects 2 people
Affects              Status         Importance  Assigned to       Milestone
Fuel for OpenStack   Fix Committed  High        Matthew Mosesohn
  5.0.x              Won't Fix      High        Matthew Mosesohn
  5.1.x              Won't Fix      High        Matthew Mosesohn
  6.0.x              Won't Fix      High        Matthew Mosesohn
  6.1.x              Fix Committed  High        Matthew Mosesohn

Bug Description

Description:
An HP DL380 is unable to boot via PXE. DHCP works fine,
but the node uses the internal Docker IP (i.e. 172.17.0.2) as the TFTP server, which is of course unreachable.

Environment:
- MOS 5.1.1
- HP ProLiant DL380p Gen8 (System ROM P70 02/10/2014)
- NICs: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe (Broadcom UNDI PXE-2.1 v16.4.1)

Steps to reproduce:
- Enable PXE boot on Broadcom card
- Try to boot by PXE

Expected result:
- Node will boot into bootstrap image

Actual result:
- PXE boot fails on TFTP. Please see screenshot.

Workaround:
As I understand it, these Broadcom cards operate as gPXE.
Remove the "pxe-service" option with gpxe from /etc/cobbler/dnsmasq.template
and restart dnsmasq with "cobbler sync".
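
The template edit in the workaround could be scripted roughly as follows. This is only a sketch: the helper name `strip_gpxe_pxe_service` and the sample `dhcp-range` line are hypothetical, and it assumes the only lines to drop are `pxe-service=` entries that mention the gpxe tag.

```python
def strip_gpxe_pxe_service(template: str) -> str:
    """Return the template without pxe-service lines that reference the gpxe tag."""
    kept = []
    for line in template.splitlines():
        stripped = line.strip()
        # Drop only pxe-service entries tied to the gpxe tag; keep everything else.
        if stripped.startswith("pxe-service=") and "gpxe" in stripped:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Sample input: the first two lines come from this report,
# the dhcp-range line is a made-up placeholder.
sample = (
    "dhcp-match=gpxe,175\n"
    'pxe-service=net:#gpxe,x86PC,"Install",pxelinux,10.30.0.2\n'
    "dhcp-range=10.30.0.3,10.30.0.254\n"
)

if __name__ == "__main__":
    print(strip_gpxe_pxe_service(sample), end="")
```

After writing the result back to /etc/cobbler/dnsmasq.template, "cobbler sync" is still needed, as described in the workaround.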

Details:
In Fuel, dnsmasq is configured with special options for the non-gPXE case:
dhcp-match=gpxe,175
pxe-service=net:#gpxe,x86PC,"Install",pxelinux,10.30.0.2
But in the gPXE case some default behaviour is used.
For some reason the PXE ROM takes Server-ID (Option 54): 172.17.0.2 as the TFTP server, and that is the internal Docker container address.

DHCP offer from dnsmasq (tcpdump): http://paste.openstack.org/show/157037/
Screenshot of PXE booting: http://s009.radikal.ru/i309/1501/5e/3901d8f80c2e.png
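
To check a captured offer for the same symptom, the options field can be decoded and Option 54 compared against the Docker bridge range. The sketch below assumes the options bytes (everything after the DHCP magic cookie) have already been extracted from the capture. The function names are hypothetical, and 172.17.0.0/16 is assumed to be the Docker bridge network because it matches the address seen in this report.

```python
import ipaddress

PAD, END = 0, 255
SERVER_ID = 54  # DHCP Server Identifier option


def parse_dhcp_options(data: bytes) -> dict:
    """Parse a DHCP options field (code, length, value triples) into {code: value}."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == END:
            break
        if code == PAD:  # single-byte padding, no length field
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options[code] = value
        i += 2 + length
    return options


def server_id(options: dict):
    """Return Option 54 as a dotted-quad string, or None if absent."""
    raw = options.get(SERVER_ID)
    return str(ipaddress.IPv4Address(raw)) if raw is not None else None


def in_docker_bridge(addr: str, network: str = "172.17.0.0/16") -> bool:
    """Check whether addr falls inside the assumed Docker bridge network."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(network)


# Hand-built sample: message type = offer (53), Server-ID = 172.17.0.2 (54), end.
sample = bytes([53, 1, 2, 54, 4, 172, 17, 0, 2, 255])

if __name__ == "__main__":
    sid = server_id(parse_dhcp_options(sample))
    print(sid, in_docker_bridge(sid))
```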

description: updated
Bogdan Dobrelya (bogdando) wrote :

Looks like a major issue with the Docker configuration; raising to High.

Bogdan Dobrelya (bogdando) wrote :

@Matthew, could you please take a look at this issue? There is a workaround provided by Roman.

Matthew Mosesohn (raytrac3r) wrote :

Roman, do you know of any internal hardware available to test a fix?

Matthew Mosesohn (raytrac3r) wrote :

The most obvious fix seems to be to replace pxe-service with the following two lines:
pxe-service=net:!gpxe,x86PC,"Install",pxelinux,10.20.0.3 <- replace # with !
pxe-service=net:gpxe,x86PC,"Install",gpxelinux.0,10.20.0.3 <- add gpxelinux.0 image

And it would be necessary to add gpxelinux.0 to tftproot.
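
The intent of the two proposed lines can be modelled as a small decision: a client that sends option 175 gets the gpxe tag (per dhcp-match=gpxe,175) and is offered gpxelinux.0, and every other client is offered pxelinux. The sketch below only illustrates that selection logic. It is not how dnsmasq is implemented, and the function names are hypothetical.

```python
GPXE_OPTION = 175  # option number used by dhcp-match=gpxe,175


def tags_for_request(client_options) -> set:
    """Return the tags a request would carry, given the option codes the client sent."""
    return {"gpxe"} if GPXE_OPTION in client_options else set()


def choose_boot_file(tags) -> str:
    """Pick the boot file named in the proposed pxe-service lines."""
    return "gpxelinux.0" if "gpxe" in tags else "pxelinux"


if __name__ == "__main__":
    # A plain PXE ROM (no option 175) versus a gPXE client.
    print(choose_boot_file(tags_for_request({53, 55, 60})))
    print(choose_boot_file(tags_for_request({53, 55, 60, 175})))
```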

But we could possibly short-circuit this on the dhcrelay side by rewriting option 54 (Server IP field).

I should point out that in Fuel 6.1 we will move to host networking and remove all the Docker NAT issues and this issue should correct itself. For 6.0.x and 5.1.x we will probably need a solution like above.

Andrey Nikitin (heos)
no longer affects: fuel/5.0.x
Roman Sokolkov (rsokolkov) wrote :

Matt, I haven't come across such hardware internally; ask IT.

Matthew Mosesohn (raytrac3r) wrote :

Roman, I've got some hosts for testing from Fuel Devops team. It is Dell hardware, but has the same Broadcom hardware. I've added a rule to match gPXE and non-gPXE, but the card doesn't actually present gPXE capability. I'm using the following dnsmasq template in /etc/cobbler/dnsmasq.template: http://paste.openstack.org/

Can you possibly reproduce this again with that config file? Change 10.20.0.2 to 10.30.0.2 or whatever is the IP of your Fuel Master. The Server-ID (opt 54) cannot be rewritten or else dnsmasq is unable to hear the DHCP ACKs. Also, if you enable "log-dhcp" in dnsmasq config, you can see a little more detail.
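
Adapting the template for another environment, as suggested above, amounts to swapping the Fuel Master IP and turning on "log-dhcp". A rough sketch of that edit follows. The helper name `adapt_template` and the sample template lines are hypothetical, since the actual template link was not preserved.

```python
import re


def adapt_template(template: str, old_ip: str, new_ip: str) -> str:
    """Replace old_ip with new_ip as a whole address and make sure log-dhcp is enabled."""
    # The digit lookarounds keep 10.20.0.2 from matching inside 10.20.0.20 or 110.20.0.2.
    pattern = re.compile(r"(?<!\d)" + re.escape(old_ip) + r"(?!\d)")
    lines = pattern.sub(new_ip, template).splitlines()
    if "log-dhcp" not in (line.strip() for line in lines):
        lines.append("log-dhcp")
    return "\n".join(lines) + "\n"


# Made-up template fragment, for illustration only.
sample = (
    'pxe-service=x86PC,"Install",pxelinux,10.20.0.2\n'
    "dhcp-range=10.20.0.20,10.20.0.254\n"
)

if __name__ == "__main__":
    print(adapt_template(sample, "10.20.0.2", "10.30.0.2"), end="")
```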

Roman Sokolkov (rsokolkov) wrote :

Matt, please resend the correct link to the dnsmasq.template and I will check.

Matthew Mosesohn (raytrac3r) wrote :

We'll sync up on April 9 to attempt a fix. I spoke with Gleb Galkin about this.

Matthew Mosesohn (raytrac3r) wrote :

The lab where we can test a fix is not available until maybe April 29

no longer affects: fuel/7.0.x
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/178221

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/178221
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=407802d2f75adda351b2e828875984e71d5ae2f0
Submitter: Jenkins
Branch: master

commit 407802d2f75adda351b2e828875984e71d5ae2f0
Author: Matthew Mosesohn <email address hidden>
Date: Tue Apr 28 17:45:53 2015 +0300

    Remove pxe-service flag in Cobbler and GPXE exclusion

    These options block Broadcom BCM5720 cards from PXE
    boot and are not necessary for provisioning.

    Change-Id: I4c6520bca8549fe96264f25827bc1e534669b860
    Closes-Bug: #1410280

Changed in fuel:
status: In Progress → Fix Committed
Vitaly Sedelnik (vsedelnik) wrote :

New hardware support - Won't Fix for already released versions.
