Comment 63 for bug 1707999

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1707999] Re: [SRU] iPXE doesn't handle NAK requests when multiple DHCP server's offer

Then we would need a new version for xenial too, right, and would have to
re-verify it? Can we just skip zesty?

On Wed, Nov 29, 2017 at 5:45 PM, Steve Langasek <email address hidden> wrote:

> No, you need to upload with a new version number. You can't reuse a
> version number in launchpad.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1707999
>
> Title:
> [SRU] iPXE doesn't handle NAK requests when multiple DHCP server's
> offer
>
> Status in MAAS:
> Invalid
> Status in MAAS 2.2 series:
> Invalid
> Status in ipxe package in Ubuntu:
> Fix Released
> Status in ipxe source package in Xenial:
> Fix Committed
> Status in ipxe source package in Zesty:
> Fix Committed
> Status in ipxe source package in Artful:
> Fix Released
> Status in ipxe source package in Bionic:
> Fix Released
>
> Bug description:
> [Impact]
> When there are multiple DHCP servers on the network, iPXE doesn't handle
> NAKs from the DHCP servers. This causes iPXE to stall without
> attempting to re-discover, so it never obtains an IP address.
>
> For example, in a MAAS HA environment with a DHCP master/slave
> configuration, the DHCP servers are at a certain point not fully in
> sync, which causes iPXE to receive a NAK in reply to its request.
> This prevents the machine from PXE booting.
>
> [Test case]
> The easiest way:
> 1. Install MAAS with two rack controllers
> 2. Configure HA
> 3. PXE boot KVMs.
>
> [Regression Potential]
> Minimal. This only ensures that iPXE attempts to re-discover the network
> when it receives a NAK.
>
> [Original bug report]
> A VM failed to PXE boot after receiving multiple DHCP offers.
>
> You can see this here on a log from the secondary controller:
> http://paste.ubuntu.com/25221939/
>
> The node is offered both 10.245.208.201 and 10.245.208.120, tries to
> get 10.245.208.120, and is refused.
>
> One strange thing is that it seems like the DHCP servers on both the
> primary controller and the secondary controller are responding. The
> primary controller's log doesn't have the offer for 10.245.208.120 - only
> the offer for 10.245.208.201:
> http://paste.ubuntu.com/25221952/
>
> This is in an HA setup: region APIs are at 10.245.208.30,
> 10.245.208.31 and 10.245.208.32. We're using hacluster to load
> balance, and a VIP in front at 10.245.208.33. There are rack
> controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan
> this VM is trying to boot from, 10.245.208.30 is set as the primary
> controller, and 10.245.208.31 is set as the secondary.
>
> Primary postgres is on 10.245.208.30; it's being replicated to backup
> postgres on 10.245.208.31. It has a VIP at 10.245.208.34.
>
> We don't hit this every time - on this deployment only one machine out
> of about 30 hit it.
>
> We've also seen this on single-node MAAS setups - non-HA. So, it's
> not an HA-specific issue.
>
> I've attached logs from the maas servers.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1707999/+subscriptions
>
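
For anyone re-verifying, here is a rough sketch of the client behaviour the bug description asks for: a NAK in reply to a request sends the client back to discovery instead of leaving it stuck. This is not iPXE code. The names State, run_client, rounds and max_attempts are made up for illustration, the server replies are scripted rather than read from the network, and picking the last offer is only there to mirror the report (the node asked for 10.245.208.120). The second round in the example is hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()        # about to send a discover
    REQUESTING = auto()  # request sent for one chosen offer
    BOUND = auto()       # request was acknowledged


def run_client(rounds, max_attempts=5):
    """Simulate a DHCP client against scripted server replies.

    `rounds` holds one (offers, reply) tuple per discovery attempt:
    `offers` is a list of offered addresses, `reply` is "ACK" or "NAK"
    for the address the client requests.
    Returns (bound_address_or_None, discovery_attempts_made).
    """
    state = State.INIT
    attempts = 0
    address = None
    reply = None
    script = iter(rounds)

    while state is not State.BOUND:
        if state is State.INIT:
            next_round = next(script, None)
            if next_round is None or attempts >= max_attempts:
                return None, attempts  # give up, nothing bound
            attempts += 1
            offers, reply = next_round
            if offers:
                # Assumption: take the last offer, as in the report.
                address = offers[-1]
                state = State.REQUESTING
            # With no offers we stay in INIT and discover again.
        else:  # State.REQUESTING
            if reply == "ACK":
                state = State.BOUND
            else:
                # NAK: drop the refused address and re-discover.
                address = None
                state = State.INIT
    return address, attempts


# First round mirrors the report: two offers, the request is refused.
scripted = [
    (["10.245.208.201", "10.245.208.120"], "NAK"),
    (["10.245.208.201"], "ACK"),
]
print(run_client(scripted))
```

Without the NAK branch returning to INIT, the loop would have nowhere to go after the refusal, which is roughly the hang described under [Impact].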