2017-08-01 19:37:49 |
Jason Hobbs |
bug |
|
|
added bug |
2017-08-01 19:37:49 |
Jason Hobbs |
attachment added |
|
logs-2017-08-01-19.23.59.tar https://bugs.launchpad.net/bugs/1707999/+attachment/4925453/+files/logs-2017-08-01-19.23.59.tar |
|
2017-08-02 13:08:53 |
Chris Gregan |
maas: status |
New |
Incomplete |
|
2017-08-03 02:31:04 |
Jason Hobbs |
maas: status |
Incomplete |
New |
|
2017-08-08 19:15:34 |
Jason Hobbs |
tags |
cdo-qa cdo-qa-blocker foundation-engine |
cdo-qa cdo-qa-blocker foundations-engine |
|
2017-08-23 15:57:53 |
Andres Rodriguez |
maas: milestone |
|
2.3.0 |
|
2017-08-24 18:07:23 |
Andres Rodriguez |
maas: status |
New |
Incomplete |
|
2017-08-24 18:31:51 |
Jason Hobbs |
maas: status |
Incomplete |
New |
|
2017-08-28 17:07:21 |
Andres Rodriguez |
maas: status |
New |
Incomplete |
|
2017-08-28 18:35:21 |
Jason Hobbs |
maas: status |
Incomplete |
New |
|
2017-09-13 13:18:24 |
Andres Rodriguez |
maas: status |
New |
Incomplete |
|
2017-09-20 12:24:20 |
Jason Hobbs |
maas: status |
Incomplete |
New |
|
2017-09-20 12:45:21 |
Jason Hobbs |
attachment added |
|
logs-2017-09-20-12.20.46.tar https://bugs.launchpad.net/maas/+bug/1707999/+attachment/4953498/+files/logs-2017-09-20-12.20.46.tar |
|
2017-09-22 16:04:08 |
Jason Hobbs |
summary |
pod VM fails to PXE boot after receiving multiple DHCP offers |
pod VM fails to PXE boot after receiving multiple DHCP offers from both primary and secondary rack controllers, for different IPs |
|
2017-09-22 19:26:39 |
Andres Rodriguez |
maas: importance |
Undecided |
High |
|
2017-09-22 19:26:40 |
Andres Rodriguez |
maas: status |
New |
Triaged |
|
2017-09-25 19:34:02 |
Andres Rodriguez |
maas: milestone |
2.3.0 |
2.3.0beta2 |
|
2017-09-25 19:44:33 |
Andres Rodriguez |
maas: importance |
High |
Critical |
|
2017-09-28 18:23:32 |
Andres Rodriguez |
nominated for series |
|
maas/2.2 |
|
2017-09-28 18:23:32 |
Andres Rodriguez |
bug task added |
|
maas/2.2 |
|
2017-09-28 18:23:53 |
Andres Rodriguez |
maas/2.2: milestone |
|
2.2.3 |
|
2017-09-28 18:24:33 |
Chris Gregan |
tags |
cdo-qa cdo-qa-blocker foundations-engine |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine |
|
2017-09-28 18:24:43 |
Jason Hobbs |
summary |
pod VM fails to PXE boot after receiving multiple DHCP offers from both primary and secondary rack controllers, for different IPs |
pod VM fails to PXE boot after receiving multiple DHCP offers, for different IPs, from the dhcp server |
|
2017-09-28 18:25:14 |
Jason Hobbs |
description |
A VM failed to PXE boot after receiving multiple DHCP offers.
You can see this here on a log from the secondary controller:
http://paste.ubuntu.com/25221939/
The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused.
One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201:
http://paste.ubuntu.com/25221952/
This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary.
Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34.
We don't hit this everytime - on this deployment only one machine out of about 30 hit this.
I've attached logs from the maas servers. |
A VM failed to PXE boot after receiving multiple DHCP offers.
You can see this here on a log from the secondary controller:
http://paste.ubuntu.com/25221939/
The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused.
One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201:
http://paste.ubuntu.com/25221952/
This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary.
Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34.
We don't hit this everytime - on this deployment only one machine out of about 30 hit this.
We've also seen this on single node MAAS setups - non HA. So, it's not an HA specific issue.
I've attached logs from the maas servers. |
|
2017-09-28 18:33:32 |
Andres Rodriguez |
maas/2.2: importance |
Undecided |
Critical |
|
2017-09-28 18:33:32 |
Andres Rodriguez |
maas/2.2: status |
New |
Triaged |
|
2017-10-05 13:15:30 |
Andres Rodriguez |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal |
|
2017-10-06 20:55:27 |
Andres Rodriguez |
maas: milestone |
2.3.0beta2 |
2.3.0beta3 |
|
2017-10-11 16:21:37 |
Andres Rodriguez |
bug task added |
|
ipxe (Ubuntu) |
|
2017-10-11 16:21:43 |
Andres Rodriguez |
ipxe (Ubuntu): importance |
Undecided |
Critical |
|
2017-10-11 16:21:48 |
Andres Rodriguez |
maas: status |
Triaged |
Incomplete |
|
2017-10-11 16:21:50 |
Andres Rodriguez |
maas/2.2: status |
Triaged |
Incomplete |
|
2017-10-11 21:01:34 |
Andres Rodriguez |
maas: assignee |
|
Blake Rouse (blake-rouse) |
|
2017-10-12 20:27:04 |
Andres Rodriguez |
maas: status |
Incomplete |
Invalid |
|
2017-10-12 20:27:07 |
Andres Rodriguez |
maas/2.2: status |
Incomplete |
Invalid |
|
2017-10-12 20:27:10 |
Andres Rodriguez |
ipxe (Ubuntu): assignee |
|
Andres Rodriguez (andreserl) |
|
2017-10-12 20:41:36 |
Andres Rodriguez |
attachment added |
|
handle-dhcp-nack.patch https://bugs.launchpad.net/maas/+bug/1707999/+attachment/4969236/+files/handle-dhcp-nack.patch |
|
2017-10-13 00:24:04 |
Ubuntu Foundations Team Bug Bot |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch |
|
2017-10-13 17:23:40 |
David Britton |
ipxe (Ubuntu): assignee |
Andres Rodriguez (andreserl) |
ChristianEhrhardt (paelzer) |
|
2017-10-13 17:24:30 |
David Britton |
nominated for series |
|
Ubuntu Bb-series |
|
2017-10-13 17:24:30 |
David Britton |
bug task added |
|
ipxe (Ubuntu Bb-series) |
|
2017-10-13 17:24:30 |
David Britton |
nominated for series |
|
Ubuntu Xenial |
|
2017-10-13 17:24:30 |
David Britton |
bug task added |
|
ipxe (Ubuntu Xenial) |
|
2017-10-17 17:27:17 |
Launchpad Janitor |
ipxe (Ubuntu): status |
New |
Fix Released |
|
2017-10-25 12:58:57 |
Andres Rodriguez |
ipxe (Ubuntu Bionic): status |
New |
Fix Released |
|
2017-11-14 07:43:25 |
Christian Ehrhardt |
nominated for series |
|
Ubuntu Artful |
|
2017-11-14 07:43:25 |
Christian Ehrhardt |
bug task added |
|
ipxe (Ubuntu Artful) |
|
2017-11-14 07:43:25 |
Christian Ehrhardt |
nominated for series |
|
Ubuntu Zesty |
|
2017-11-14 07:43:25 |
Christian Ehrhardt |
bug task added |
|
ipxe (Ubuntu Zesty) |
|
2017-11-14 07:43:33 |
Christian Ehrhardt |
ipxe (Ubuntu Artful): status |
New |
Fix Released |
|
2017-11-14 07:44:54 |
Christian Ehrhardt |
ipxe (Ubuntu Zesty): status |
New |
Triaged |
|
2017-11-14 07:44:56 |
Christian Ehrhardt |
ipxe (Ubuntu Xenial): status |
New |
Triaged |
|
2017-11-20 18:23:43 |
Andres Rodriguez |
summary |
pod VM fails to PXE boot after receiving multiple DHCP offers, for different IPs, from the dhcp server |
[SRU] iPXE doesn't handle NAK requests when multiple DHCP server's offer |
|
2017-11-20 18:26:58 |
Andres Rodriguez |
description |
A VM failed to PXE boot after receiving multiple DHCP offers.
You can see this here on a log from the secondary controller:
http://paste.ubuntu.com/25221939/
The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused.
One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201:
http://paste.ubuntu.com/25221952/
This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary.
Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34.
We don't hit this everytime - on this deployment only one machine out of about 30 hit this.
We've also seen this on single node MAAS setups - non HA. So, it's not an HA specific issue.
I've attached logs from the maas servers. |
[Impact]
When there are multiple DHCP servers on the network, iPXE doesn't handle NAKs from the DHCP servers. iPXE then blocks without attempting to re-discover, and so never obtains an IP address.
For example, in a MAAS HA environment with a DHCP master/slave configuration, the machine fails to PXE boot because, at a certain point, the DHCP servers are not fully in sync, which causes iPXE to receive a NAK.
[Test case]
The easiest way:
1. Install MAAS with two rack controllers
2. Configure HA
3. PXE boot KVMs.
[Regression Potential]
Minimal. This only ensures that iPXE attempts to re-discover the network when it receives a NAK.
[Original bug report]
A VM failed to PXE boot after receiving multiple DHCP offers.
You can see this here on a log from the secondary controller:
http://paste.ubuntu.com/25221939/
The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused.
One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201:
http://paste.ubuntu.com/25221952/
This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary.
Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34.
We don't hit this everytime - on this deployment only one machine out of about 30 hit this.
We've also seen this on single node MAAS setups - non HA. So, it's not an HA specific issue.
I've attached logs from the maas servers. |
|
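The behaviour described under [Impact] and [Regression Potential] in the entry above can be pictured as a small DHCP client state machine. The Python sketch below is illustrative only: it is not iPXE code (iPXE is written in C), and the names `State`, `next_state` and `run` are hypothetical. It assumes the fix amounts to returning to discovery when a NAK arrives while a request is outstanding, as the description states.

```python
from enum import Enum, auto


class State(Enum):
    DISCOVER = auto()   # waiting for an offer
    REQUEST = auto()    # an offer was chosen and requested
    BOUND = auto()      # lease acknowledged


def next_state(state, message):
    """Return the next client state after one DHCP message.

    Hypothetical model: `message` is "OFFER", "ACK" or "NAK".
    """
    if state is State.DISCOVER and message == "OFFER":
        return State.REQUEST    # pick this offer and request it
    if state is State.REQUEST and message == "ACK":
        return State.BOUND      # the server granted the lease
    if state is State.REQUEST and message == "NAK":
        return State.DISCOVER   # request refused: re-discover
    return state                # ignore anything else, e.g. a late second offer


def run(messages):
    """Feed a sequence of messages through the state machine."""
    state = State.DISCOVER
    for message in messages:
        state = next_state(state, message)
    return state
```

With the sequence from the original report (two offers, then a refusal), `run(["OFFER", "OFFER", "NAK"])` ends in `State.DISCOVER`, so a later offer and ACK can still lead to `State.BOUND`. Without the NAK branch, the client would stay in `State.REQUEST` indefinitely.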
2017-11-20 18:48:10 |
Andres Rodriguez |
ipxe (Ubuntu Zesty): importance |
Undecided |
Critical |
|
2017-11-20 18:48:11 |
Andres Rodriguez |
ipxe (Ubuntu Xenial): importance |
Undecided |
Critical |
|
2017-11-20 18:48:20 |
Andres Rodriguez |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2017-11-22 15:56:07 |
Chris Gregan |
bug |
|
|
added subscriber Canonical Field Critical |
2017-11-22 17:10:15 |
Brian Murray |
ipxe (Ubuntu Xenial): status |
Triaged |
Fix Committed |
|
2017-11-22 17:10:19 |
Brian Murray |
bug |
|
|
added subscriber SRU Verification |
2017-11-22 17:10:22 |
Brian Murray |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial |
|
2017-11-22 17:11:07 |
Brian Murray |
ipxe (Ubuntu Zesty): status |
Triaged |
Fix Committed |
|
2017-11-22 17:11:14 |
Brian Murray |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial verification-needed-zesty |
|
2017-11-23 16:39:43 |
Jason Hobbs |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial verification-needed-zesty |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-needed verification-needed-zesty |
|
2017-11-23 16:45:51 |
Jason Hobbs |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-needed verification-needed-zesty |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-failed-zesty verification-needed |
|
2017-11-29 05:48:59 |
Steve Langasek |
bug watch added |
|
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839328 |
|
2017-11-29 17:58:24 |
Launchpad Janitor |
ipxe (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2017-11-29 17:58:27 |
Brian Murray |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2017-11-29 18:01:53 |
Brian Murray |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2017-11-29 18:01:58 |
Brian Murray |
tags |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-failed-zesty verification-needed |
cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-needed verification-needed-zesty |
|
2017-12-06 18:58:11 |
Chris Gregan |
removed subscriber Canonical Field Critical |
|
|
|
2018-09-12 19:17:45 |
Christian Reis |
ipxe (Ubuntu Zesty): status |
Fix Committed |
Invalid |
|
2021-03-12 07:31:12 |
Christian Ehrhardt |
bug |
|
|
added subscriber Ubuntu Server |