Activity log for bug #1707999

Date Who What changed Old value New value Message
2017-08-01 19:37:49 Jason Hobbs bug added bug
2017-08-01 19:37:49 Jason Hobbs attachment added logs-2017-08-01-19.23.59.tar https://bugs.launchpad.net/bugs/1707999/+attachment/4925453/+files/logs-2017-08-01-19.23.59.tar
2017-08-02 13:08:53 Chris Gregan maas: status New Incomplete
2017-08-03 02:31:04 Jason Hobbs maas: status Incomplete New
2017-08-08 19:15:34 Jason Hobbs tags cdo-qa cdo-qa-blocker foundation-engine cdo-qa cdo-qa-blocker foundations-engine
2017-08-23 15:57:53 Andres Rodriguez maas: milestone 2.3.0
2017-08-24 18:07:23 Andres Rodriguez maas: status New Incomplete
2017-08-24 18:31:51 Jason Hobbs maas: status Incomplete New
2017-08-28 17:07:21 Andres Rodriguez maas: status New Incomplete
2017-08-28 18:35:21 Jason Hobbs maas: status Incomplete New
2017-09-13 13:18:24 Andres Rodriguez maas: status New Incomplete
2017-09-20 12:24:20 Jason Hobbs maas: status Incomplete New
2017-09-20 12:45:21 Jason Hobbs attachment added logs-2017-09-20-12.20.46.tar https://bugs.launchpad.net/maas/+bug/1707999/+attachment/4953498/+files/logs-2017-09-20-12.20.46.tar
2017-09-22 16:04:08 Jason Hobbs summary pod VM fails to PXE boot after receiving multiple DHCP offers pod VM fails to PXE boot after receiving multiple DHCP offers from both primary and secondary rack controllers, for different IPs
2017-09-22 19:26:39 Andres Rodriguez maas: importance Undecided High
2017-09-22 19:26:40 Andres Rodriguez maas: status New Triaged
2017-09-25 19:34:02 Andres Rodriguez maas: milestone 2.3.0 2.3.0beta2
2017-09-25 19:44:33 Andres Rodriguez maas: importance High Critical
2017-09-28 18:23:32 Andres Rodriguez nominated for series maas/2.2
2017-09-28 18:23:32 Andres Rodriguez bug task added maas/2.2
2017-09-28 18:23:53 Andres Rodriguez maas/2.2: milestone 2.2.3
2017-09-28 18:24:33 Chris Gregan tags cdo-qa cdo-qa-blocker foundations-engine cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine
2017-09-28 18:24:43 Jason Hobbs summary pod VM fails to PXE boot after receiving multiple DHCP offers from both primary and secondary rack controllers, for different IPs pod VM fails to PXE boot after receiving multiple DHCP offers, for different IPs, from the dhcp server
2017-09-28 18:25:14 Jason Hobbs description
Old value: A VM failed to PXE boot after receiving multiple DHCP offers. You can see this here on a log from the secondary controller: http://paste.ubuntu.com/25221939/ The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused. One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201: http://paste.ubuntu.com/25221952/ This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary. Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34. We don't hit this everytime - on this deployment only one machine out of about 30 hit this. I've attached logs from the maas servers.
New value: A VM failed to PXE boot after receiving multiple DHCP offers. You can see this here on a log from the secondary controller: http://paste.ubuntu.com/25221939/ The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused. One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201: http://paste.ubuntu.com/25221952/ This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary. Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34. We don't hit this everytime - on this deployment only one machine out of about 30 hit this. We've also seen this on single node MAAS setups - non HA. So, it's not an HA specific issue. I've attached logs from the maas servers.
2017-09-28 18:33:32 Andres Rodriguez maas/2.2: importance Undecided Critical
2017-09-28 18:33:32 Andres Rodriguez maas/2.2: status New Triaged
2017-10-05 13:15:30 Andres Rodriguez tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal
2017-10-06 20:55:27 Andres Rodriguez maas: milestone 2.3.0beta2 2.3.0beta3
2017-10-11 16:21:37 Andres Rodriguez bug task added ipxe (Ubuntu)
2017-10-11 16:21:43 Andres Rodriguez ipxe (Ubuntu): importance Undecided Critical
2017-10-11 16:21:48 Andres Rodriguez maas: status Triaged Incomplete
2017-10-11 16:21:50 Andres Rodriguez maas/2.2: status Triaged Incomplete
2017-10-11 21:01:34 Andres Rodriguez maas: assignee Blake Rouse (blake-rouse)
2017-10-12 20:27:04 Andres Rodriguez maas: status Incomplete Invalid
2017-10-12 20:27:07 Andres Rodriguez maas/2.2: status Incomplete Invalid
2017-10-12 20:27:10 Andres Rodriguez ipxe (Ubuntu): assignee Andres Rodriguez (andreserl)
2017-10-12 20:41:36 Andres Rodriguez attachment added handle-dhcp-nack.patch https://bugs.launchpad.net/maas/+bug/1707999/+attachment/4969236/+files/handle-dhcp-nack.patch
2017-10-13 00:24:04 Ubuntu Foundations Team Bug Bot tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch
2017-10-13 17:23:40 David Britton ipxe (Ubuntu): assignee Andres Rodriguez (andreserl) ChristianEhrhardt (paelzer)
2017-10-13 17:24:30 David Britton nominated for series Ubuntu Bb-series
2017-10-13 17:24:30 David Britton bug task added ipxe (Ubuntu Bb-series)
2017-10-13 17:24:30 David Britton nominated for series Ubuntu Xenial
2017-10-13 17:24:30 David Britton bug task added ipxe (Ubuntu Xenial)
2017-10-17 17:27:17 Launchpad Janitor ipxe (Ubuntu): status New Fix Released
2017-10-25 12:58:57 Andres Rodriguez ipxe (Ubuntu Bionic): status New Fix Released
2017-11-14 07:43:25 Christian Ehrhardt nominated for series Ubuntu Artful
2017-11-14 07:43:25 Christian Ehrhardt bug task added ipxe (Ubuntu Artful)
2017-11-14 07:43:25 Christian Ehrhardt nominated for series Ubuntu Zesty
2017-11-14 07:43:25 Christian Ehrhardt bug task added ipxe (Ubuntu Zesty)
2017-11-14 07:43:33 Christian Ehrhardt ipxe (Ubuntu Artful): status New Fix Released
2017-11-14 07:44:54 Christian Ehrhardt ipxe (Ubuntu Zesty): status New Triaged
2017-11-14 07:44:56 Christian Ehrhardt ipxe (Ubuntu Xenial): status New Triaged
2017-11-20 18:23:43 Andres Rodriguez summary pod VM fails to PXE boot after receiving multiple DHCP offers, for different IPs, from the dhcp server [SRU] iPXE doesn't handle NAK requests when multiple DHCP server's offer
2017-11-20 18:26:58 Andres Rodriguez description
Old value: A VM failed to PXE boot after receiving multiple DHCP offers. You can see this here on a log from the secondary controller: http://paste.ubuntu.com/25221939/ The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused. One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201: http://paste.ubuntu.com/25221952/ This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary. Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34. We don't hit this everytime - on this deployment only one machine out of about 30 hit this. We've also seen this on single node MAAS setups - non HA. So, it's not an HA specific issue. I've attached logs from the maas servers.
New value:
[Impact] When there are multiple DHCP servers on the network, iPXE doesn't handle NAK's for the DHCP servers. This causes iPXE to get blocked without attempting to re-discover, hence, never obtaining an IP address. For example, in a MAAS HA environment with a DHCP master/slave configuration, the machine fails to PXE boot because at a certain point, the DHCP server is not fully in sync, which causes iPXE to get a NAK request. This prevents the machine from PXE booting.
[Test case] The easiest way: 1. Install MAAS with two rack controllers 2. Configure HA 3. PXE boot KVM's.
[Regression Potential] Minimal. This only ensures that iPXE attempts to re-discover the network when it receives a NACK.
[Original bug report] A VM failed to PXE boot after receiving multiple DHCP offers. You can see this here on a log from the secondary controller: http://paste.ubuntu.com/25221939/ The node is offered both 10.245.208.201 and 10.245.208.120, tries to get 10.245.208.120, and is refused. One strange thing is that it seems like the DHCP server on both the primary controller and the secondary controller are responding. The primary controller's log doesn't have the offer for 10.245.208.120 - only the offer for 10.245.208.201: http://paste.ubuntu.com/25221952/ This is in an HA setup: region API's are at 10.245.208.30, 10.245.208.31 and 10.245.208.32. We're using hacluster to load balance, and a VIP in front at 10.245.208.33. There are rack controllers on 10.245.208.30 and 10.245.208.31. For the untagged vlan this VM is trying to boot from, 10.245.208.30 is set as the primary controller, and 10.245.208.31 is set as the secondary. Primary postgres is on 10.245.208.30, it's being replicated to backup postgres on 10.245.208.31. It has a VIP at 10.245.208.34. We don't hit this everytime - on this deployment only one machine out of about 30 hit this. We've also seen this on single node MAAS setups - non HA. So, it's not an HA specific issue. I've attached logs from the maas servers.
2017-11-20 18:48:10 Andres Rodriguez ipxe (Ubuntu Zesty): importance Undecided Critical
2017-11-20 18:48:11 Andres Rodriguez ipxe (Ubuntu Xenial): importance Undecided Critical
2017-11-20 18:48:20 Andres Rodriguez bug added subscriber Ubuntu Stable Release Updates Team
2017-11-22 15:56:07 Chris Gregan bug added subscriber Canonical Field Critical
2017-11-22 17:10:15 Brian Murray ipxe (Ubuntu Xenial): status Triaged Fix Committed
2017-11-22 17:10:19 Brian Murray bug added subscriber SRU Verification
2017-11-22 17:10:22 Brian Murray tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial
2017-11-22 17:11:07 Brian Murray ipxe (Ubuntu Zesty): status Triaged Fix Committed
2017-11-22 17:11:14 Brian Murray tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial verification-needed-zesty
2017-11-23 16:39:43 Jason Hobbs tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-needed verification-needed-xenial verification-needed-zesty cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-needed verification-needed-zesty
2017-11-23 16:45:51 Jason Hobbs tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-needed verification-needed-zesty cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-failed-zesty verification-needed
2017-11-29 05:48:59 Steve Langasek bug watch added https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839328
2017-11-29 17:58:24 Launchpad Janitor ipxe (Ubuntu Xenial): status Fix Committed Fix Released
2017-11-29 17:58:27 Brian Murray removed subscriber Ubuntu Stable Release Updates Team
2017-11-29 18:01:53 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2017-11-29 18:01:58 Brian Murray tags cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-failed-zesty verification-needed cdo-qa cdo-qa-blocker cdo-release-blocker foundations-engine internal patch verification-done-xenial verification-needed verification-needed-zesty
2017-12-06 18:58:11 Chris Gregan removed subscriber Canonical Field Critical
2018-09-12 19:17:45 Christian Reis ipxe (Ubuntu Zesty): status Fix Committed Invalid
2021-03-12 07:31:12 Christian Ehrhardt bug added subscriber Ubuntu Server
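
Illustration of the behaviour described in the 2017-11-20 description change ([Impact] and [Regression Potential]): a minimal Python sketch, not the iPXE code and not the attached handle-dhcp-nack.patch. It models a client that is offered 10.245.208.201 and 10.245.208.120, requests 10.245.208.120, and is refused. The function name boot, its parameters, the "take the last offer" policy, and the assumption that a second discovery round yields only 10.245.208.201 are all hypothetical and chosen only to make the stall visible.

```python
def boot(offers_per_round, acceptable, rediscover_on_nak, max_rounds=3):
    """Return the leased IP, or None if the client never obtains one.

    offers_per_round: list of offer lists, one per discovery round (assumed data).
    acceptable: set of IPs the DHCP server will acknowledge (assumed data).
    rediscover_on_nak: True models the fixed behaviour, False the reported bug.
    """
    for offers in offers_per_round[:max_rounds]:
        chosen = offers[-1]       # hypothetical policy: request the last offer seen
        if chosen in acceptable:  # server would answer with an ACK
            return chosen
        # Otherwise the server answers with a NAK for the requested address.
        if not rediscover_on_nak:
            return None           # client stays blocked and never re-discovers
        # With re-discovery enabled, fall through to the next round of offers.
    return None


rounds = [["10.245.208.201", "10.245.208.120"], ["10.245.208.201"]]
ok = {"10.245.208.201"}
print(boot(rounds, ok, rediscover_on_nak=False))
print(boot(rounds, ok, rediscover_on_nak=True))
```

Under these assumptions the first call models the failure reported in this bug (no address is ever obtained), and the second models the behaviour the fix aims for: the NAK sends the client back to discovery, where it can pick up a usable offer.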