initrd dhcp fails / ignores valid response
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
klibc (Ubuntu) | Fix Released | High | Jay Vosburgh | |
Trusty | Fix Released | High | Mathieu Trudel-Lapierre | |
Xenial | Fix Released | High | Unassigned | |
Yakkety | Fix Released | Undecided | Unassigned | |
Zesty | Fix Released | Undecided | Unassigned | |
Bug Description
[SRU justification]
Changes to the order in which the kernel enumerates network interfaces, which may happen in any release, can regress network configuration from an initramfs. Support for netbooting should not depend on interface order; it should work reliably on all systems.
[Test case]
Detailed reproducer described in
<https:/
[Regression potential]
Moderate regression potential, because of a relatively large patch touching a not-widely-used but still critical piece of code. Regression testing should include verifying that MAAS-booted cloud images still work as expected in a variety of environments.
Between kernel versions 4.4.0-53 and 4.4.0-57, a bug was (re?)introduced that breaks DHCP booting in the initrd environment. This prevents instances that use iSCSI storage from connecting.
Over serial console it outputs:
IP-Config: no response after 2 secs - giving up
IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP
IP-Config: no response after 3 secs - giving up
with increasing delays until it fails. At that point, a simple ipconfig -t dhcp -d "ens2f0" works. The console output is slightly garbled by interleaved kernel messages, but it should give you an idea:
(initramfs) ipconfig -t dhcp -[ 728.379793] ixgbe 0000:13:00.0 ens2f0: changing MTU from 1500 to 9000
d "ens2f0"
IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
IP-Config: ens2f0 guessed broadcast address 10.0.1.255
IP-Config: ens2f0 complete (dhcp from 169.254.169.254):
addres[ 728.980448] ixgbe 0000:13:00.0 ens2f0: detected SFP+: 3
s: 10.0.1.56 broadcast: 10.0.1.255 netmask: 255.255.255.0
gateway: 10.0.1.1 [ 729.148410] ixgbe 0000:13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
dns0 : 169.254.169.254 dns1 : 0.0.0.0
rootserver: 169.254.169.254 rootpath:
filename : /ipxe.efi
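When comparing serial logs from a working and a failing boot, it can help to extract the retry timeouts and the interfaces that ipconfig probed. The following is a minimal Python sketch, assuming the log lines have the same shape as the IP-Config lines quoted earlier in this report. The function name `summarize_ipconfig_log` and the regular expressions are illustrative assumptions, not part of klibc or any Ubuntu tool.

```python
import re

# Patterns modelled on the console lines quoted in this report (assumption:
# other ipconfig versions may format these messages differently).
TIMEOUT_RE = re.compile(r"IP-Config: no response after (\d+) secs - giving up")
IFACE_RE = re.compile(
    r"IP-Config: (\S+) hardware address ([0-9a-f:]{17}) mtu (\d+)"
)


def summarize_ipconfig_log(text):
    """Return (timeouts, interfaces) found in IP-Config console output.

    timeouts is a list of the "no response after N secs" values, in order.
    interfaces maps each interface name to its MAC address and MTU.
    """
    timeouts = [int(m.group(1)) for m in TIMEOUT_RE.finditer(text)]
    interfaces = {}
    for m in IFACE_RE.finditer(text):
        interfaces[m.group(1)] = {"mac": m.group(2), "mtu": int(m.group(3))}
    return timeouts, interfaces


# Sample taken from the serial console output in this report.
SAMPLE = """\
IP-Config: no response after 2 secs - giving up
IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP
IP-Config: no response after 3 secs - giving up
"""

if __name__ == "__main__":
    timeouts, interfaces = summarize_ipconfig_log(SAMPLE)
    print("timeouts:", timeouts)
    print("interfaces:", sorted(interfaces))
```

Seeing more than one interface listed between consecutive timeouts indicates that ipconfig is probing several devices in the same round, which is relevant when interface ordering is suspected.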
tcpdump captures show that the DHCP requests from the host are received and that responses are sent, but the host does not accept them. When the ipconfig command is issued manually, an identical DHCP request and response are exchanged, and this time the response is accepted. The messages do not appear to be sent or received incorrectly; ipconfig just silently ignores them.
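One way to confirm from a capture that a reply really does correspond to a request is to compare the fixed BOOTP header fields of both packets. The sketch below, in Python, parses the 236-byte fixed header and checks the transaction ID (xid) and client hardware address, which are the fields a DHCP client normally uses to match a reply to its request. It is an illustration only and not the acceptance logic of klibc's ipconfig. The helper names (`parse_bootp`, `reply_matches_request`, `build_bootp`) and the xid value are hypothetical; the addresses are taken from the console output in this report. Extracting the UDP payload from a capture file is left out.

```python
import socket
import struct

# Fixed BOOTP/DHCP header: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file.
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s"
BOOTP_LENGTH = struct.calcsize(BOOTP_FORMAT)  # 236 bytes
BOOTREQUEST, BOOTREPLY = 1, 2


def parse_bootp(payload):
    """Parse the fixed BOOTP header from a UDP payload (options are ignored)."""
    if len(payload) < BOOTP_LENGTH:
        raise ValueError("payload shorter than the fixed BOOTP header")
    (op, _htype, hlen, _hops, xid, _secs, _flags, _ciaddr, yiaddr, siaddr,
     _giaddr, chaddr, _sname, bootfile) = struct.unpack(
        BOOTP_FORMAT, payload[:BOOTP_LENGTH])
    return {
        "op": op,
        "xid": xid,
        "yiaddr": socket.inet_ntoa(yiaddr),
        "siaddr": socket.inet_ntoa(siaddr),
        "chaddr": ":".join(f"{b:02x}" for b in chaddr[:hlen]),
        "file": bootfile.split(b"\0", 1)[0].decode("ascii", "replace"),
    }


def reply_matches_request(request, reply):
    """True if reply is a BOOTREPLY with the request's xid and client MAC."""
    return (request["op"] == BOOTREQUEST
            and reply["op"] == BOOTREPLY
            and reply["xid"] == request["xid"]
            and reply["chaddr"] == request["chaddr"])


def build_bootp(op, xid, mac, yiaddr="0.0.0.0", siaddr="0.0.0.0", bootfile=b""):
    """Build a synthetic header for demonstration (Ethernet, 6-byte MAC)."""
    zero = socket.inet_aton("0.0.0.0")
    return struct.pack(
        BOOTP_FORMAT, op, 1, 6, 0, xid, 0, 0,
        zero, socket.inet_aton(yiaddr), socket.inet_aton(siaddr), zero,
        bytes.fromhex(mac.replace(":", "")), b"", bootfile)


if __name__ == "__main__":
    mac = "90:e2:ba:d1:36:38"   # ens2f0, from the console output
    xid = 0x12345678            # made-up transaction ID
    request = parse_bootp(build_bootp(BOOTREQUEST, xid, mac))
    reply = parse_bootp(build_bootp(
        BOOTREPLY, xid, mac, "10.0.1.56", "169.254.169.254", b"/ipxe.efi"))
    print(reply)
    print("reply matches request:", reply_matches_request(request, reply))
```

If a captured reply passes these checks and is still ignored during boot, that supports the observation that the packets themselves are well formed and that the problem lies in how the client receives or filters them.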
I was seeing this behaviour earlier this year, which I was able to fix by specifying "ip=dhcp" as a kernel parameter. About a month ago that was identified as causing us other problems (long story) and we dropped it, at which point we discovered the original bug was no longer an issue.
Putting "ip=dhcp" back on with this kernel no longer fixes the problem.
I've compared the two initrds, and effectively the only difference between them is the kernel components.
Ubuntu kernel bisect offending commit:
# first bad commit: [fd4b5fa6e3487d
Ubuntu kernel bisect offending commit submission:
https:/
affects: | linux-meta (Ubuntu) → linux (Ubuntu) |
tags: | added: needs-reverse-bisect |
tags: | added: regression-update |
Changed in linux (Ubuntu): | |
status: | Confirmed → Incomplete |
tags: | added: reverse-bisect-done removed: needs-reverse-bisect |
tags: | added: cherry-pick |
tags: | added: bisect-done kernel-bug-exists-upstream kernel-bug-exists-upstream-4.10-rc1 removed: cherry-pick kernel-fixed-upstream kernel-fixed-upstream-4.10-rc1 reverse-bisect-done |
Changed in linux (Ubuntu): | |
importance: | Low → High |
status: | Incomplete → Triaged |
description: | updated |
tags: | added: xenial |
tags: | added: kernel-da-key |
tags: | added: kernel-key removed: kernel-da-key |
Changed in linux (Ubuntu): | |
status: | Incomplete → Triaged |
Changed in linux (Ubuntu Xenial): | |
importance: | Undecided → High |
status: | New → Triaged |
affects: | linux (Ubuntu) → klibc (Ubuntu) |
Changed in klibc (Ubuntu): | |
assignee: | nobody → Jay Vosburgh (jvosburgh) |
status: | Triaged → Confirmed |
Changed in klibc (Ubuntu): | |
status: | Confirmed → In Progress |
tags: | removed: kernel-bug-exists-upstream kernel-bug-exists-upstream-4.10-rc1 |
description: | updated |
tags: | added: verification-done-xenial |
Changed in klibc (Ubuntu Zesty): | |
status: | New → Fix Released |
Changed in klibc (Ubuntu Trusty): | |
assignee: | nobody → Mathieu Trudel-Lapierre (cyphermox) |
status: | New → In Progress |
importance: | Undecided → High |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1652348
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.