[oneiric] net-installer dhcp client fails with a DHCPDECLINE

Bug #848072 reported by Jacob Strauss on 2011-09-12
This bug affects 7 people
Affects: netcfg (Ubuntu)
Importance: Medium
Assigned to: Stéphane Graber

Bug Description

Installing recent oneiric builds via the net installer images fails. The first dhcp request to fetch the pxe boot image succeeds, as does the second before fetching a kickstart file.
The third and final time, however, fails with the client rejecting the lease with a DHCPDECLINE, repeating this over and over again.

I first started seeing this problem somewhere after the alpha-3 installer image, though I'm not sure exactly where. The 20101020ubuntu63 installer build fails.

Tracing through /sbin/dhclient-script it looks like the call in BOUND|RENEW|REBIND|REBOOT) where it calls 'ip -4 addr add [..]' was failing with a File Exists response. The old and new settings appeared identical, however.
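Editorial note: 'ip addr add' answers "File exists" when the exact address is already configured on the interface, which would fit the observation that the old and new settings were identical. The sketch below is one way to check for that condition from a shell. The has_ipv4_addr helper and the sample line are hypothetical and only illustrate the one-line format printed by 'ip -4 -o addr show'.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given CIDR address appears in
# `ip -4 -o addr show dev IFACE` output supplied on stdin.
has_ipv4_addr() {
    # In the one-line (-o) format, field 3 is "inet" and field 4 is ADDR/PREFIX.
    awk -v want="$1" '$3 == "inet" && $4 == want { found = 1 } END { exit !found }'
}

# Illustrative sample line, not taken from the reporter's machine.
sample='2: eth0    inet 172.18.0.80/24 brd 172.18.0.255 scope global eth0'

if printf '%s\n' "$sample" | has_ipv4_addr 172.18.0.80/24; then
    echo "address already present: a plain 'ip addr add' would report File exists"
fi
```

On a live system the input would come from something like 'ip -4 -o addr show dev "$interface" | has_ipv4_addr "$new_ip_address/24"', where the prefix length is an assumption.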

I managed to work around this by adding an unconditional call to 'ip -4 addr flush dev $interface' in /sbin/dhclient-script right after the call to set_hostname. With this workaround, all proceeds as expected, though I'm not sure what the underlying problem is.
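Editorial note: a minimal sketch of the flush-then-add workaround described above. It is not the real dhclient-script. The function name flush_then_add, the IP_CMD dry-run switch, and the example address are assumptions made for illustration. By default the commands are only printed, so the sketch can be tried without root.

```shell
#!/bin/sh
# Dry run by default: prints the ip commands instead of executing them.
# Set IP_CMD=ip to run them for real (requires root).
IP_CMD="${IP_CMD:-echo ip}"

flush_then_add() {
    interface="$1"
    new_ip_address="$2"
    new_mask="$3"
    # Remove any IPv4 addresses left over from an earlier lease so the
    # following add cannot fail because the address is already present.
    $IP_CMD -4 addr flush dev "$interface"
    $IP_CMD -4 addr add "$new_ip_address/$new_mask" dev "$interface"
}

# Example with a placeholder address in the reporter's subnet.
flush_then_add eth0 172.18.0.80 24
```

Flushing unconditionally is blunt: it also drops any address that was configured on purpose, which may be why it was described as a workaround rather than a fix.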

Changed in base-installer (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Steve Langasek (vorlon) wrote :

Stéphane, this sounds right up your alley. Can you take a look at this?

Changed in base-installer (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → Stéphane Graber (stgraber)
importance: Undecided → Medium
Colin Watson (cjwatson) wrote :

A possible cause that comes to mind would be the IPv6 support changes in isc-dhcp 4.1.1-P1-17ubuntu7.

affects: base-installer (Ubuntu) → netcfg (Ubuntu)
Stéphane Graber (stgraber) wrote :

Ok, so just to make sure I understood your setup correctly.

You have a PXE server (DHCP + TFTP) serving the content of netboot.tar.gz with, I'm guessing, a kickstart/preseed value set on the kernel command line, pointing to a kickstart file available on the internet?

Can you attach your /var/log/syslog from the installer environment and confirm that it's the network configuration step that's indeed failing in d-i?

Assuming you can fix your network by hand, netcat should work to send your syslog to another machine where you can then attach it to the bug.
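Editorial note: one possible way to do the netcat transfer suggested above. The send_syslog helper, the port number, and the file names are placeholders. Listen syntax differs between netcat variants, so the receiver command in the comments is an assumption about the build in use.

```shell
#!/bin/sh
# On the receiving machine, start a listener first. Traditional and BusyBox
# netcat typically use "-l -p PORT"; OpenBSD netcat uses "-l PORT":
#   nc -l -p 9999 > installer-syslog.txt

# In the installer shell, once networking has been brought up by hand:
send_syslog() {
    host="$1"
    port="$2"
    logfile="${3:-/var/log/syslog}"
    # Stream the log file to the listener over a plain TCP connection.
    nc "$host" "$port" < "$logfile"
}

# Example (placeholder address and port):
#   send_syslog 172.18.0.11 9999
```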

Jacob Strauss (jacob-strauss) wrote :

Yes, I have a PXE & DHCP server serving out the netboot kernel and initrd.gz file, as copied from the mirrors in ubuntu/dists/oneiric/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/.

I created a stripped down kickstart file to demonstrate, and will attach the kickstart config file and a resulting syslog. Yes, it's the network configuration step that's failing.

The commented-out %pre section in the kickstart file is how I worked around it as previously mentioned.

The pxe server is using the following to serve out the boot images:

label ubuntu-minimal
        menu label ^Ubuntu Minimal
        ipappend 2
        kernel ubuntu-installer-11.10/amd64/linux
        append initrd=ubuntu-installer-11.10/amd64/initrd.gz DEBCONF_DEBUG=5 ks=http://172.18.0.11/ks/ubuntu-minimal.cfg console=ttyS1,115200n8

Jacob Strauss (jacob-strauss) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netcfg (Ubuntu):
status: New → Confirmed
Stéphane Graber (stgraber) wrote :

Hmm, I tried reproducing the issue here with Oneiric 64bit but don't seem to get the same issue as you did.

I'm using your ubuntu-minimal.cfg hosted at http://www.stgraber.org/download/ubuntu-minimal.cfg and booting the current kernel and initrd from http://archive.ubuntu.com/ubuntu/dists/oneiric/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/ using pxelinux in a VM using the etherboot boot rom.

I first see d-i starting up and doing basic network configuration, then restarting once it's parsed the kickstart file, then doing dhcp configuration again, succeeding and starting to download installer components.

Can you confirm you still have that issue with what's currently on http://archive.ubuntu.com/ubuntu/dists/oneiric/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/ and if so, could you share your dhcp configuration?

Jacob Strauss (jacob-strauss) wrote :

Yes, I do still see the same problem with the current kernel and initrd.

I also tried creating a VM here and pxe-booting it. I do see the same problem in the new VM.

The relevant portions of the dhcpd.conf for the new VM are as follows. The dhcp server is an old CentOS box.

allow booting;
allow bootp;
option domain-name "qrclab.com";
option domain-name-servers 172.17.0.111,172.17.0.101;
default-lease-time 3600;
max-lease-time 7200;
get-lease-hostnames true;
authoritative;

subnet 172.18.0.0 netmask 255.255.255.0 {
   option routers 172.18.0.1;

   host dunsel {
        hardware ethernet 52:54:00:18:00:80;
        fixed-address dunsel;
        next-server 172.18.0.11;
        filename "/pxelinux.0";
   }
}

Jacob Strauss (jacob-strauss) wrote :

Here is a tcpdump trace of the dhcp interactions from boot through the beginning of the declines.

Stéphane Graber (stgraber) wrote :

Can you try with this initrd.gz?

It's the amd64 initrd patched with a custom netcfg that will give a slightly different debug output as well as try to flush all addresses and routes on all interfaces prior to setup.
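Editorial note: the patched netcfg itself is not shown in this thread. As a rough shell approximation of "flush all addresses and routes on all interfaces", something like the following could be used. The flush_all name, the IP_CMD dry-run switch, and the decision to skip the loopback interface are assumptions, not details of the actual patch.

```shell
#!/bin/sh
# Dry run by default: prints the ip commands instead of executing them.
# Set IP_CMD=ip to run them for real (requires root).
IP_CMD="${IP_CMD:-echo ip}"

flush_all() {
    for iface in "$@"; do
        # Assumption: leave loopback alone.
        [ "$iface" = "lo" ] && continue
        $IP_CMD addr flush dev "$iface"
        $IP_CMD route flush dev "$iface"
    done
}

# Example with explicit interface names; on a live system the list could
# come from the entries under /sys/class/net.
flush_all lo eth0
```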

Would be great if you could test it and post the syslog.

Thanks

Jacob Strauss (jacob-strauss) wrote :

Okay, still fails the same way. Here's the syslog of booting the vm with the newinitrd.gz

Stéphane Graber (stgraber) wrote :

Oops, my bad, my patch would only work with > 1 network card. Will have a new one in a few minutes for you to test.

Stéphane Graber (stgraber) wrote :

Here's the new initrd. Running it here I see it correctly flushing everything before autoconfig starts.

Let me know if it works for you.

Jacob Strauss (jacob-strauss) wrote :

Yes, the new newinitrd does indeed work. syslog attached.

Stéphane Graber (stgraber) wrote :

Attaching the diff of the fix itself.

Will now run a bunch of tests to ensure we don't get regressions, specifically with ipv6.

The attachment "netcfg-flush-addresses-and-routes.diff" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of ubuntu-sponsors, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Stéphane Graber (stgraber) wrote :

Just finished running a few tests and didn't find any regressions, though that was just with the standard sequence:
 - Start the installer
 - Let netcfg run automatically
 - Download components
 - End of test

I only tried once to get back to netcfg from a later point and it seemed to work fine too.

Changed in netcfg (Ubuntu):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package netcfg - 1.68ubuntu6

---------------
netcfg (1.68ubuntu6) oneiric; urgency=low

  [ Colin Watson ]
  * Backport from trunk:
    - Don't log "Starting netcfg" messages when invoked as ptom.

  [ Stéphane Graber ]
  * Flush all addresses and routes before configuring interfaces (LP: #848072)
 -- Stephane Graber <email address hidden> Thu, 29 Sep 2011 13:46:34 -0400

Changed in netcfg (Ubuntu):
status: Fix Committed → Fix Released
Jason Sharp (jsharp) wrote :

I'm still seeing a problem doing DNS lookups, even after this seems to be 'fixed'.

When I try to download a kickstart config using a DNS name, it fails immediately. I can see it go through the process of getting an IP, then it attempts to get an IP again. I know that if I swap to a different tty and run udhcp everything comes back to life, but by then it's too late.

Can you walk me through the steps to see if this is actually my issue, or if I've uncovered another bug somewhere?

Jacob Strauss (jacob-strauss) wrote :

I wasn't trying to use DNS names in the name of the kickstart server before the problems mentioned above were fixed, but I just tried doing that now and it works for me.

jaiber (jaiber-john) wrote :

This problem exists in the Ubuntu 12.04 amd64 version. I had to fiddle with dhcp-client-script to get the installation going.

jaiber (jaiber-john) wrote :

I forgot to mention it was the server edition Ubuntu 12.04 amd64, and the filename I modified within the initrd.gz was /sbin/dhcpclient-script

Keshav Prabhakar (keshav-m) wrote :

I'm seeing the same issue on 12.04 i386.
I tried patching dhclient-script as jaiber-john did but couldn't get it to work. Would you mind sharing the diff, please?

Keshav Prabhakar (keshav-m) wrote :

It happens with 11.10 i386 as well. It looks like it works fine with hardy - http://archive.ubuntu.com/ubuntu/dists/hardy/main/installer-i386/current/images/netboot/netboot.tar.gz

Jorge Castro (jorge) wrote :

Reopening since this seems to affect 12.04.

Changed in netcfg (Ubuntu):
status: Fix Released → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package netcfg - 1.111ubuntu1

---------------
netcfg (1.111ubuntu1) trusty; urgency=low

  * Merge from Debian unstable. Remaining changes:
    - Set default hostname to 'ubuntu'.
    - Set priority for get_domain to high for static configurations.
    - Set priority for get_domain to medium for non-static configurations.
    - Use 'auto <interface>' for all interfaces, dropping allow-hotplug
      which doesn't work with current udev.
    - Set DHCP and DHCPv6 timeout to 30s.
    - Use isc-dhcp-client-udeb on all architectures.
    - Flush all addresses and routes before configuring interfaces
      (LP: #848072)
    - Don't copy /etc/resolv.conf to target if resolvconf is installed.
      (We already write resolvconf configuration to /etc/network/interfaces.)
    - Add a post-base-install hook to detect resolvconf and copy
      /run/resolvconf/resolv.conf to outside the target so that when /run
      is bind-mounted DNS resolving continues to work. (LP: #926447)
    - Apply patch from Alec Warner making netcfg respect netcfg/dhcpv6_timeout
      and running dhclient in one-shot mode (-1). (LP: #917905)
  * Fix FTBFS by checking the return value of fgets and fscanf.
  * Fix nm-conf to generate a valid NetworkManager static configuration file.

netcfg (1.111) unstable; urgency=high

  [ Dmitrijs Ledkovs ]
  * Bump debhelper compat level to 9.
  * Set Vcs-* to canonical format.

  [ Samuel Thibault ]
  * Add -lm after -lcheck, since libcheck.a needs some maths functions.
    (Closes: Bug#713616)

  [ Cyril Brulebois ]
  * Also add -lpthread and -lrt.
  * Set urgency to high for the bugfix below and the upcoming d-i release.

  [ Philipp Kern ]
  * Wrap dpkg-query call to check for network-manager with sh.
    Thanks to Michael Biebl for the patch. (Closes: #717449)

netcfg (1.110) unstable; urgency=low

  [ Colin Watson ]
  * Use correct compiler when cross-building.

netcfg (1.109) unstable; urgency=low

  [ Samuel Thibault ]
  * Fix build on hurd-i386.

  [ Updated translations ]
  * Croatian (hr.po) by Tomislav Krznar

netcfg (1.108) unstable; urgency=low

  [ Samuel Thibault ]
  * Do not set netcfg/use_autoconfig to true just because netcfg/disable_dhcp
    is false (which is the default), otherwise netcfg/disable_autoconfig has no
    effect. (Closes: #703747, #688273)

  [ Philipp Kern ]
  * Install iw whenever wireless-tools is installed on the target.
    Patch by Charles Plessy. (Closes: #697890)

  [ Updated translations ]
  * Amharic (am.po) by Tegegne Tefera
  * Croatian (hr.po) by Tomislav Krznar
  * Tamil (ta.po) by Dr.T.Vasudevan

netcfg (1.107) unstable; urgency=low

  * finish-install.d/55netcfg-copy-config: Do not rely on dpkg -l
    to check if a package is installed; use dpkg-query -s instead
    and check status explicitly. (Closes: #700939)

netcfg (1.106) unstable; urgency=low

  * finish-install.d/55netcfg-copy-config: Exit if connection type
    information was not written by netcfg. This preserves the
    generated /etc/network/interfaces on kFreeBSD. It will also
    preserve files generated by users between base install and
    finishing. (Closes: #698626)

netcfg (1.105) unstable; urgen...

Changed in netcfg (Ubuntu):
status: Confirmed → Fix Released
Metin OSMAN (metin-osmanoglu) wrote :

Hi all,

Sorry to reopen this old ticket (maybe I should open a new one, I don't know the correct workflow), but I am facing the same issue on Ubuntu 14.04.

I am trying to netboot using cobbler + PXE + TFTP. Everything works fine except that DHCP autoconfiguration fails.
I can see in the logs that every DHCP offer is rejected with a DHCPDECLINE.

I have tried to modify /sbin/dhclient-script as initially proposed by Jacob Strauss without success.

Please let me know what relevant data you need to help me.

Thanks
