resolv.conf empty when doing PXE installations

Bug #1013843 reported by Daniel Manrique on 2012-06-15
This bug affects 1 person
Affects                      Importance  Assigned to
casper (Ubuntu)              High        Stéphane Graber
casper (Ubuntu Quantal)      High        Stéphane Graber
resolvconf (Ubuntu)          High        Unassigned
resolvconf (Ubuntu Quantal)  High        Unassigned
ubiquity (Ubuntu)            High        Unassigned
ubiquity (Ubuntu Quantal)    High        Unassigned

Bug Description

Quantal daily image as of 20120615.
resolvconf 1.65ubuntu4.

In our environment we do network installs via PXE booting. We noticed that name resolution wasn't working during the late_command phase (e.g. any apt-get install operations in the ubiquity/late_command fail).

PXE passes IP information (including DNS) to the kernel, and in this case, the entry in /etc/network/interfaces is:

auto eth0
iface eth0 inet manual

What we found is that, with this configuration, /etc/resolv.conf will be unconfigured (even though DHCP *did* send DNS information):

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

If I change the /etc/network/interfaces entry to

auto eth0
iface eth0 inet dhcp

then I run:

sudo ifup --force eth0

then /etc/resolv.conf is populated correctly and DNS resolution starts working.

This would be a regression from Precise, where the resolv.conf file gets correctly populated even if the interface is set as manual.

Thomas Hood (jdthood) wrote :

Hi. At the point where the system has booted and /etc/resolv.conf is empty (except for the comment), please do the following and post the output here.

ls -l /run/resolvconf
ls -l /run/resolvconf/interface
for F in /run/resolvconf/interface/* ; do echo === $F === ; cat $F ; done
for F in /etc/resolvconf/resolv.conf.d/* ; do echo === $F === ; cat $F ; done
cat /etc/NetworkManager/NetworkManager.conf
cat /run/nm-dns-dnsmasq.conf

Thomas Hood (jdthood) wrote :

And also

ls -l /etc/resolv.conf
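
The commands requested above can be collected into a single script, as the reporter does in the next comment. The following is only a sketch of that idea: the function name dump_resolvconf_state and its optional root-prefix argument are assumptions made for illustration and do not appear in the thread. Unlike the original command list, it prints a "=== file ===" header for every file, including the NetworkManager ones.

```shell
#!/bin/sh
# Sketch: gather the resolvconf diagnostics requested above in one pass.
# dump_resolvconf_state and its ROOT argument are hypothetical helpers;
# call it with no argument on the live system.
dump_resolvconf_state() {
    root="${1:-}"
    # Directory listings and the resolv.conf symlink itself.
    ls -l "$root/run/resolvconf" 2>&1 || true
    ls -l "$root/run/resolvconf/interface" 2>&1 || true
    ls -l "$root/etc/resolv.conf" 2>&1 || true
    # Dump every record and config file, each preceded by a header line.
    # An unmatched glob stays literal, so cat reports the missing files.
    for f in "$root"/run/resolvconf/interface/* \
             "$root"/etc/resolvconf/resolv.conf.d/* \
             "$root/etc/NetworkManager/NetworkManager.conf" \
             "$root/run/nm-dns-dnsmasq.conf"; do
        echo "=== $f ==="
        cat "$f" 2>&1 || true
    done
    return 0
}
```

Running dump_resolvconf_state under script(1), or redirecting its output to a file, keeps everything in one place for posting.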

Steve Langasek (vorlon) on 2012-06-17
Changed in resolvconf (Ubuntu):
importance: Undecided → High
Daniel Manrique (roadmr) wrote :

Hi Thomas,

A few of the files you requested weren't in /run; here's the output of the commands you requested (I put them all in one script):

# bash script.sh
total 4
-rw-r--r-- 1 root root 0 Jun 18 09:53 enable-updates
drwxr-xr-x 2 root root 40 Jun 18 09:53 interface
-rw-r--r-- 1 root root 151 Jun 18 09:53 resolv.conf
total 0
=== /run/resolvconf/interface/* ===
cat: /run/resolvconf/interface/*: No such file or directory
=== /etc/resolvconf/resolv.conf.d/base ===
=== /etc/resolvconf/resolv.conf.d/head ===
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
[main]
plugins=ifupdown,keyfile
dns=dnsmasq

[ifupdown]
managed=false
cat: /run/nm-dns-dnsmasq.conf: No such file or directory

The remaining file you requested /etc/resolv.conf has nothing but the comment as seen in the original report:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

Please let me know if you need more information about this problem. Thanks!

Thomas Hood (jdthood) wrote :

Thanks very much for the additional info. The error message "cat: /run/resolvconf/interface/*: No such file or directory" indicates that resolvconf has had no nameserver information registered with it. It seems that nothing has called resolvconf to register the nameserver information obtained via DHCP.

When you define eth0 as an inet dhcp i'face in /etc/network/interfaces and do "ifup eth0", ifup starts one of the standard DHCP clients, which are integrated with resolvconf. That's why name service starts working.
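
To illustrate why that error message implies an empty record directory: when a glob matches nothing, the shell (with default options) passes the pattern through literally, so cat is asked to open a file named "*". A minimal sketch of a check built on that behaviour follows; the helper name has_records is an assumption for illustration, not part of resolvconf.

```shell
# has_records DIR: succeed if DIR contains at least one entry.
# Hypothetical helper, not part of resolvconf.
has_records() {
    for f in "$1"/*; do
        # If nothing matches, $f is the literal string "DIR/*", which does
        # not exist; that is the name cat complained about above.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

On the affected system, has_records /run/resolvconf/interface would be expected to fail, which matches the output posted in the thread.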

I don't know much about Ubiquity. Someone who does know Ubiquity well will be better equipped than I am to carry on the investigation with you.

Thomas Hood (jdthood) wrote :

I found an old report, still in status New, of a similar phenomenon (no nameserver addresses after installation): bug #214492.

Thomas Hood (jdthood) wrote :

The late_command phase is after a reboot in the target environment.[0] (Please correct me if I'm wrong.)

[0]https://help.ubuntu.com/community/InstallCDCustomization

On the target system, with resolvconf, it's to be expected that no nameserver addresses are listed in /etc/resolv.conf, given that no addresses are listed in /etc/resolvconf/resolv.conf.d/* or in /etc/network/interfaces and the interface is defined in /e/n/i as "manual".

Would it be possible for you to run all the commands I mentioned in #1 and #2, plus "cat /etc/network/interfaces", on a Precise system (where you say resolv.conf is not empty) at exactly the same stage?

Thomas Hood (jdthood) wrote :

See if the following fixes the problem. In the stanza

    iface eth0 inet manual

add

    dns-nameservers <nameserver-address>...

so that if the addresses are 8.8.8.8 and 8.8.4.4 it looks like this

    iface eth0 inet manual
        dns-nameservers 8.8.8.8 8.8.4.4

Daniel Manrique (roadmr) wrote :

Hi Thomas,

The late_command (or ubiquity's success_command) is run after the rest of the installation has finished (I think it even runs after installing the boot loader), but the system doesn't reboot prior to running it.

I did the test you requested on a Precise system. Here's what I do (and also what I did on Quantal to produce the output on comment #3):

- Start the PXE installation process
- As the installation starts (e.g. the system says creating filesystem), I go to a virtual console
- From here, cat /etc/resolv.conf shows the file I posted in the original report (on Quantal) or a correctly populated file (on Precise; I'll attach this below).
- I run the script you provided, using "script" to capture the output, e.g.
  script output.txt
  ./thomas-script.sh
- I put the output.txt file on a USB stick and shuttle back to my laptop :0

So as promised, here's the result of running all those commands on a Precise installation. I see resolv.conf has correct DNS information (as taken from DHCP) and I also notice it's a symlink:

Script started on Tue 19 Jun 2012 10:55:26 AM EDT
root@ubuntu:/mnt# bash script.sh
total 4
-rw-r--r-- 1 root root 0 Jun 19 10:54 enable-updates
drwxr-xr-x 2 root root 60 Jun 19 10:53 interface
-rw-r--r-- 1 root root 197 Jun 19 10:54 resolv.conf
total 4
-rw-r--r-- 1 root root 113 Jun 19 10:53 casper
=== /run/resolvconf/interface/casper ===
# /etc/resolv.conf
# Autogenerated by casper
search canonical.com
domain canonical.com
nameserver 10.153.104.60

=== /etc/resolvconf/resolv.conf.d/base ===
=== /etc/resolvconf/resolv.conf.d/head ===
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
[main]
plugins=ifupdown,keyfile
dns=dnsmasq

[ifupdown]
managed=false
cat: /run/nm-dns-dnsmasq.conf: No such file or directory
root@ubuntu:/mnt# cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

root@ubuntu:/mnt# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.153.104.60
search canonical.com
root@ubuntu:/mnt# ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 29 Apr 25 12:04 /etc/resolv.conf -> ../run/resolvconf/resolv.conf
root@ubuntu:/mnt# exit
Script done on Tue 19 Jun 2012 10:55:59 AM EDT

Daniel Manrique (roadmr) wrote :

Hi Thomas,

For your suggestion on comment #7, note that the "iface eth0 inet manual" stanza was added automatically by casper as a result of netbooting, thus I don't really have an opportunity to add the dns-nameservers entry prior to the system booting.

I could do it in one of three places:

- Somehow hacking casper's scripts/casper-bottom/23networking file to add the dns-nameservers entry and rebuilding the initrd with this modified casper.

- Manually, by jumping to a console and editing the file by hand. But I'd still have to ifup the interface for the changes to take effect.

- Automatically, in my preseed file, possibly in the late_command or in one of the early_commands. Again, I probably would have to somehow reconfigure the interface for the changes to be considered.

I'll try the manual option (second one) to see which results this workaround yields and update again. Thanks!
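
For the automatic option, a preseed command would need to add the dns-nameservers line to the stanza that casper generated. The following is a rough sketch of one way to do that with awk; the function name add_dns_nameservers is an assumption for illustration, and the addresses in the usage note are the example ones from comment #7.

```shell
# add_dns_nameservers FILE IFACE ADDR...
# Hypothetical helper: prints FILE to stdout with a dns-nameservers line
# added directly after the "iface IFACE inet manual" line.
add_dns_nameservers() {
    file="$1"; iface="$2"; shift 2
    # "$*" joins the remaining arguments with spaces.
    awk -v iface="$iface" -v servers="$*" '
        { print }
        $1 == "iface" && $2 == iface && $3 == "inet" && $4 == "manual" {
            print "    dns-nameservers " servers
        }
    ' "$file"
}
```

For example, add_dns_nameservers /etc/network/interfaces eth0 8.8.8.8 8.8.4.4 prints the modified file; the result would still have to be written back, and the interface brought up again, before the change takes effect.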

Changed in ubiquity (Ubuntu):
importance: Undecided → High
Thomas Hood (jdthood) wrote :

+casper (1.316) quantal; urgency=low
+
[...]
+ * Disable network-manager when installing from NETBOOT, otherwise we
+ loose access to the media when NM starts, causing the system to freeze.
+
+ -- Stéphane Graber <email address hidden> Mon, 30 Apr 2012 15:39:32 +0200

Hmm!

Thomas Hood (jdthood) wrote :

> if [ ! -f /root/etc/resolv.conf ] || [ -z "$(cat /root/etc/resolv.conf)" ]; then
> if [ -n "${DEVICE}" ] && [ -e /tmp/net-"${DEVICE}".conf ]; then
> # create a resolv.conf if it is not present
[...]
> # Deal with resolvconf
> # Writing to /run instead or /root/run as /sbin/init will move /run
> # to /root/run a bit later on
> if [ -x /root/sbin/resolvconf ] && [ -L /root/etc/resolv.conf ]; then
> mkdir -p /run/resolvconf/interface/
> resolv=/run/resolvconf/interface/casper
[...]
> cat > $resolv <<EOF

I have just started learning about casper, ubiquity, and so on and am still highly clueless about them, but reading the code above makes me a bit suspicious.

Clearly casper will fail to cat to /run/resolvconf/interface/casper, the file that was present in Precise and has gone missing in Quantal, if any of the quoted "if" lines fails.

The first line looks like an attempt to test for /etc/resolv.conf being empty in the chroot. Now, when resolvconf is managing resolv.conf, the file always consists of at least two lines of comments. Hmm.
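
To make the concern concrete: a resolv.conf that holds only resolvconf's two header comments is non-empty, so a -z test on its contents is false even though no nameserver is configured. A check that ignores comments and blank lines could look like the sketch below. This is an illustration only, not what casper does; the function name resolv_conf_is_unconfigured is made up for the example.

```shell
# resolv_conf_is_unconfigured FILE
# Hypothetical helper: succeeds if FILE is missing or contains nothing but
# blank lines and comment lines (starting with # or ;).
resolv_conf_is_unconfigured() {
    [ -f "$1" ] || return 0
    # grep -v selects lines that are NOT blank/comment; -q only sets status.
    if grep -Eqv '^[[:space:]]*([#;]|$)' "$1"; then
        return 1
    fi
    return 0
}
```

With the header-only file from the original report this check succeeds, whereas [ -z "$(cat FILE)" ] does not.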

Thomas Hood (jdthood) wrote :

I just grabbed the Quantal Desktop daily image of 19-Jun-2012 and extracted the squashfs.

There I found that resolvconf's database contains an "original.resolvconf" record with "nameserver 10.122.37.1" in it. Explanation: When resolvconf is installed it puts a copy of the original resolv.conf file in /etc/resolvconf/resolv.conf.d/original and another copy in /run/resolvconf/interface/original.resolvconf so that no nameserver information is lost. The latter file (correctly) disappears at the next reboot. I wonder if that file belongs in the live image.

=================BEGIN LOG ================
 # ls -l ./etc/resolv.conf
./etc/resolv.conf -> ../run/resolvconf/resolv.conf
# cd run/resolvconf
# ls -l
total 12
drwxr-xr-x 2 root root 4096 Jun 19 09:39 interface
-r--r--r-- 1 root root 0 Jun 19 09:39 resolv.conf
# ls -l interface
total 12
-r--r--r-- 1 root root 417 Jan 23 20:56 original.resolvconf

# cat interface/original.resolvconf
#
# m
# mmmm m m mmmm mmmm mmm mm#mm
# #" "# # # #" "# #" "# #" # #
# # # # # # # # # #"""" #
# ##m#" "mm"# ##m#" ##m#" "#mm" "mm
# # # #
# " " "
# This file is managed by puppet. Do not make local changes.

domain buildd
nameserver 10.122.37.1
============== END LOG ==============

Can someone please explain exactly how nameserver and other networking information is passed from one program to another when booting a live image, installing Ubuntu on a machine, (apparently) chrooting to the installed system to run late_command, and rebooting?

A good overview of how live CDs are built would also be helpful to me.

Thomas Hood (jdthood) wrote :

This issue (#1013843) needs to be addressed together with #388060.

Thomas Hood (jdthood) wrote :

@Daniel: What's in /var/log/netboot.config on the affected system at the point where you can't connect?

We have established that nothing is in /run/resolvconf/interface/ then, but casper should have written /run/resolvconf/interface/casper. It appends the latter file to /root/var/log/netboot.config. If we see

    # /etc/resolv.conf
    # Autogenerated by casper
    search canonical.com
    domain canonical.com
    nameserver 10.153.104.60

in there then we can conclude that /run/resolvconf/interface/casper got written but was deleted later.

If /run/resolvconf/interface/casper never got written then we need to debug scripts/casper-bottom/23networking. We need to find out why the code quoted in #11 is never reached.

Can you add some logging commands to the script to find out what the results are of the individual tests? The tests I have in mind are the following ones.

    -f /root/etc/resolv.conf
     -z "$(cat /root/etc/resolv.conf)"
    -n "${DEVICE}"
     -e /tmp/net-"${DEVICE}".conf
    -x /root/sbin/resolvconf
    -L /root/etc/resolv.conf
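
One possible way to log those tests is a small wrapper that evaluates each one and prints the outcome. This is a sketch only; log_test and the log path in the commented usage lines are assumptions for illustration and are not part of casper.

```shell
# log_test LABEL TEST-ARGS...
# Hypothetical helper: runs "test TEST-ARGS" and prints "LABEL: true|false".
log_test() {
    label="$1"; shift
    if test "$@"; then result=true; else result=false; fi
    echo "$label: $result"
}

# Possible use inside 23networking (log file name is made up):
#   log_test 'resolv.conf is a file'   -f /root/etc/resolv.conf        >> /tmp/23networking.log
#   log_test 'resolv.conf is empty'    -z "$(cat /root/etc/resolv.conf)" >> /tmp/23networking.log
#   log_test 'DEVICE is set'           -n "${DEVICE}"                  >> /tmp/23networking.log
#   log_test 'net conf exists'         -e /tmp/net-"${DEVICE}".conf    >> /tmp/23networking.log
#   log_test 'resolvconf is installed' -x /root/sbin/resolvconf        >> /tmp/23networking.log
#   log_test 'resolv.conf is symlink'  -L /root/etc/resolv.conf        >> /tmp/23networking.log
```

Each line of the resulting log then shows which condition stops the script from reaching the code that writes the casper record.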

Thomas Hood (jdthood) wrote :

The erroneous presence of the original.resolvconf file on the squashfs, first mentioned in comment #12, I have just reported in bug #1016015. I don't think that the presence of that file has anything to do with the problem here (#1013843).

Tip: Someone has at least made a rough outline of the Casper boot process: http://wiki.flimzy.com/index.php/Casper_boot_process

Steve Langasek (vorlon) wrote :

I'm pretty sure this doesn't involve ubiquity itself. Marking invalid.

Changed in ubiquity (Ubuntu Quantal):
status: New → Invalid
tags: added: rls-q-incoming
Changed in resolvconf (Ubuntu Quantal):
milestone: none → quantal-alpha-2
Changed in casper (Ubuntu Quantal):
milestone: none → quantal-alpha-2
Steve Langasek (vorlon) on 2012-06-22
tags: removed: rls-q-incoming
Changed in casper (Ubuntu Quantal):
assignee: nobody → Stéphane Graber (stgraber)
Stéphane Graber (stgraber) wrote :

It sounds pretty unlikely to be a bug in resolvconf itself, so marking it invalid against resolvconf for now.

The expected behaviour on netboot is:
 - No network manager
 - No ifupdown
 - Configured /etc/resolv.conf

I'll have to do some more tests, but the changes I pushed early in the quantal cycle were used to deploy a few hundred machines using netboot+cifs, so it worked at least to some extent in the past...

Changed in resolvconf (Ubuntu Quantal):
status: New → Invalid
Changed in casper (Ubuntu Quantal):
milestone: quantal-alpha-2 → quantal-alpha-3
importance: Undecided → High
Stéphane Graber (stgraber) wrote :

The cause of the problem was a change in ipconfig from writing /tmp/net-$DEVICE.conf to writing /run/net-$DEVICE.conf, which broke 23networking in the process.

I did a few more tests here and fixed some code duplication and missing kernel modules with cifs, and it looks like it's all working now: NM is running but not touching the interface, resolv.conf contains the expected content and the interface is properly configured.
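
As an illustration of the path change described in this comment, a lookup that tolerates both locations could be sketched as follows. The function name find_net_conf, its optional root-prefix argument, and the fallback to /tmp are assumptions made for the example; the comment above only says that the file moved from /tmp to /run.

```shell
# find_net_conf DEVICE [ROOT]
# Hypothetical helper: prints the path of ipconfig's net-DEVICE.conf,
# checking /run first and then the older /tmp location.
find_net_conf() {
    device="$1"; root="${2:-}"
    for dir in /run /tmp; do
        candidate="$root$dir/net-$device.conf"
        if [ -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Neither location has the file.
    return 1
}
```

A script that only ever looked in /tmp would skip its resolv.conf handling as soon as ipconfig started writing to /run, which is consistent with the symptom reported here.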

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.319

---------------
casper (1.319) quantal; urgency=low

  * Remove conf.d/compcache as compcache doesn't exist anymore.
    A possible replacement would be zram.
  * Add md4 and des_generic to the initramfs, required by cifs.
  * Revert my previous change as the network-manager override isn't actually
    needed when the /e/n/i logic works (and marks the interface as manual).
  * Update paths to net-$INTERFACE.conf to /run/ as it now lives there instead
    of /tmp. (LP: #1013843)
 -- Stephane Graber <email address hidden> Tue, 03 Jul 2012 22:36:16 -0400

Changed in casper (Ubuntu Quantal):
status: New → Fix Released