Connection requests to saucy server VMs from a hosts fail after fresh VM installs

Bug #1197484 reported by Para Siva on 2013-07-03
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
isc-dhcp (Ubuntu)
High
Jamie Strandboge

Bug Description

The ssh/ ping requests from a precise host (aldebaran) to the client VMs of saucy server installations fail on reboot after fresh installations of today's images (20130703). The ping requests to the IP address allocated to the VMs during the installations fail with 'Destination Host Unreachable'. There does not appear to be anything wrong during installations.
The VMs use libvirt/KVM and using bridged interface.
This issue happens not always but most of the time. I think the issue happens more often when more VMs present in the host.

The following error is reported during the reboot, (though I am not sure if that is the cause for the above behaviour)
ul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: Internet Systems Consortium DHCP Client 4.2.4
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: Copyright 2004-2012 Internet Systems Consortium.
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: All rights reserved.
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: For info, please visit https://www.isc.org/software/dhcp/
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient:
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: Listening on LPF/eth0/52:54:00:73:cc:82
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: Sending on LPF/eth0/52:54:00:73:cc:82
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: Sending on Socket/fallback
Jul 3 05:20:38 utah-11011-saucy-server-i386 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x2bd1e2f7)
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 5.995994] type=1400 audit(1372843239.131:5): apparmor="STATUS" operation="profile_replace" parent=441 profile="unconfined" name="/sbin/dhclient" pid=442 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.008224] type=1400 audit(1372843239.147:6): apparmor="STATUS" operation="profile_replace" parent=441 profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=442 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.008368] type=1400 audit(1372843239.147:7): apparmor="STATUS" operation="profile_replace" parent=441 profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=442 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 failsafe: Failsafe of 120 seconds reached.
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.433462] type=1400 audit(1372843239.571:8): apparmor="STATUS" operation="profile_replace" parent=478 profile="unconfined" name="/sbin/dhclient" pid=486 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.433655] type=1400 audit(1372843239.571:9): apparmor="STATUS" operation="profile_replace" parent=478 profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=486 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.433788] type=1400 audit(1372843239.571:10): apparmor="STATUS" operation="profile_replace" parent=478 profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=486 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 ntpdate[602]: Can't find host ntp.ubuntu.com: System error (-11)
Jul 3 05:20:39 utah-11011-saucy-server-i386 ntpdate[602]: no servers can be used, exiting
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.433655] type=1400 audit(1372843239.571:9): apparmor="STATUS" operation="profile_replace" parent=478 profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=486 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 kernel: [ 6.433788] type=1400 audit(1372843239.571:10): apparmor="STATUS" operation="profile_replace" parent=478 profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=486 comm="apparmor_parser"
Jul 3 05:20:39 utah-11011-saucy-server-i386 ntpdate[602]: Can't find host ntp.ubuntu.com: System error (-11)
Jul 3 05:20:39 utah-11011-saucy-server-i386 ntpdate[602]: no servers can be used, exiting
Jul 3 05:20:41 utah-11011-saucy-server-i386 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x2bd1e2f7)
Jul 3 05:20:41 utah-11011-saucy-server-i386 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x2bd1e2f7)
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: DHCPREQUEST of 10.98.2.36 on eth0 to 255.255.255.255 port 67 (xid=0x2bd1e2f7)
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: DHCPOFFER of 10.98.2.36 from 10.98.2.1
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: DHCPREQUEST of 10.98.2.36 on eth0 to 255.255.255.255 port 67 (xid=0x2bd1e2f7)
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: DHCPOFFER of 10.98.2.36 from 10.98.2.1
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: DHCPACK of 10.98.2.36 from 10.98.2.1
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: execve (/sbin/dhclient-script, ...): Permission denied
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: bound to 10.98.2.36 -- renewal in 40307 seconds.
Jul 3 05:20:42 utah-11011-saucy-server-i386 kernel: [ 9.102657] type=1400 audit(1372843242.239:11): apparmor="DENIED" operation="file_perm" parent=508 profile="/sbin/dhclient" name="/bin/bash" pid=649 comm="dhclient" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: DHCPACK of 10.98.2.36 from 10.98.2.1
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: execve (/sbin/dhclient-script, ...): Permission denied
Jul 3 05:20:42 utah-11011-saucy-server-i386 dhclient: bound to 10.98.2.36 -- renewal in 40307 seconds.
Jul 3 05:20:42 utah-11011-saucy-server-i386 kernel: [ 9.102657] type=1400 audit(1372843242.239:11): apparmor="DENIED" operation="file_perm" parent=508 profile="/sbin/dhclient" name="/bin/bash" pid=649 comm="dhclient" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Jul 3 05:20:42 utah-11011-saucy-server-i386 ntpdate[687]: Can't find host ntp.ubuntu.com: System error (-11)
Jul 3 05:20:42 utah-11011-saucy-server-i386 ntpdate[687]: no servers can be used, exiting
==============================================
How to reproduce:
1. Install utah using
   sudo apt-add-repository -y ppa:utah/stable
   sudo apt-get update
   sudo apt-get install utah

2. Now run the installation test using
sudo -u utah -i run_utah_tests.py -i /path/to/saucy-server-amd64.iso -p lp:ubuntu-test-cases/server/preseeds/mail-server.preseed lp:ubuntu-test-cases/server/runlists/mail-server.run -x /etc/utah/bridged-network-vm.xml

3. Now it could be seen that the the connection to the VMs from the host after the installation fails.

https://jenkins.qa.ubuntu.com/view/Saucy/view/Smoke%20Testing/job/saucy-server-i386-smoke-mail-server/58/ is one of the impacted jobs

Note: This issue is seen in our lab (precise host)and I have not yet tried reproducing in a local system..

Installer syslog and the boot log of a failing job are attached.

Related branches

Para Siva (psivaa) wrote :
Para Siva (psivaa) wrote :
Para Siva (psivaa) on 2013-07-03
summary: - ping request to saucy server VMs from the host fail from the host
+ Connection requests to saucy server VMs from a precise host fail after
+ fresh VM installs

Can you add the output of

uname -a
sudo aa-status

and attach the /etc/apparmor.d/cache/.features file

Jamie Strandboge (jdstrand) wrote :

I can't reproduce this. I have 3.10.0-2.9-generic, apparmor 2.8.0-0ubuntu19, and isc-dhcp-client 4.2.4-6ubuntu3. I tried in a VM by disabling networking in network-manager and adjusting /etc/network/interfaces to use 'iface eth0 inet dhcp' and using 'ifup eth0'. I also tried by using 'dhclient -sf /sbin/dhclient-script eth0'.

What is the contents of /etc/network/interfaces? Is /sbin/dhclient-script executable? Can you give the output of 'ls -R /etc/dhcp'?

Para Siva (psivaa) wrote :

The following were obtained from a VM that had an issue

1. uname -a
Linux utah-11060-saucy-server-i386 3.10.0-2-generic #9-Ubuntu SMP Mon Jul 1 18:36:54 UTC 2013 i686 i686 i686 GNU/Linux

2. sudo aa-status
apparmor module is loaded.
5 profiles are loaded.
5 profiles are in enforce mode.
   /sbin/dhclient
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/connman/scripts/dhclient-script
   /usr/sbin/named
   /usr/sbin/tcpdump
0 profiles are in complain mode.
2 processes have profiles defined.
2 processes are in enforce mode.
   /sbin/dhclient (610)
   /usr/sbin/named (796)
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
------
I am in the process of getting the other information from another VM since the above got destroyed by the automation tool. (When I did "sudo ifdown eth0 && sudo ifup eth0" for me to transfer the required files, the original IP address was restored to the VM and the test ran to completion and the VM got destroyed - Basically the comms between the host and the guest comes back when i do sudo ifdown eth0 and sudo ifup eth0, but not by default after the reboot)

Para Siva (psivaa) wrote :

The output of ls -R /etc/dhcp
------------------------------------------
/etc/dhcp:
dhclient.conf
dhclient-enter-hooks.d
dhclient-exit-hooks.d

/etc/dhcp/dhclient-enter-hooks.d:
debug
resolvconf

/etc/dhcp/dhclient-exit-hooks.d:
debug
ntpdate
rfc3442-classless-routes

/etc/apparmor.d/cache/.features file is attached.

Para Siva (psivaa) wrote :

The contents of /etc/network/interfaces
------------------------------------------------------------
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
------------------------------
ls -la /sbin/dhclient-script
-rwxr-xr-x 1 root root 14484 Jun 10 16:50 /sbin/dhclient-script

Serge Hallyn (serge-hallyn) wrote :

Can you confirm whether you do NOT get the error about /sbin/dhclient-script execution being denied, when the VMS come up correctly?

Serge Hallyn (serge-hallyn) wrote :

(Not yet marking confirmed as it has not been reproduced anywhere but the one machine)

Changed in isc-dhcp (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Para Siva (psivaa) wrote :

Yes I can confirm that "dhclient: execve (/sbin/dhclient-script, ...): Permission denied" error occurs only on VM's which has this connectivity issues and not on the ones that come up correctly.

Please contrast
https://jenkins.qa.ubuntu.com/view/Saucy/view/Smoke%20Testing/job/saucy-server-amd64-smoke-mail-server/63/artifact/log/utah-11181-boot.log --> VM came up correctly and connection was fine
vs
https://jenkins.qa.ubuntu.com/view/Saucy/view/Smoke%20Testing/job/saucy-server-amd64-smoke-mail-server/62/artifact/log/utah-11164-boot.log --> VM came up with connectivity issue with the host

Changed in isc-dhcp (Ubuntu):
status: Incomplete → New
Para Siva (psivaa) wrote :

I have been able to reproduce this issue on a raring host as well on my local machine. Please find attached the log of that failure.

Note: Please note that the latter part of the logs may suggest the network is back to normal. This is because I manually did sudo ifdown eth0 and sudo ifup eth0 to transfer the file across to the host.

Jamie Strandboge (jdstrand) wrote :

Using 3.10.0-2.10-generic (the reported kernel is 3.10.0-2.9-generic though), I still cannot reproduce this. I updated /etc/network/interfaces to use the above and removed any hook files from my VM. The netowrking comes up fine. I modified run_hooks() in /sbin/dhclient-script to output the scripts it was trying to run to /tmp/foo, and saw it was running everything correctly with no denials, so I don't think this is a kernel issue in general with dealing with interpreted scripts via shebang (if it were, we should see the problem every time I think).

Jamie Strandboge (jdstrand) wrote :

Attached is a small C program using the same type of code that dhclient.c uses demonstrating that the kernel should not be at fault.
$ cd /tmp
$ tar -zxvf ./bug-lp1197484.tar.gz
bug-lp1197484/
bug-lp1197484/lp1197484.c
bug-lp1197484/lp1197484.profile
bug-lp1197484/lp1197484.sh
bug-lp1197484/run.sh
$ cd ./bug-lp1197484
$ sh ./run.sh
[sudo] password for jamie:
argv[0]: ./lp1197484
argv[1]: lp1197484.sh
child: 7402
parent exiting
here

At this point, there are no denied messages and lp1197484.sh succeeds.

Jamie Strandboge (jdstrand) wrote :

Are you using a special dhnsmasq/dhcpd configuration? Do you have a reliable reproducer?

Jamie Strandboge (jdstrand) wrote :

You mentioned this is via libvirt/kvm-- is that via openstack or straight libvirt?

Changed in isc-dhcp (Ubuntu):
status: New → Incomplete
Jamie Strandboge (jdstrand) wrote :

Also, this is only during installation, not after install?

Para Siva (psivaa) wrote :

Thank you for the comments.
 The failure occurred with 3.10.0-2.10-generic as well for me. Please see the log that's attached in comment #11, and that occurred on my local raring machine. I can reproduce this issue on my local machine once every 4-5 attempts. As given in the description, the issue occurs more often when there are more VMs running. Therefore the frequency is very high in our server which is a precise 12.04.01 machine using 3.2.0-40-generic.

I am not sure if we have any special dnsmasq configuration in the lab. I'll check and confirm it with our systems engineers. But on my local machine (raring amd64 desktop) there is no specific configurations.

Since the issue is more often when there are more VMs installations in one host, it would be easier to reproduce by running 'sudo -u utah -i run_utah_tests.py -i /path/to/saucy-server-amd64.iso -p lp:ubuntu-test-cases/server/preseeds/mail-server.preseed lp:ubuntu-test-cases/server/runlists/mail-server.run -x /etc/utah/bridged-network-vm.xml' in parallel at the same time from a couple of terminal windows.

libvirt-bin that we are using is 0.9.8-2ubuntu17.8

The issue is not occurring during the install but on the first reboot after the installation.

Stéphane Graber (stgraber) wrote :

Can you also install and run "debsums -sa" on the affected system?

I recognized the name "aldebaran" as the one that gave us some interesting random data corruption a while back (without any obvious entry in dmesg which made things confusing), so I'd like to verify the VM is in perfect condition before digging any further.

Jamie Strandboge (jdstrand) wrote :

I'm still unable to reproduce this. I have tried with a precise host using bridged networking with 3 other VMs running (pretty much the limit I can do here) and rebooted the saucy VM with the specified networking configuration 10 times without incident.

Jamie Strandboge (jdstrand) wrote :

I'll try with utah next. What are the contents of /etc/utah/bridged-network-vm.xml?

Para Siva (psivaa) wrote :

The file, /etc/utah/bridged-network-vm.xml is attached. That will anyway copied over when utah is installed. Could you please use any other host than saucy for this,to avoid hitting a possible bug 1199349 as Serge pointed out earlier and that I was also able to reproduce. Thanks

Para Siva (psivaa) wrote :

I am still able to reproduce this *outside aldebaran. Marking the bug as New then. thanks

Changed in isc-dhcp (Ubuntu):
status: Incomplete → New
Para Siva (psivaa) on 2013-07-10
summary: - Connection requests to saucy server VMs from a precise host fail after
- fresh VM installs
+ Connection requests to saucy server VMs from a hosts fail after fresh VM
+ installs
Jamie Strandboge (jdstrand) wrote :

Still not able to produce this. I saw it once on what I thought was a release renewal, but I adjusted the dnsmasq lease time to 30 seconds, rebooted and after several renewals could still not reproduce.

Jamie Strandboge (jdstrand) wrote :

I'm quite convinced this is not an apparmor problem with interpreted scripts. I created another test program that used wstatus and looped it for many minutes and had no errors.

I took a look at the isc-dhcp source and while I can't be sure since I can't reproduce this here, I think the bit of code that is triggering this is bind_lease() from client/dhclient.c, specifically line 1292 that does:
1289 /* If the BOUND/RENEW code detects another machine using the
1290 offered address, it exits nonzero. We need to send a
1291 DHCPDECLINE and toss the lease. */
1292 if (script_go (client) == 2) {

script_go has:
3403 pid = fork ();
3404 if (pid < 0) {
...
3407 } else if (pid) {
...
3415 } else {
3416 /* We don't want to pass an open file descriptor for
3417 * dhclient.leases when executing dhclient-script.
3418 */
3419 if (leaseFile != NULL)
3420 fclose(leaseFile);
3421 execve (scriptName, argv, envp);
3422 log_error ("execve (%s, ...): %m", scriptName);

I still don't know what the problem, but I'm ok with adjusting the access in the dhclient profile to allow reading of /bin/bash. I would not be surprised if this didn't solve the connectivity issue.

Changed in isc-dhcp (Ubuntu):
assignee: nobody → Jamie Strandboge (jdstrand)
status: New → In Progress
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.2.4-7ubuntu3

---------------
isc-dhcp (4.2.4-7ubuntu3) saucy; urgency=low

  * debian/apparmor-profile.dhclient: allow read of /bin/bash (LP: #1197484)
 -- Jamie Strandboge <email address hidden> Tue, 16 Jul 2013 15:13:49 -0500

Changed in isc-dhcp (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers