12.04 hangs on shutdown deconfiguring network interfaces

Bug #1010045 reported by Frederic on 2012-06-07
This bug affects 14 people
Affects: ifupdown (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

When shutting down the 12.04 guest, it hangs on "Deconfiguring network interfaces..." if the guest has multiple network cards.

The system is going down for halt NOW!
 * Stopping web server apache2 [ OK ] ... waiting
 * Stopping Bacula File daemon... [ OK ]
Checking for running unattended-upgrades:
 * Running nssldap-update-ignoreusers... [ OK ]
 * Stopping Name Service Cache Daemon nscd [ OK ]
 * Stopping Postfix Mail Transport Agent postfix [ OK ]
 * Stopping ftp server proftpd [ OK ]
 * Asking all remaining processes to terminate... [ OK ]
 * All processes ended within 2 seconds.... [ OK ]
rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
 * Deconfiguring network interfaces...

<hang>

We only have this issue when a VM has multiple network cards assigned.
When we do a manual "/etc/init.d/networking stop; halt", the VM shuts down as expected.

dpkg -l | grep ifupdown
ii ifupdown 0.7~beta2ubuntu8 high level tools to configure network interfaces

Frederic (frederic-itaf) wrote :

stty: standard input: unable to perform all requested operations
* Stopping web server apache2 [ OK ] ... waiting
* Stopping Bacula File daemon... [ OK ]
Checking for running unattended-upgrades:
* Running nssldap-update-ignoreusers... [ OK ]
* Stopping Name Service Cache Daemon nscd [ OK ]
* Stopping PHP5 FastCGI Process Manager php5-fpm [ OK ]
* Stopping Postfix Mail Transport Agent postfix [ OK ]
* Stopping ftp server proftpd [ OK ]
* Asking all remaining processes to terminate... [ OK ]
* All processes ended within 2 seconds.... [ OK ]
* Saving random seed... [ OK ]
rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
+ PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
+ [ -x /sbin/ifup ]
+ . /lib/lsb/init-functions
+ FANCYTTY=
+ [ -e /etc/lsb-base-logging.sh ]
+ . /etc/lsb-base-logging.sh
+ LOG_DAEMON_MSG=
+ check_network_file_systems
+ [ -e /proc/mounts ]
+ [ -e /etc/iscsi/iscsi.initramfs ]
+ exec
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ exec
+ check_network_swap
+ [ -e /proc/swaps ]
+ exec
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ read DEV MTPT FSTYPE REST
+ exec
+ initctl emit deconfiguring-networking
+ log_action_begin_msg Deconfiguring network interfaces
+ log_daemon_msg Deconfiguring network interfaces...
+ [ -z Deconfiguring network interfaces... ]
+ log_use_fancy_output
+ TPUT=/usr/bin/tput
+ EXPR=/usr/bin/expr
+ [ -t 1 ]
+ [ xlinux != x ]
+ [ xlinux != xdumb ]
+ [ -x /usr/bin/tput ]
+ [ -x /usr/bin/expr ]
+ /usr/bin/tput hpa 60
+ /usr/bin/tput setaf 1
+ [ -z ]
+ FANCYTTY=1
+ true
+ /usr/bin/tput xenl
+ /usr/bin/tput cols
+ COLS=80
+ [ 80 ]
+ [ 80 -gt 6 ]
+ /usr/bin/expr 80 - 7
+ COL=73
+ log_use_plymouth
+ [ n = y ]
+ plymouth --ping
+ printf * Deconfiguring network interfaces...
* Deconfiguring network interfaces... + /usr/bin/expr 80 - 1
+ /usr/bin/tput hpa 79
+ printf
+ [ yes != no ]
+ ifdown -a --exclude=lo

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifupdown (Ubuntu):
status: New → Confirmed
Stéphane Graber (stgraber) wrote :

Can you attach a tarball of /var/log/upstart?

Is that machine relying on nfs mounts? I saw a mention of rpcbind in the log before.

Changed in ifupdown (Ubuntu):
status: Confirmed → Incomplete
Frederic (frederic-itaf) wrote :

Hello Stephane,

Thanks for the feedback. I attached the tar of /var/log/upstart.
The VM I created the tarball from uses neither NFS nor any other network-attached shares.

Best Regards,
Frederic

Luis Roalter (roalter) wrote :

Same on my system. If I run "/etc/init.d/networking stop" or "ifdown -a" directly, it works...

Steve Langasek (vorlon) wrote :

You might want to attach the output of:

dpkg -S /etc/network/if-down.d/*

Frederic (frederic-itaf) wrote :

#dpkg -S /etc/network/if-down.d/*

postfix: /etc/network/if-down.d/postfix
resolvconf: /etc/network/if-down.d/resolvconf
ifupdown: /etc/network/if-down.d/upstart

Stéphane Graber (stgraber) wrote :

It seems unlikely that upstart or resolvconf would be the problem as these are on all systems and I've never seen this issue myself.
Maybe the postfix hook is doing something weird...

Can you perhaps modify /etc/init.d/networking to call ifdown with the "-v" parameter? That might show where it's getting stuck during the ifdown.
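
Another way to narrow this down, without editing the init script, is to run each hook by hand with a time limit. The following is only a minimal sketch: it assumes GNU coreutils `timeout` is installed, `find_stuck_hooks` is a hypothetical helper name, and the 5-second limit is arbitrary. Hooks run this way really do execute, and they see a different environment than under ifdown, so the result may not match a real shutdown.

```shell
#!/bin/sh
# Run every executable file in a hook directory with a time limit and
# report the ones that do not finish in time.
# find_stuck_hooks is a hypothetical helper, not part of ifupdown.
find_stuck_hooks() {
    dir=$1
    limit=${2:-5}
    for hook in "$dir"/*; do
        [ -f "$hook" ] && [ -x "$hook" ] || continue
        status=0
        # timeout exits with status 124 when the command ran out of time
        timeout "$limit" "$hook" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            echo "STUCK: $hook"
        fi
    done
}

# Example (as root, on an affected machine; hooks read variables such as
# IFACE from the environment, so export the ones your hooks need first):
#   export IFACE=eth1
#   find_stuck_hooks /etc/network/if-down.d 5
```

Any line starting with "STUCK:" names a hook that was still running when the limit expired.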

Gildas (gildas-bayard) wrote :

Hello
For me, dpkg -S /etc/network/if-down.d/* returns:
resolvconf: /etc/network/if-down.d/resolvconf
ifupdown: /etc/network/if-down.d/upstart
wpasupplicant: /etc/network/if-down.d/wpasupplicant

It's a fresh 12.04.1 install on qemu-kvm with 2 network interfaces

Hi,
I have the same problem with one KVM guest-system using an iscsi target.
Manually unmounting the iSCSI target before shutdown does not solve the problem.

#dpkg -S /etc/network/if-down.d/*
open-iscsi: /etc/network/if-down.d/open-iscsi
resolvconf: /etc/network/if-down.d/resolvconf
ifupdown: /etc/network/if-down.d/upstart

The system is configured with 3 bridge interfaces.

#/etc/init.d/networking -v

*Deconfiguring network interfaces...
Configuring interface eth0=eth0 (inet)
run-parts --verbose /etc/network/if-down.d
run-parts: executing /etc/network/if-down.d/open-iscsi
* Disconnecting iSCSI targets
[ 1232.432014] connection7:0: ping timeout of 5 secs expired
.....
and so on ...

Gari (garikolc-x) wrote :

Hi,

I have the same bug, but I have no SCSI elements, and it hangs when I run "/etc/init.d/networking stop" or restart.
I work at a resolution of 1920x1280, and when it hangs something quite curious happens: all my windows resize to 1024 (or something like that), the Unity and application menus disappear, and terminal windows lose their focus. Some windows can still take focus and you can interact with them, but not with their menus.
If I unplug the RJ45 cable, I can't bring the network up again until I reboot or shut down the whole system.

My pc is a HP ProBook 6555b

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 123G 61G 56G 53% /
udev 869M 4,0K 869M 1% /dev
tmpfs 351M 1,3M 349M 1% /run
none 5,0M 0 5,0M 0% /run/lock
none 876M 484K 875M 1% /run/shm

# dpkg -S /etc/network/if-down.d/*
avahi-autoipd: /etc/network/if-down.d/avahi-autoipd
resolvconf: /etc/network/if-down.d/resolvconf
ifupdown: /etc/network/if-down.d/upstart
wpasupplicant: /etc/network/if-down.d/wpasupplicant

Gari (garikolc-x) wrote :

upstart.tgz

Tais Plougmann Hansen (taisph) wrote :

I have this problem as well.

Tried adding -v to ifdown in /etc/init.d/networking but it just hangs with:

[...]
+ ifdown -v -a --exclude=lo

Running /etc/init.d/networking stop before rebooting works though.

Hi,

this problem can also be reproduced on a physical machine with 12.04.1 LTS and kernel 3.2.0 ....

In the attached screenshot you can see the kernel panic message that appears when stopping the open-iscsi daemon.
It's the same for a reboot or a networking stop/restart.

After installation of the Kernel 3.4 from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/linux-image-3.4.0-030400-generic_3.4.0-030400.201205210521_amd64.deb
the system has no problem with rebooting anymore.
This has been tested in a KVM-Environment and a physical machine.

I hope this could be a solution for some installations.
In some cases (production) you may prefer to wait for official 3.4 kernel support in Ubuntu 12.04.

Tais Plougmann Hansen (taisph) wrote :

Resolvconf seems to be causing the "hang" in at least one of my test cases. It seems to be attempting to talk to our LDAP authentication servers. Strace shows it trying to look up the hostnames of the auth servers and then connect to them multiple times for every interface, which makes the shutdown process take a very long time.

26880 ? S 0:00 /bin/sh -ex /etc/rc6.d/S02networking stop
26891 ? S 0:00 ifdown -v -a --exclude=lo
26939 ? S 0:00 /bin/sh -c run-parts --verbose /etc/network/if-down.d
26940 ? S 0:00 run-parts --verbose /etc/network/if-down.d
26953 ? S 0:00 /bin/sh /etc/network/if-down.d/resolvconf
26954 ? S 0:00 /bin/sh /sbin/resolvconf -d brvlan1000.inet

# strace -p 26954
Process 26954 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.17.6")}, 16) = 0
poll([{fd=5, events=POLLOUT}], 1, 0) = 1 ([{fd=5, revents=POLLOUT}])
sendto(5, "\206\33\1\0\0\1\0\0\0\0\0\0\5auth2\6cncore\3lan\6cn"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=5, events=POLLIN}], 1, 5000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\206\33\1\0\0\1\0\0\0\0\0\0\5auth2\6cncore\3lan\6cn"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout)
poll([{fd=5, events=POLLOUT}], 1, 0) = 1 ([{fd=5, revents=POLLOUT}])
sendto(5, "\206\33\1\0\0\1\0\0\0\0\0\0\5auth2\6cncore\3lan\6cn"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=5, events=POLLIN}], 1, 5000) = 0 (Timeout)
close(4) = 0
close(5) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=215, ...}) = 0
[...]

This bug seems to depend on the kernel version being used.
In my case it seems to happen with kernel versions >3.2.0

Stéphane Graber (stgraber) wrote :

Right, so we appear to have a few things going on here:
 - People using networked auth in blocking mode
 - People using an iscsi root that for some reason isn't brought up correctly and that ifupdown tries to kill

Those two are likely configuration mistakes as you should never configure a network nss plugin to be blocking, otherwise your boot and shutdown sequence may randomly hang. Instead, there are flags to make the calls immediately return when there's no route available.

Is anyone having this bug who isn't covered by one of the scenarios above?
If so, please send all the log files asked before and try to explain your setup as best you can.

Tais Plougmann Hansen (taisph) wrote :

@stgraber
Networked auth is not configured in blocking mode. LDAP is even set up as a secondary lookup, not the primary one. I don't know why resolvconf insists on talking to the LDAP servers.

Stéphane Graber (stgraber) wrote :

It doesn't, it's just doing a regular nss query.

Try the following:
 - ifdown eth0 (or any other network device you use to connect to your network)
 - getent passwd abcd

If the getent passwd hangs for more than a second, your ldap client is misconfigured.
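
The one-second check above can be automated. This is a sketch rather than an official diagnostic: `finishes_within` is a hypothetical helper, and it assumes coreutils `timeout` is available.

```shell
#!/bin/sh
# Succeed if a command finishes within a limit given in whole seconds.
# finishes_within is a hypothetical helper; it relies on coreutils timeout.
finishes_within() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    # 124 means timeout had to kill the command; any other status
    # (including "user not found") means the lookup returned in time.
    [ "$status" -ne 124 ]
}

# Example, after "ifdown eth0":
#   finishes_within 1 getent passwd abcd || echo "NSS lookup blocks"
```

A non-zero exit status from getent itself (for example because the user "abcd" does not exist) still counts as a timely answer here; only a lookup that outlives the limit is reported.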

Tais Plougmann Hansen (taisph) wrote :

@stgraber
That's oversimplifying things a bit. :)

Like the OP's, our servers have several interfaces, which give a server access to various subnets. One of these subnets holds the LDAP auth servers. If that interface is downed first, resolvconf will hang for the ldap.conf TIMEOUT period for every other interface that is downed. Once they are all down (or once no default or matching subnet routes remain), resolvconf/getent passwd returns immediately.

So... In my case, reducing the timeouts in /etc/ldap.conf and /etc/ldap/ldap.conf, or adding a blackhole route with a high metric value matching the LDAP subnets, would work around this.
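
For illustration, the blackhole-route variant might look like the fragment below. This is only a sketch under assumptions: 192.0.2.0/24 is a placeholder for the subnet that holds the LDAP servers, and the metric 1000 is an arbitrary value that merely has to be higher than the metric of the real route, so the blackhole only takes effect once the real route disappears together with its interface.

```shell
# Run as root. 192.0.2.0/24 is a placeholder subnet, not taken from this report.
# While the real route to the subnet exists, its lower metric wins; once the
# interface is downed, connections to the subnet fail immediately instead of
# waiting for a timeout.
ip route add blackhole 192.0.2.0/24 metric 1000

# Check that the route is in place.
ip route show type blackhole
```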

As such it can't be categorized as a bug. When I initially chimed in, the problem was masking the infamous scsi kernel softlockup bug.

I'm still puzzled by the nss query that resolvconf is triggering. I cannot reproduce it with getent passwd or similar lookups. For whatever reason resolvconf triggers a query looking for a posixaccount with uid=\2a (uid="*"), which returns 0 results.

Launchpad Janitor (janitor) wrote :

[Expired for ifupdown (Ubuntu) because there has been no activity for 60 days.]

Changed in ifupdown (Ubuntu):
status: Incomplete → Expired
Steffen Sledz (sledz) wrote :

We hit this problem too. After some investigation I found at least one cause (it seems to be bind9).

Running the final "ifdown -a --exclude=lo" in /etc/init.d/networking a bit more verbose showed that the call of

  /etc/network/if-down.d/bind9

does not return. The script just calls "rndc reconfig". The problem is that named is no longer running at this point because it was stopped earlier.

If I call "rndc -V reconfig" without a running named I see this:

-----------> snip <------------
create memory context
create socket manager
create task manager
create task
create logging context
setting log tag
creating log channel
enabling log channel
create parser
get key
decode base64 secret
reconfig
post event
using server 127.0.0.1 (127.0.0.1#953)
create socket
bind socket
connect
-----------> snip <------------

So it seems that the connect does not return.
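
A hook could guard against this by skipping the call when the daemon is gone and by bounding the wait otherwise. The following is a sketch and not the script shipped by bind9: `daemon_alive` and `run_if_daemon_alive` are hypothetical helpers, the pid file path in the example is an assumption about the local setup, and coreutils `timeout` is assumed to be installed.

```shell
#!/bin/sh
# Succeed if the pid recorded in a pid file belongs to a live process.
daemon_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Run a command only while the daemon is alive, and never wait longer
# than the given number of seconds for it.
run_if_daemon_alive() {
    pidfile=$1
    limit=$2
    shift 2
    daemon_alive "$pidfile" || return 0
    timeout "$limit" "$@"
}

# In the hook this could stand in for the bare "rndc reconfig"
# (the pid file path is an assumption, adjust it to your system):
#   run_if_daemon_alive /var/run/named/named.pid 5 rndc reconfig || true
```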

Steffen Sledz (sledz) on 2013-04-11
Changed in ifupdown (Ubuntu):
status: Expired → Incomplete
Tione (chobian) wrote :

Hello
Maybe this will help someone
I solved the problem myself in this way:
I added at the end of /etc/acpi/powerbtn.sh
the command /etc/init.d/networking stop

so it looks like:
# If all else failed, just initiate a plain shutdown.
/etc/init.d/networking stop "Network shutdown"
/sbin/shutdown -h now "Power button pressed"

dseira (davidseira) wrote :

Same problem with Ubuntu 12.04.2 in a VM under VMware ESXi 5.1 with several network interfaces. I also have an LDAP server on the network. I tried changing the LDAP timeout, but the problem remains.

If I execute "service networking stop; halt" it works correctly; the problem is rebooting.

David Martin (dmartina) wrote :

We have been preparing this machine for about a year. It is supposed to work as a firewall and web proxy connected to about 10 VLANs, with DNS server, LAMP, Squid, etc. It's a 64-bit Ubuntu Server, fully up to date.

One week ago we decided it was finished and moved it into production use. Up to that point everything was OK. During the last week we configured samba + swat - libpam-smb + cups + hplip, and there was a kernel update. On Monday we noticed the hang at "Deconfiguring network interfaces", but yesterday we saw that it was not a real hang, because the system powered off after about three to four minutes (it used to take about 20-30 seconds).

I have just removed cups + hplip with no success. My next plan is to remove the samba set (I was suspicious of the problems between swat and libpam-smb) and maybe go back to previous kernel as this was the whole story of the last week.

In the comments above I saw some relation between the length of the delay, the point where it happens, and the number of interfaces, so I'm writing about it. I may try to add some verbosity, but I have very few clues...

Steffen Sledz (sledz) wrote :

As I wrote in #22, one cause is definitely the DNS server package bind9.

Is there any progress in fixing this?

David Martin (dmartina) wrote :

We have "dns-nameservers 127.0.0.1" set in /etc/network/interfaces under the "eth1" entry. Do you think that moving it to "lo" might help?

Yesterday I tried to stop the network before shutdown and everything went fine and fast, and that makes me feel better. In the worst case we may fix the shutdown scripts (crond) with little danger.

David Martin (dmartina) wrote :

:-)) Seems to be fixed. It was an "old friend" of mine...

I removed a lot of the software installed during the last week, but the shutdown only finished in time after the removal below. The boot process also seems faster now, and a lot of OpenSSH start/stop messages are gone:

  sudo apt-get purge avahi-daemon libnss-mdns

It's about Zeroconf and APIPA IP numbering, but I don't think it is necessary for desktop servers.

So, I'll try to reinstall things and get my system back to production. On the other hand, I still don't understand why this didn't bother us a couple of weeks ago...

Regards
David

David Martin (dmartina) wrote :

I'm now thinking of going back into this issue. Definitely the problem appeared when installing cups as it requires avahi+mdns packages. This sounds quite fair if cups is to autodetect printers in the LAN.

I've been reading more about mdns. I remember that I previously removed it on a certain host to avoid delays in SSH connections, and now I have been reading about it again at bug #94940 https://bugs.launchpad.net/ubuntu/+source/nss-mdns/+bug/94940. If the delay is caused by mdns, it may be amplified when an interface is configured as multiple VLAN sub-interfaces. Cheating nsswitch.conf sounds easier than completely removing avahi+mdns...

So, I would like to know whether the rest of the people here have avahi+mdns installed and what their hosts entry in nsswitch.conf looks like. I'm quite confident that keeping "hosts: files dns" as it is now (with avahi+mdns uninstalled) may help in my case.
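
For anyone who wants to answer that quickly, here is a small sketch that prints the hosts entry and reports whether it mentions an mdns module. `hosts_uses_mdns` is a hypothetical helper name, and the check is a plain substring match, so treat the result as a hint only.

```shell
#!/bin/sh
# Print the "hosts:" line of an nsswitch.conf-style file and return
# success if it references an mdns module.
# hosts_uses_mdns is a hypothetical helper.
hosts_uses_mdns() {
    file=${1:-/etc/nsswitch.conf}
    line=$(grep -E '^[[:space:]]*hosts:' "$file" | head -n 1)
    echo "$line"
    case "$line" in
        *mdns*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example:
#   hosts_uses_mdns && echo "mdns is consulted for host lookups"
```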

We also have an MS AD with a .local private domain, but I doubt this interferes, as both AD servers switch off every day a couple of minutes before our Ubuntu host does. Do you have a .local AD in your LANs?

David

Launchpad Janitor (janitor) wrote :

[Expired for ifupdown (Ubuntu) because there has been no activity for 60 days.]

Changed in ifupdown (Ubuntu):
status: Incomplete → Expired