systemctl restart networking hangs reloading ssh.service

Bug #1584393 reported by Gustavo Lopes on 2016-05-21
This bug affects 3 people
Affects            Status         Importance   Assigned to    Milestone
openssh (Debian)   Fix Released   Unknown
openssh (Ubuntu)   Fix Released   Low          Martin Pitt
  Xenial           Fix Released   Low          Unassigned

Bug Description

Issuing "systemctl restart networking" never exits. It hangs during a call to "systemctl reload ssh.service":

● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
   Active: activating (start) since Sat 2016-05-21 21:41:58 UTC; 18s ago
     Docs: man:interfaces(5)
  Process: 1288 ExecStop=/sbin/ifdown -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 1376 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] &
 Main PID: 1383 (ifup)
    Tasks: 7 (limit: 512)
   Memory: 1.8M
      CPU: 111ms
   CGroup: /system.slice/networking.service
           ├─1383 /sbin/ifup -a --read-environment
           ├─1479 /sbin/dhclient -1 -v -pf /run/dhclient.enp0s3.pid -lf /var/lib/dhcp/dhclient.enp0s3.leases -I -df /var/lib/dhcp/dhclient6
           ├─1480 /bin/sh -c /bin/run-parts --exit-on-error /etc/network/if-up.d
           ├─1481 /bin/run-parts --exit-on-error /etc/network/if-up.d
           ├─1504 /bin/sh /etc/network/if-up.d/openssh-server
           ├─1507 /bin/sh /usr/sbin/invoke-rc.d ssh reload
           └─1527 systemctl reload ssh.service

Issuing the same command from another terminal then causes the first to exit successfully (the second also exits successfully).

On the official Vagrant image, I get this problem regardless of whether the command is issued through an SSH session. I've gotten the same behavior on the official Xenial AMIs.
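While the restart is hung, systemd's job queue can be inspected from a second shell with "systemctl list-jobs", which prints JOB, UNIT, TYPE and STATE columns. The helper below is a hypothetical sketch (the name looks_like_deadlock is made up) that checks such a listing for the pattern this report describes: a networking.service start job and an ssh.service reload job queued at the same time.

```shell
#!/bin/sh
# Hypothetical helper: read a "systemctl list-jobs" listing on stdin and
# succeed only if it contains both a networking.service start job and an
# ssh.service reload job, i.e. the two jobs involved in this hang.
looks_like_deadlock() {
    listing=$(cat)
    printf '%s\n' "$listing" | grep -Eq 'networking\.service[[:space:]]+start' &&
        printf '%s\n' "$listing" | grep -Eq 'ssh\.service[[:space:]]+reload'
}
```

Usage, from a second terminal during the hang: systemctl list-jobs --no-pager | looks_like_deadlock && echo "networking start and ssh reload jobs are both queued"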

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ifupdown 0.8.10ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
Date: Sat May 21 21:35:08 2016
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ifupdown
UpgradeStatus: No upgrade log present (probably fresh install)


Gustavo Lopes (artefacto) wrote :

When networking is hung, stopping ssh.service will also resolve the hang, but reloading or restarting ssh.service will hang as well. There are no problems restarting or reloading ssh.service in other circumstances.

Dan Watkins (daniel-thewatkins) wrote :

I've reproduced this on GCE also. Here's what ssh.service looks like during the hang (and it looks like the reload hasn't affected it):

$ sudo systemctl status ssh.service
● ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-06-23 08:32:11 UTC; 6min ago
  Process: 2702 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 1643 (sshd)
    Tasks: 1
   Memory: 3.8M
      CPU: 108ms
   CGroup: /system.slice/ssh.service
           └─1643 /usr/sbin/sshd -D

Jun 23 08:32:11 xenial-160623-0931 systemd[1]: Started OpenBSD Secure Shell server.
Jun 23 08:32:17 xenial-160623-0931 sshd[1747]: Accepted publickey for ubuntu from 86.179.225.58 port 46404 ssh2: RSA SHA256:HRlolnELGM4aMQXejVDjc0kWktid481/BsLIHtc7Tv0
Jun 23 08:32:17 xenial-160623-0931 sshd[1747]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
Jun 23 08:37:17 xenial-160623-0931 systemd[1]: Reloading OpenBSD Secure Shell server.
Jun 23 08:37:17 xenial-160623-0931 sshd[1643]: Received SIGHUP; restarting.
Jun 23 08:37:17 xenial-160623-0931 systemd[1]: Reloaded OpenBSD Secure Shell server.
Jun 23 08:37:17 xenial-160623-0931 sshd[1643]: Server listening on 0.0.0.0 port 22.
Jun 23 08:37:17 xenial-160623-0931 sshd[1643]: Server listening on :: port 22.
Jun 23 08:37:22 xenial-160623-0931 sshd[3134]: Accepted publickey for ubuntu from 86.179.225.58 port 46520 ssh2: RSA SHA256:HRlolnELGM4aMQXejVDjc0kWktid481/BsLIHtc7Tv0
Jun 23 08:37:22 xenial-160623-0931 sshd[3134]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)

Changed in ifupdown (Ubuntu):
status: New → Confirmed
Martin Pitt (pitti) wrote :

This indeed looks like a deadlock. It could be broken by openssh's if-up.d script if it reloads ssh asynchronously instead of blocking on it. It would be even better to finally get rid of this silly hack and make openssh use IP_FREEBIND properly :-)
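The first suggestion, having the hook reload ssh asynchronously, could look roughly like the following. This is a hypothetical sketch, not the patch that was eventually applied: the function name and its optional directory argument are illustrative. "systemctl reload --no-block" is a standard systemctl option that enqueues the reload job and returns without waiting for it, and /run/systemd/system is the directory that exists when systemd is the running init.

```shell
#!/bin/sh
# Hypothetical sketch of a non-blocking reload for an if-up.d hook.
# The optional argument only makes the systemd check testable; by default
# it looks at /run/systemd/system, which exists when systemd is PID 1.
reload_ssh_async() {
    systemd_dir="${1:-/run/systemd/system}"
    if [ -d "$systemd_dir" ]; then
        # Queue the reload and return at once, so the hook never waits on a
        # job that may itself be waiting for networking.service to start.
        systemctl reload --no-block ssh.service
    else
        # Not running under systemd: keep the original synchronous call.
        invoke-rc.d ssh reload
    fi
}
```

The trade-off is the one raised in the next paragraph: when the hook returns, the caller no longer knows whether the reload has actually happened.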

I'm very reluctant to make invoke-rc.d reload itself async. This is too regression-prone, as scripts might depend on the reload actually being done after the call.

Also, low priority. Restarting "networking" never worked under upstart at all, and it mostly works now (I can't reproduce this on several machines). So this is mostly a case of "don't do that then".

affects: ifupdown (Ubuntu) → openssh (Ubuntu)
Changed in openssh (Ubuntu):
importance: Undecided → Low
status: Confirmed → Triaged
Martin Pitt (pitti) wrote :

I still cannot reproduce this. In a xenial cloud instance I ran

   for i in $(seq 50); do systemctl reset-failed networking; systemctl restart networking; done

successfully. This is with a standard /etc/network/interfaces.d/50-cloud-init.cfg containing just "auto ens3" and "iface ens3 inet dhcp", and no other interface (besides lo).
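To probe for the hang without a shell that blocks forever, the loop above can be wrapped in coreutils "timeout", which exits with status 124 when it has to kill the command. A minimal sketch, with made-up function names and the attempt count and time limit passed as parameters:

```shell
#!/bin/sh
# Hypothetical helpers for reproducing the hang without blocking forever.

# Run a command under a time limit in seconds. coreutils "timeout" returns
# 124 when it had to kill the command; report that case as a hang.
run_with_limit() {
    limit="$1"; shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' still running after ${limit}s"
    fi
    return "$status"
}

# Repeat the restart N times, stopping at the first attempt that hangs
# or fails. Must run as root.
repro_restart() {
    n="$1"; limit="$2"; i=1
    while [ "$i" -le "$n" ]; do
        systemctl reset-failed networking
        run_with_limit "$limit" systemctl restart networking || return 1
        i=$((i + 1))
    done
    echo "all $n restarts completed"
}
```

For example, as root: repro_restart 50 60. Note that "timeout" only kills the waiting systemctl client; the queued systemd jobs stay behind, so after a detected hang they still need clearing, e.g. with "systemctl cancel" (which cancels all pending jobs when given no job ID) or by stopping ssh.service as described earlier in this report.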

We can fix this case more centrally in invoke-rc.d though; this is only safe for "reload" and "force-reload", but that will cover this networking/openssh deadlock.

If you can reproduce this, please apply this patch to invoke-rc.d:

   sudo patch /usr/sbin/invoke-rc.d /tmp/invoke-rc.d.patch

and confirm that this fixes it. Thanks!
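The attached /tmp/invoke-rc.d.patch is not reproduced in this report. Judging only from the follow-up discussion of --job-mode=ignore-dependencies, its reload path might have looked something like this hypothetical sketch; reload_unit is an invented name. --job-mode=ignore-dependencies is a real systemctl option that tells systemd to ignore unit dependencies for the new job, so the reload would presumably no longer wait behind the running networking.service start job.

```shell
#!/bin/sh
# Hypothetical sketch only; not the actual content of /tmp/invoke-rc.d.patch.
# Usage: reload_unit ACTION UNIT
reload_unit() {
    action="$1"; unit="$2"
    case "$action" in
        reload|force-reload)
            # Only reloads are relaxed; the comment above calls this safe
            # only for "reload" and "force-reload".
            systemctl --job-mode=ignore-dependencies "$action" "$unit" ;;
        *)
            # Everything else keeps the default, dependency-respecting mode.
            systemctl "$action" "$unit" ;;
    esac
}
```

As the later comments explain, ignoring dependencies also skips reload propagation (for example from openvpn.service to its openvpn@ instances), which is why this approach was dropped in favour of changing openssh's own hook.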

Changed in openssh (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
status: Triaged → Incomplete
tags: added: patch

My initial testing on this looks good - after applying the patch I wasn't able to trigger the timeout that we were seeing before.

Martin Pitt (pitti) wrote :

I discussed that patch with Michael Biebl: we need to ensure that this does not break things like openvpn.service → openvpn@.service, i.e. that --job-mode=ignore-dependencies still keeps PropagatesReloadTo=. If not, we need something else.

Changed in openssh (Ubuntu):
status: Incomplete → In Progress
Martin Pitt (pitti) wrote :

> --job-mode=ignore-dependencies still keeps PropagatesReloadTo=. If not, we need something else.

It doesn't, and indeed that would make --job-mode=ignore-dependencies a bit pointless. So if we applied this generally, we would break, e.g., a postinst script that does "invoke-rc.d openvpn reload", where the reload of openvpn.service should be propagated to all openvpn@*.service instances.
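Before testing a job mode against propagation, it helps to see which units a reload is meant to reach. systemd spells these unit settings PropagatesReloadTo= and its inverse ReloadPropagatedFrom=, and "systemctl show -p" prints them as Property=value lines. A small sketch (the function name is made up):

```shell
#!/bin/sh
# Print the reload-propagation links systemd has recorded for one unit.
# -p may be given more than once to select several properties.
show_reload_propagation() {
    systemctl show -p PropagatesReloadTo -p ReloadPropagatedFrom "$1"
}
```

For example: show_reload_propagation openvpn.service, or for a template instance show_reload_propagation openvpn@server.service, where "server" is a hypothetical instance name.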

So let's be more cautious here and limit this to openssh. Please revert the patch to invoke-rc.d (sudo apt-get install --reinstall init-system-helpers), and run

   sudo patch /etc/network/if-up.d/openssh-server /tmp/openssh-server.patch

instead, and verify that this helps? Thanks!
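The requested procedure, reverting invoke-rc.d and then applying the if-up.d patch, could be scripted along these lines. Both function names are hypothetical, and the patch file is the attachment mentioned above, not shown here. "patch --dry-run" checks that a patch applies without changing anything, and "patch -b" keeps a backup of the original file with an .orig suffix.

```shell
#!/bin/sh
# Hypothetical helpers for the test procedure above; run as root.

# Restore the stock invoke-rc.d shipped by init-system-helpers.
revert_invoke_rc_d() {
    apt-get install --reinstall init-system-helpers
}

# Apply PATCHFILE to TARGET only if it applies cleanly, keeping a backup.
apply_patch_checked() {
    target="$1"; patchfile="$2"
    patch --dry-run "$target" "$patchfile" || return 1
    patch -b "$target" "$patchfile"
}
```

For example: revert_invoke_rc_d && apply_patch_checked /etc/network/if-up.d/openssh-server /tmp/openssh-server.patch, then repeat the restart test.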

Martin Pitt (pitti) on 2016-07-15
Changed in openssh (Ubuntu Xenial):
status: New → Confirmed

This looks good as well: I reverted to an original (unpatched) image and verified the timeout was present, then applied this patch, and I haven't been able to reproduce the timeout.

Martin Pitt (pitti) wrote :

I forwarded the patch to Debian, as we currently keep the package in sync and Colin wants to keep it that way. I'll SRU this to xenial once it lands in yakkety.

Changed in openssh (Debian):
status: Unknown → New
Changed in openssh (Debian):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:7.2p2-8

---------------
openssh (1:7.2p2-8) unstable; urgency=medium

  [ Colin Watson ]
  * Stop enabling ssh-session-cleanup.service by default; instead, ship it
    as an example and add a section to README.Debian. libpam-systemd >= 230
    and "UsePAM yes" should take care of the original problem for most
    systemd users (thanks, Michael Biebl; closes: #832155).

  [ Martin Pitt ]
  * Add debian/agent-launch: Helper script for conditionally starting the SSH
    agent in the user session. Use it in ssh-agent.user-session.upstart.
  * Add systemd user unit for graphical sessions that use systemd. Override
    the corresponding upstart job in that case (closes: #832445).
  * debian/openssh-server.if-up: Don't block on a finished reload of
    openssh.service, to avoid deadlocking with restarting networking.
    (closes: #832557, LP: #1584393)

 -- Colin Watson <email address hidden> Fri, 29 Jul 2016 02:51:32 +0100

Changed in openssh (Ubuntu):
status: In Progress → Fix Released
Martin Pitt (pitti) wrote :

I uploaded a xenial fix to the SRU review queue.

Changed in openssh (Ubuntu Xenial):
status: Confirmed → In Progress

Hello Gustavo, or anyone else affected,

Accepted openssh into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openssh/1:7.2p2-4ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
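When verifying the -proposed package, it is easy to test an older build by mistake. This sketch (hypothetical function names) compares an openssh-server version against 1:7.2p2-4ubuntu2 using "dpkg --compare-versions", which applies Debian's version ordering, including epochs and revision suffixes such as ubuntu2.1.

```shell
#!/bin/sh
# Succeed if version $1 is at least $2 (default: the xenial-proposed build).
version_has_fix() {
    dpkg --compare-versions "$1" ge "${2:-1:7.2p2-4ubuntu2}"
}

# Print the installed openssh-server version, e.g. for version_has_fix.
installed_openssh_version() {
    dpkg-query -W -f='${Version}' openssh-server
}
```

After enabling -proposed as the wiki page describes, "apt-get install openssh-server/xenial-proposed" selects that pocket's version for just this package; then version_has_fix "$(installed_openssh_version)" && echo "fixed build installed".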

Changed in openssh (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in openssh (Ubuntu Xenial):
importance: Undecided → Low

Hi Brian,

I have tested the updated packages (1:7.2p2-4ubuntu2) and can confirm that they resolve the issue for us. Tags updated.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:7.2p2-4ubuntu2.1

---------------
openssh (1:7.2p2-4ubuntu2.1) xenial-security; urgency=medium

  * SECURITY UPDATE: user enumeration via covert timing channel
    - debian/patches/CVE-2016-6210-1.patch: determine appropriate salt for
      invalid users in auth-passwd.c, openbsd-compat/xcrypt.c.
    - debian/patches/CVE-2016-6210-2.patch: mitigate timing of disallowed
      users PAM logins in auth-pam.c.
    - debian/patches/CVE-2016-6210-3.patch: search users for one with a
      valid salt in openbsd-compat/xcrypt.c.
    - CVE-2016-6210
  * SECURITY UPDATE: denial of service via long passwords
    - debian/patches/CVE-2016-6515.patch: skip passwords longer than 1k in
      length in auth-passwd.c.
    - CVE-2016-6515

 -- Marc Deslauriers <email address hidden> Thu, 11 Aug 2016 08:38:27 -0400

Changed in openssh (Ubuntu Xenial):
status: Fix Committed → Fix Released