sshd stops after two SIGHUPs

Bug #497781 reported by PierreF on 2009-12-17
This bug affects 1 person
Affects            Status        Importance  Assigned to
portable OpenSSH   Fix Released  Unknown
openssh (Ubuntu)   Fix Released  Low         Colin Watson

Bug Description

When you send two SIGHUPs to sshd (to make it reload its configuration), sshd simply dies.

How to reproduce:

1) Start an sshd server.

If you run killall -s SIGHUP sshd once, the server restarts successfully.

2) To kill sshd with SIGHUP, run "killall -s SIGHUP sshd & killall -s SIGHUP sshd",
i.e., run two killall -s SIGHUP sshd commands at the same time.

3) # ps waux | grep sshd
root 19265 0.0 0.0 3352 820 pts/10 S+ 15:42 0:00 grep sshd

No sshd running :(

I think it's because the second SIGHUP arrives before sshd has finished its startup.
There is no problem running several killall -s SIGHUP sshd commands with a small delay between them (such as the time it takes to press the up arrow and Enter). But two SIGHUPs at the same time cause sshd to die nearly every time.

This is an issue because, AFAIK, SIGHUP is sent by a networking script (ifup?) when an interface gets an IP address. On a server with 3 static IPs, the sshd process got killed this way. With no sshd running, it was not possible to ssh into that server. Rebooting the server fixed everything.

PierreF (pierre-fersing) wrote :

I forgot to mention the affected versions:

This was tested on jaunty (with killall -s SIGHUP).
The server is on hardy.

So hardy and jaunty are affected.

Dave Walker (dogatemycomputer) wrote :

Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!

Changed in openssh (Ubuntu):
status: New → Confirmed
Colin Watson (cjwatson) wrote :

Thanks. Although I haven't managed to reproduce this myself, I've sent my best guess at a patch upstream.

Note that this should become less important in Lucid, as sshd is now supervised by Upstart which should ensure that it keeps running.

Changed in openssh (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → Low
status: Confirmed → Triaged
PierreF (pierre-fersing) wrote :

Thanks for the patch, with this patch I can't reproduce the bug.

I took the patch, applied it to a fresh apt-get source, built it with pbuilder and installed the result. With that, I was unable to reproduce the problem.
When I went back to the karmic release of openssh, I could reproduce the bug again.

So the patch effectively resolved my issue.

Created an attachment (id=1766)
ignore SIGHUP across sshd re-exec window

In Ubuntu bug #497781, "PierreF" reported that it's sometimes possible to end up with no sshd running if you send it two SIGHUP signals in quick succession, which can sometimes happen due to configuration of networking scripts. Although I haven't been able to reproduce this myself so far, I think this is because SIGHUP is reset to the default action by execve(), and sshd's handler isn't reinstalled until shortly after the exec, so there's a window when it's simply set to the default action of terminating the process.

If this hypothesis is correct, which I think is likely, then the attached patch should fix it by ignoring SIGHUP across the exec window.

Seems like a reasonable hypothesis, but I don't see the patch making any difference.

The execv will result in an entirely new process address space (including address layout randomization on platforms that have it) and the disposition of the old process' signal handlers will be irrelevant. You'd still have a window until the signal handler is reinstalled where the default action of SIGHUP would kill sshd.

You could minimize this window by moving the "signal(SIGCHLD, main_sigchld_handler)" to the start of main(). This wouldn't eliminate the window but it would shrink it a lot (particularly because the generation of the protocol 1 ephemeral server key would no longer be in the window).

That isn't how execve() works, though. Signal dispositions are inherited, not reset by the action of loading the new process image. Lots of things wouldn't work if things were the way you posit - for example, nohup(1) would be entirely non-functional.

Here's a transcript of a test demonstrating that ignoring SIGHUP before execve() is effective. Does this convince you?

$ cat t.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
typedef void (*sighandler_t)(int);
extern char **environ;
int main(int argc, char **argv) {
    if (getenv("SECOND_TIME")) {
        sighandler_t prev = signal(SIGHUP, SIG_DFL);
        if (prev == SIG_IGN) {
            printf("SIGHUP was SIG_IGN\n");
        } else {
            printf("SIGHUP was not SIG_IGN\n");
        }
        exit(0);
    } else {
        setenv("SECOND_TIME", "1", 1);
        if (getenv("IGNORE_SIGHUP"))
            signal(SIGHUP, SIG_IGN);
        execve(argv[0], argv, environ);
    }
}
$ make t
cc t.c -o t
$ ./t
SIGHUP was not SIG_IGN
$ IGNORE_SIGHUP=1 ./t
SIGHUP was SIG_IGN

BTW, I agree that ASLR makes a difference when the signal handler is a function, but in this case that is not so. SIG_IGN is (typically; certainly on Linux, I imagine on other Unixes too) just a constant and not affected by ASLR.

I notice that the original reporter of https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/497781 confirmed in a comment on that bug that my patch appears to fix the problem for him.

Fair enough, I've never actually thought about how nohup works :-)

Adding to the list for 5.4.

Patch applied, thanks. It will be in the 5.4 release.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:5.2p1-2ubuntu1

---------------
openssh (1:5.2p1-2ubuntu1) lucid; urgency=low

  * Resynchronise with Debian. Remaining changes:
    - Add support for registering ConsoleKit sessions on login.
    - Drop openssh-blacklist and openssh-blacklist-extra to Suggests; they
      take up a lot of CD space, and I suspect that rolling them out in
      security updates has covered most affected systems now.
    - Convert to Upstart. The init script is still here for the benefit of
      people running sshd in chroots.

openssh (1:5.2p1-2) unstable; urgency=low

  [ Colin Watson ]
  * Backport from upstream:
    - After sshd receives a SIGHUP, ignore subsequent HUPs while sshd
      re-execs itself. Prevents two HUPs in quick succession from resulting
      in sshd dying (LP: #497781).
    - Output a debug if we can't open an existing keyfile (LP: #505301).
  * Use host compiler for ssh-askpass-gnome when cross-compiling.
  * Don't run tests when cross-compiling.
  * Drop change from 1:3.6.1p2-5 to disable cmsg_type check for file
    descriptor passing when running on Linux 2.0. The previous stable
    release of Debian dropped support for Linux 2.4, let alone 2.0, so this
    very likely has no remaining users depending on it.

  [ Kees Cook ]
  * Implement DebianBanner server configuration flag that can be set to "no"
    to allow sshd to run without the Debian-specific extra version in the
    initial protocol handshake (closes: #562048).
 -- Colin Watson <email address hidden> Sat, 16 Jan 2010 03:58:17 +0000

Changed in openssh (Ubuntu):
status: Triaged → Fix Released

With the release of 5.4p1, this bug is now considered closed.

Changed in openssh:
status: Unknown → Fix Released