openssh-server SIGSYS with 'UsePrivilegeSeparation sandbox'

Bug #1690485 reported by KEVIN KENNY
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
openssh (Ubuntu)
Confirmed
Low
Unassigned

Bug Description

The 'sshd' process gets 'authentication failure' and refuses to allow any login.

dmesg indicates that the problem is SIGSYS on a call to 'socket' (syscall #41, signal #31).

On a hunch, I decided to test whether the problem is related to 'seccomp' and changed /etc/ssh/sshd_config from the default

# UsePrivilegeSeparation sandbox

to the former standard value

UsePrivilegeSeparation yes

and logins started to work again.

Obviously, I'd like to have the additional protection that sandboxing would give me.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: openssh-server 1:7.4p1-10
ProcVersionSignature: Ubuntu 4.10.0-20.22-generic 4.10.8
Uname: Linux 4.10.0-20-generic x86_64
ApportVersion: 2.20.4-0ubuntu4
Architecture: amd64
CurrentDesktop: XFCE
Date: Fri May 12 21:06:20 2017
InstallationDate: Installed on 2017-04-08 (35 days ago)
InstallationMedia:

SourcePackage: openssh
UpgradeStatus: Upgraded to zesty on 2017-04-24 (19 days ago)

Revision history for this message
KEVIN KENNY (ke9tv) wrote :
Revision history for this message
Colin Watson (cjwatson) wrote :

It seems surprising to me that the pre-authentication network listener would need to create new sockets. I think the best thing to do here is going to be to rebuild sshd locally with SANDBOX_SECCOMP_FILTER_DEBUG defined (see the top of sandbox-seccomp-filter.c) to try to narrow this down.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Kevin,
I quickly checked in the default config in zesty but ssh logins are working fine (pw logins is what I tested).

There must be more to your config that triggers this.
Is what you see in dmesg an apparmor denial or something else?
Could you attach your dmesg covering the issue that you see?

You logs attached so far only hold plenty of:
May 12 19:17:15 hostname sshd[4241]: pam_unix(sshd:auth): check pass; user unknown
[...]
May 12 20:51:55 hostname sshd[8608]: error: maximum authentication attempts exceeded for root from 192.168.1.1 port 9990 ssh2 [preauth]
May 12 20:51:55 hostname sshd[8608]: PAM service(sshd) ignoring max retries; 6 > 3

But those could just as well be from your testing.

I imported your sshd config as auto attached to the report, restarted it and it still worked fine. So we really would need more to help debugging.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

There isn't really enough information here for a developer to confirm this issue is a bug, or to begin working on it, so I am marking this bug Incomplete for now.

If you can provide exact steps so that a developer can reproduce the original problem, then please add them to this bug and change the status back to New.

Changed in openssh (Ubuntu):
status: New → Incomplete
Revision history for this message
KEVIN KENNY (ke9tv) wrote :

I reset /etc/ssh/sshd_config to the attached version, and attempted

ssh -v localhost

while logged in as username=kennykb uid=117

The output of 'ssh -v' is attached as 'sshclient.txt'.

The only lines that appeared in syslog after I restarted the daemon were in the attached 'syslog.txt'.

The corresponding time period in auth.log is attached as 'authlog.txt'.

The contents of /etc/ssh/ssh_config at the time of the failure are attached.

(I'll attach the files in the next few messages.)

The first is auth.log. 192.168.1.1 is my gateway machine. I see that I'm being hammered 3-4 times a minute with unauthorized requests to log in as root from some external machine. Isn't the internet a spectacularly hostile place?

If this is not enough, what else do you need? Another developer mentioned rebuilding from source with a particular debugging option turned on - if you need this, can you give me a precise description of the location and version of the package you want rebuilt and the exact change you want?

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

Here's /var/log/syslog from the ssh daemon restart through the authorization failure

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

Here's the output of 'ssh -v localhost' when authorization is failing

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

And here's /etc/ssh/sshd_config on which authorization is failing.

Changed in openssh (Ubuntu):
status: Incomplete → New
Revision history for this message
KEVIN KENNY (ke9tv) wrote :

Upgraded to 17.10. Uninstalled and reinstalled openssh-client, openssh-server (including loading a fresh /usr/etc/ssh directory).

Still fails.

I attach the output of:
sudo strace -f -e trace=socket /usr/sbin/sshd -d 2>&1 | tee sshd.result.txt

All the other configuration is as before.

What other information can I get for you so as not to just get this issue closed again?

I've downloaded source, but of course it configures to run from /usr/local. What are the correct flags for ./configure to get it to build the way Ubuntu builds it and install to the Ubuntu paths? Alternatively, where can I find out how to run the daemon from /usr/local/sbin instead of /usr/sbin? (That would allow me to configure the extra debugging information requested in the second comment.)

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

Oh, another note: changing the UsePrivilegeSeparation setting no longer works. It reports that the setting is deprecated and ignores it. This leaves me without a workaround.

Revision history for this message
Colin Watson (cjwatson) wrote :

This issue was never previously closed, only marked Incomplete (which is an open state).

While it's possible to do a normal package build to get things configured exactly the way we do, I don't think that's necessary here. I suggest:

 * git clone https://anonscm.debian.org/git/pkg-ssh/openssh.git
 * cd openssh
 * sudo apt build-dep ./
 * make the change I suggested in comment #2
 * ./configure --sysconfdir=/etc/ssh --libexecdir=/usr/lib/openssh --with-privsep-path=/run/sshd --with-pid-dir=/run --with-pam
 * make

Don't install the result. Instead, run "sudo `pwd`/sshd -p 2222 -ddd" (where 2222 is some free port on your system) and try "ssh -oStrictHostKeyChecking=no -p 2222 localhost". That should be close enough for this purpose, and if it isn't then we can refine from there.

Also, could you attach your PAM configuration (/etc/pam.d/sshd plus any files mentioned in @include lines there)?

The strace you attached is unfortunately not very useful. What we need to find out here is what bit of code is making the offending socket call, which is going to require some context around it: that's usually best achieved by not limiting the set of syscalls traced by strace. Unfortunately that also means that your private host keys will show up in the strace, so if you do that then you need to be careful to redact anything like that from the output!

Revision history for this message
Colin Watson (cjwatson) wrote :

And the exact change I want is to uncomment the "#define SANDBOX_SECCOMP_FILTER_DEBUG 1" line in sandbox-seccomp-filter.c; but please read the block comment above that line first.

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

OK, I think I've followed instructions here.

I built with the '#define SANDBOX_SECCOMP_FILTER_DEBUG 1'
uncommented. Recalling at long last that Ubuntu is Debian
(I use Red Hat/CentOS at work and get them confused), I
used 'dpkg-buildpackage -rfakeroot -uc -b' to do the build;
hope that's OK. I also found that I needed to comment away
the four'#include' lines that follow the
SANDBOX_SECCOMP_FILTER_DEBUG definition, or else I got
many errors relating to conflicting structure definitions.

Attached tarball 'ssd-test-20171025.tar.gz' contains 'sshd.log',
the result of running 'sudo sshd -p 2222 -ddd'. It also contains
an etc/ hierarchy that includes the current /etc/ssh/sshd_config
and the relevant files from /etc/pam.d. I also threw in the result
of 'strace -f' applied to that command, in case it helps narrow
the point of failure further. I took a quick troll through the
output, and I don't *think* I see it revealing more than a
few bytes of a private key.

Thanks for responding! I hope this stuff helps to move the walls
in on the problem.

Revision history for this message
Colin Watson (cjwatson) wrote :

Thanks. The strace is indeed useful, although it doesn't get us all the way there. This is very strange! It appears that the network monitor is attempting to create a Unix socket in the middle of sending a USERAUTH_SUCCESS packet, which is peculiar to say the least.

This may take a few iterations. Could you please revert your changes to sandbox-seccomp-filter.c, and instead apply the attached patch? You can apply it with "patch -p1 </path/to/sshd-debug.patch", and then repeat exactly what you did before to build and install it; I then only need the strace, generated using the same procedure as before. What I'm trying to do here is to narrow down exactly where sshd is heading off to try to create a socket.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello Kevin, I don't want to step on Colin's toes here, but I couldn't sort out anything from the logs or files so far. I've had success before with perf to grab call graph information. The usual process is along the lines of:

- enable the ddebs.ubuntu.com repository to get access to debug symbols: https://wiki.ubuntu.com/Debug%20Symbol%20Packages

- install packages openssh-server-dbgsym libc6-dbgsym .. as needed

- Record perf results during testing with something like:
  sudo perf record -e syscalls:sys_enter_socket -R -a --call-graph dwarf sleep 20

- test

- Check results with something like:
  sudo perf script

With luck this will spit out the function names of the call chain at the time of the socket() syscall that is being killed (probably alongside a lot of unrelated instances -- it's possible to tune the perf line to narrow it down if necessary).

I hope this helps,
Thanks

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

The plot thickens.

The configuration of the build appears to be partially implicated.
Could it be that with your ./configure flags, it's failing to find a
failing PAM or something?

When I build with the ./configure that you suggested, it works.
When I build with dpkg-buildpackage, on the same source code, it fails.

Output from the failing case, with your patch to add debug output,
attached as sshd-strace-20171026.txt

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

Seth: Your requested output is perf-20171026.txt

It's not clear to me what's out of the ordinary in the stack
traces, except of course that once we're in the Python code
of 'apport', things have unquestionably already gone to Hell.
Maybe someone who's familiar with the code will have a better
idea.

Revision history for this message
Colin Watson (cjwatson) wrote :

Seth, this is not as helpful as you might think, because the socket call is being denied by seccomp so the relevant call never shows up in perf. Feel free to try to construct a different perf call that takes account of that ...

Revision history for this message
Colin Watson (cjwatson) wrote :

OK, so that puts it somewhere inside ssh_packet_send2_wrapped. Can you revert my previous patch (patch -p1 -R <sshd-debug.patch, or use git if you're comfortable with it) and apply this one instead to narrow it down further?

The configuration difference is certainly suggestive. You can see the other configure options in debian/rules, and it would perhaps be worth trying different combinations of those if you have the time: the relevant ones are probably --with-kerberos5=/usr, --with-ssl-engine, --with-selinux, and --with-audit=linux.

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

OK, I applied your patch, and was lucid enough to follow farther
into the maze with a few additional debug3's.

I get down into the 'cipher_crypt' function, at line 378:

 if (EVP_Cipher(cc->evp, dest + aadlen, (u_char *)src + aadlen,
     len) < 0)
  return SSH_ERR_LIBCRYPTO_ERROR;

... so now we're off spelunking into libcrypto somewhere.

I tried plunging a short distance into the OpenSSL source, but didn't
have time to get very far down that road. At the time of the crash,
cc->evp is set, EVP_Cipher_nid(cc->evp) is returning 0x480c0, and
EVP_CIPHER_type(cc->evp) is returning 0. I don't know if either of
these results is significant - I'm looking at this API for the very
first time, and not yet entirely sure of what all the objects
represent.

I likely shan't have any further time today to experiment with
./configure flags. Maybe on the weekend.

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

I lied. I experimented with ./configure flags.

    --with-kerberos5=/usr

is definitely the flag that is triggering the crash. Removing this
flag alone cures the crash.

Command that was used:

../configure --build=x86_64-linux-gnu --prefix=/usr \
      --includedir=\${prefix}/include --mandir=\${prefix}/share/man \
      --infodir=\${prefix}/share/info --sysconfdir=/etc \
      --localstatedir=/var --disable-silent-rules \
      --libdir=\${prefix}/lib/x86_64-linux-gnu \
      --libexecdir=\${prefix}/lib/x86_64-linux-gnu \
      --disable-maintainer-mode --disable-dependency-tracking \
      --sysconfdir=/etc/ssh --libexecdir=\${prefix}/lib/openssh \
      --disable-strip --with-mantype=doc --with-4in6 \
      --with-privsep-path=/run/sshd --with-pid-dir=/run \
      --with-tcp-wrappers --with-pam --with-libedit \
      --with-systemd \
      --with-xauth=/usr/bin/xauth \
      --with-default-path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games \
      --with-superuser-path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
      --with-cflags="-Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/home/kennykb/debian.org/openssh=. -fstack-protector-strong -Wformat -Werror=format-security -DSSH_EXTRAVERSION=\\\"Ubuntu-3\\\"" "--with-ldflags=-Wl,--as-needed -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now" \
      --with-ssl-engine --with-selinux --with-audit=linux
# removed
# --with-kerberos5=/usr

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

(And, for what it's worth, I don't, to the best of my knowledge, have anything
Kerberos-related set up. There is no /etc/krb5.conf file.

Revision history for this message
Colin Watson (cjwatson) wrote :

I'm at a loss as to why Kerberos should affect this particular thing, at least when there's no actual Kerberos authentication involved. Silly question, but you don't have a modified OpenSSL or anything related to it, do you, and what exact package version of libssl1.0.0 do you have installed? Also, if you put --with-kerberos5=/usr back and remove --with-ssl-engine, does that also cure the crash?

I'm not entirely convinced about your reported value for EVP_Cipher_nid(cc->evp), since as far as I can see 0x480c0 isn't a valid NID. Something seems fishy there. In general that's a very odd place to see a socket being created, unless we're somehow hitting RAND_query_egd_bytes - but in that case I think I'd expect to see an attempt to open /dev/urandom between the getpid and the socket.

The next thing I can think of to try is to allow the network monitor to use this system call and see what else happens around it. Obviously do this very cautiously, and do not run with the attached patch in production (I'm pretty sure the socket syscall is deliberately forbidden in this context), but it should be enough to get a more complete strace and (probably more usefully) to try Seth's perf idea again: with this patch, the socket syscall should actually make it as far as the tracepoint, so we should be able to get a stack trace for it.

Revision history for this message
KEVIN KENNY (ke9tv) wrote :
Download full text (3.7 KiB)

Disable ssl-engine, re-enable kerberos5, it still crashes.

I'm pretty sure I'm running a stock OpenSSL. The only things
that I can think of that I might have done in that general
vicinity were to install xinetd, althttpd, and stunnel4, and
to obtain a host certificate from letsencrypt.org.

I'm still seeing it falling apart in EVP_Cipher. When I
allow 'socket', it unsurprisingly fails with SIGSYS on a
'connect' that follows immediately. The stack trace is
confusing. I've installed libkeyutils1-dbgsym, but it still
reports [unknown] for the locations.

sshd 9257 [006] 260133.546558: syscalls:sys_enter_socket: family: 0x00000001, type: 0x00000001, protocol: 0x00000000
                  115eb7 __socket (/lib/x86_64-linux-gnu/libc-2.26.so)
                    464e [unknown] (/lib/x86_64-linux-gnu/tls/libkeyutils.so.1.5)
                    484b [unknown] (/lib/x86_64-linux-gnu/tls/libkeyutils.so.1.5)
                    4be7 [unknown] (/lib/x86_64-linux-gnu/tls/libkeyutils.so.1.5)
                   4a24a cipher_crypt (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   4feec ssh_packet_send2_wrapped (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   500ef ssh_packet_send2 (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   52b78 ssh_packet_send (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   1a920 userauth_finish (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   1ae72 input_userauth_request (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   56699 ssh_dispatch_run (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   56748 ssh_dispatch_run_fatal (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   19d06 do_authentication2 (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                    de67 main (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
                   211c0 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.26.so)
                    f2b9 _start (/home/kennykb/debian.org/openssh/kevinbuild/sshd)
        ffffffffffffffff [unknown] ([unknown])

But at least we know it's going through keyutils, which isn't a huge library!

What I get from a separate run using strace is also likely a usefil clue. The abort
on 'connect' shows that it's trying to connect to D-Bus. I have Absolutely No Idea
what's going on there, but it seems quite peculiar. (There are a bunch of writes
from debug3 calls that I strewed throughout cipher_crypt.)

[pid 9903] write(7, "\0\0\0#\0\0\0\7\0\0\0\33cipher_crypt seqnr=7"..., 39 <unfinished ...>
[pid 9901] <... write resumed> ) = 53
[pid 9903] <... write resumed> ) = 39
[pid 9901] write(2, "debug3: mm_request_receive enter"..., 37debug3: mm_request_receive entering
 <unfinished ...>
[pid 9903] write(7, "\0\0\0\26\0\0\0\7\0\0\0\16not CHACHAPOLY", 26 <unfinished ...>
[pid 9901] <... write resumed> ) = 37
[pid 9903] <... write resumed> ) = 26
[pid 9901] read(5, <unfinished ...>
[pid 9903] write(7, "\0\0\0#\0\0\0\7\0\0\0\33neither CHACHAPOLY n"..., 39) = 39
[pid 9903] write(7, "\0\0\0\23\0\0\0\7\0\0\0\vauthlen = 0", 23) = 23
[pid 9903]...

Read more...

Revision history for this message
Colin Watson (cjwatson) wrote :

This looks very weird. If it's calling EVP_Cipher, that should go through OpenSSL; it's conceivable that it might end up in libkeyutils via the Kerberos libraries, sure, but a simple cipher call should already have everything it needs; and libkeyutils doesn't use D-Bus.

Could you attach /lib/x86_64-linux-gnu/tls/libkeyutils.so.1 so that I can see if I can make anything out of the offsets in your perf output?

Revision history for this message
KEVIN KENNY (ke9tv) wrote :

Yeah, this is almost starting to look like malware trying to exfiltrate keys somehow! Unauthorized socket I/O being done by a library that doesn't match the symbol tables? (I'm glad I don't have any Kerberos keys to leak!)

Revision history for this message
Kevin Kenny (kennykb) wrote :

Not trying to pester, but bumping this in case you didn't see that I posted the requested file.

Revision history for this message
Luke A. Perkins (public-a) wrote :

See also Ticket #1831765 and #1832110 regarding the path of the privilege separation directory (aka: /run/sshd).

This path is hard-coded into sshd and there is no means of changing its location. I have found that if the directory is missing, all ssh communication unceremoniously stops (iff UsePrivilegeSeparation yes). The original OpenSSH 7.6p1 assigns the privilege separation directory to "/var/empty" (see man sshd at openssh.com). When Ubuntu changed to systemd from Upstart, the privilege separation directory was changed from "/var/empty" to "/run/sshd". This is also supported by reviewing the value of /lib/systemd/system/ssh.service and look at the RuntimeDirectory=sshd.

My work-around is to create a service that does nothing other than create the "/run/sshd" directory and define the RuntimeDirectory=(anything but sshd) in your .service file.

Revision history for this message
Robie Basak (racb) wrote :

Thank you for posting the additional information here. It sounds like this will help others affected.

I see that the bug Importance has never been set, so I'm setting it now, to Low, based on "unusual end-user configurations" from https://wiki.ubuntu.com/Bugs/Importance. I'd like to make it clear that no progress can be expected from others on this bug, but volunteers are welcome to work on it.

Changed in openssh (Ubuntu):
importance: Undecided → Low
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openssh (Ubuntu):
status: New → Confirmed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.