systemd fails to start sshd at reboot

Bug #1811580 reported by Dogbert Prime
This bug affects 14 people
Affects: systemd (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

So far, the reported issues have turned out to be caused by:
- obsolete, buggy, or vulnerable third-party kernels
- bad permissions on /

Please ensure / is owned by root:root.
Please ensure you are running up-to-date kernels.

===

Ubuntu 16.04.5, systemd 229-4ubuntu21.15

The latest systemd update has somehow changed the way it starts 'ssh.service' (i.e. sshd). systemd fails to start sshd if /etc/ssh/sshd_config contains "UsePrivilegeSeparation yes" and /var/run/sshd/ does not already exist. Because this is the default, virtually EVERY Ubuntu 16.04 server in the world has UsePrivilegeSeparation set to yes.

Furthermore, when the user runs 'apt upgrade' and receives the newest version of systemd, /var/run/sshd/ already exists, so sshd keeps reloading successfully for as long as the server isn't rebooted. BUT as soon as the server is rebooted for any reason, /var/run/sshd/ is cleaned away, sshd fails to start, and the remote user is completely locked out of the system. This is a MAJOR issue for millions of VPS servers worldwide, whose administrators are about to be locked out and could potentially lose data. The next reboot is a ticking time bomb.

The bomb can be defused by explicitly setting 'UsePrivilegeSeparation no' in /etc/ssh/sshd_config; however, unsuspecting administrators are bound to be caught out by the millions. I got caught by it in the middle of setting up a new server yesterday, and it took a whole day to find the cause.

The appropriate fix would be to ensure that systemd can successfully start ssh.service even when 'UsePrivilegeSeparation yes' is set. systemd needs to check that /var/run/sshd/ exists before starting sshd, just as the init.d script for sshd does. OpenSSH could also be patched so that UsePrivilegeSeparation is no longer enabled by default, but that would not solve the problem for millions of pre-existing config files. Only an OpenSSH update that force-overrides that flag to 'no' would solve it. Thus systemd still needs to take responsibility for starting sshd properly, by ensuring that /var/run/sshd/ exists before it issues the 'start' command.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Thank you for your bug report. What update are you talking about? Did you try downgrading the packages to see if that resolves your issue? The most recent security update includes fixes for journal and tmpfiles errors, which shouldn't have an impact on the ssh service, and there have been no other reports about that...

Revision history for this message
Andreas Kar (thexmanxyz) wrote :

This problem is not new, and it has appeared again with the newly released systemd 229-4ubuntu21.15.

See also these bug reports, which are related to the issue in some way:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1804847
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1804603

As I'm no Linux pro, I can't tell how these issues relate to each other, but they definitely do. I don't quite understand why this issue was strictly focused on OpenVZ and then neglected, because it is definitely not only OpenVZ-related.

I didn't hit the issue with systemd versions older than 229-4ubuntu21.9. With 229-4ubuntu21.10 the regression was fixed, and now it's back again in 229-4ubuntu21.15. I'm on:

Linux xxx 3.4.113-sun8i #2 SMP PREEMPT Sat Jan 12 15:54:26 CET 2019 armv7l armv7l armv7l GNU/Linux
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial

This is Armbian, and I suspect this has nothing to do with OpenVZ. See the following link about Armbian-based distributions with the same issue: https://forum.armbian.com/topic/8852-ssh-doesnt-work-on-orange-pi-zero/

Moreover, it doesn't only affect SSH but many other services. From what I understand, some directories aren't created at startup because of changes in systemd-tmpfiles. Here is the output of journalctl -b 0 -u systemd-tmpfiles-setup.service:

Jän 14 11:01:51 xxx systemd[1]: Starting Create Volatile Files and Directories...
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /var: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /var/log: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /var/lib: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /run/sendsigs.omit.d: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /home: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /srv: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /run/lock/subsys: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /var/run/lighttpd: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /var/cache: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /var/cache/man: Bad file descriptor
Jän 14 11:01:51 xxx systemd-tmpfiles[581]: Failed to validate path /run/openvpn: Bad file descriptor
Jän 14 11:01:51 xxx systemd[1]: systemd-tmpfiles-setup.service: Main process exited, code=exited, status=1/FAILURE
Jän 14 11:01:51 xxx systemd[1]: Failed to start Create Volatile Files and Directories.
Jän 14 11:01:51 xxx systemd[1]: systemd-tmpfiles-setup.service: Unit entered failed state.
Jän 14 11:01:51 xxx systemd[1]: systemd-tmpfiles-setup.service: Failed with result 'exit-code'.

In my case the following service won't start because of the systemd update. I'm not facing the exact sam...


Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Revision history for this message
David (dasoto) wrote :

I have exactly the same problem. I checked, and an automatic update had upgraded systemd from 229-4ubuntu21.10 to 229-4ubuntu21.15 on many of my devices (Jetson TX2). Now the devices that rebooted don't start ssh, and I have completely lost communication with all of them.

I was able to log in through the serial console and confirm that this is the issue.

nvidia@tegra-ubuntu:~$ uname -a
Linux tegra-ubuntu 4.4.38-tegra #1 SMP PREEMPT Mon Oct 15 15:16:41 MDT 2018 aarch64 aarch64 aarch64 GNU/Linux

Start-Date: 2019-01-12 00:17:20
Commandline: /usr/bin/unattended-upgrade
Upgrade: libsystemd0:arm64 (229-4ubuntu21.10, 229-4ubuntu21.15), udev:arm64 (229-4ubuntu21.10, 229-4ubuntu21.15), libudev1:arm64 (229-4ubuntu21.10, 229-4ubuntu21.15), systemd-sysv:arm64 (229-4ubuntu21.10, 229-4ubuntu21.15), libpam-systemd:arm64 (229-4ubuntu21.10, 229-4ubuntu21.15), systemd:arm64 (229-4ubuntu21.10, 229-4ubuntu21.15)
End-Date: 2019-01-12 00:17:35

nvidia@tegra-ubuntu:~$ sudo systemd-tmpfiles --create |more
[sudo] password for nvidia:
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Unsafe symlinks encountered in /var/log, refusing.
Unsafe symlinks encountered in /var/lib, refusing.
Unsafe symlinks encountered in /run/sendsigs.omit.d, refusing.
Unsafe symlinks encountered in /run/lock/subsys, refusing.
Unsafe symlinks encountered in /var/cache, refusing.
Unsafe symlinks encountered in /var/cache/man, refusing.
Unsafe symlinks encountered in /run/rpcbind, refusing.
Unsafe symlinks encountered in /run/rpcbind/rpcbind.xdr, refusing.
Unsafe symlinks encountered in /run/rpcbind/portmap.xdr, refusing.
Unsafe symlinks encountered in /var/run/sshd, refusing.
Unsafe symlinks encountered in /var/run/sudo, refusing.
Unsafe symlinks encountered in /var/run/sudo/ts, refusing.
Unsafe symlinks encountered in /run/user, refusing.
Unsafe symlinks encountered in /run/systemd/ask-password, refusing.
Unsafe symlinks encountered in /run/systemd/seats, refusing.
Unsafe symlinks encountered in /run/systemd/sessions, refusing.
Unsafe symlinks encountered in /run/systemd/users, refusing.
Unsafe symlinks encountered in /run/systemd/machines, refusing.
Unsafe symlinks encountered in /run/systemd/shutdown, refusing.
Unsafe symlinks encountered in /run/systemd/netif, refusing.
Unsafe symlinks encountered in /run/systemd/netif/links, refusing.
Unsafe symlinks encountered in /run/systemd/netif/leases, refusing.
Unsafe symlinks encountered in /run/log, refusing.
Unsafe symlinks encountered in /var/lib/systemd, refusing.
Unsafe symlinks encountered in /var/lib/systemd/coredump, refusing.
Unsafe symlinks encountered in /var/log/wtmp, refusing.
Unsafe symlinks encountered in /var/log/btmp, refusing.
Unsafe symlinks encountered in /var/spool, refusing.
Unsafe symlinks encountered in /tmp/.X11-unix, refusing.
Unsafe symlinks encountered in /tmp/.ICE-unix, refusing.
Unsafe symlinks encountered in /tmp/.XIM-unix, refusing.
Unsafe symlinks encountered in /tmp/.font-unix, refusing.
Unsafe symlinks encountered in /tmp/.Test-unix, refusing.
Unsafe symlinks encountered in /run/log/journal, refusing.
Unsafe symlinks encountered in /run/log/journal/ae15719763c84b35196c20a95728b806, refusing.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@ Andreas Kar (thexmanxyz)
3.4.113-sun8i is an extremely ancient kernel. Why are you using such a kernel and where does it come from?

Which filesystems are you using? Is it btrfs or something else?

@ David (dasoto)
Your system appears to have an up-to-date kernel, but also an odd one: where does the 4.4.38-tegra kernel come from? Can you use the stock Ubuntu kernel?

You are hitting this code path:

    fd = chase_symlinks(dn, NULL, CHASE_OPEN|CHASE_SAFE, NULL);
    if (fd == -EPERM)
            return log_error_errno(fd, "Unsafe symlinks encountered in %s, refusing.", path);
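The C sketch below makes the CHASE_SAFE behaviour more concrete. It is a simplified model of the ownership rule, based on my understanding of upstream systemd's unsafe_transition() helper. Under that rule, stepping out of a root-owned directory into anything is allowed. Otherwise, each path component must have the same owner as its parent. The function names are my own, this is not systemd's code, and the Ubuntu 229 backport may differ in detail.

Under this model, a / owned by ubuntu:ubuntu fails at the very first root-owned component, which is consistent with every path above being refused. The model only covers the ownership cases. The "Too many levels of symbolic links" and "Bad file descriptor" failures in this thread were attributed to outdated kernels and are not modelled here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified model of the rule: leaving a root-owned directory is always
 * fine; otherwise parent and child must have the same owner. */
static bool unsafe_transition(uid_t parent_uid, uid_t child_uid)
{
    if (parent_uid == 0)
        return false;
    return parent_uid != child_uid;
}

/* Hypothetical helper: walks an absolute `path` from "/" one component at
 * a time and copies the first prefix that breaks the rule into `bad`.
 * Returns 1 if an unsafe transition was found, 0 if none, -1 on error.
 * Simplification: each prefix is lstat()ed as written, so a symlink is
 * judged by its own owner and its target is not chased the way systemd
 * does it. */
static int find_unsafe_transition(const char *path, char *bad, size_t n)
{
    char prefix[4096] = "/";
    struct stat parent, child;

    if (path[0] != '/' || strlen(path) >= sizeof prefix) {
        errno = EINVAL;
        return -1;
    }
    if (lstat("/", &parent) != 0)
        return -1;

    const char *p = path;
    size_t len = 1;                      /* current length of prefix */
    while (*p) {
        while (*p == '/')
            p++;                         /* skip separators */
        if (!*p)
            break;
        size_t clen = strcspn(p, "/");   /* length of next component */
        if (len > 1)
            prefix[len++] = '/';
        memcpy(prefix + len, p, clen);
        len += clen;
        prefix[len] = '\0';
        p += clen;

        if (lstat(prefix, &child) != 0)
            return -1;
        if (unsafe_transition(parent.st_uid, child.st_uid)) {
            snprintf(bad, n, "%s", prefix);
            return 1;
        }
        parent = child;
    }
    return 0;
}
```

For example, on the Jetson system shown later in this thread, the model would flag "/var" for /var/log, because / is owned by a non-root user while /var is owned by root.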

Can you please show us your mount points? Is / a symlink to somewhere? What about /tmp and /run, are they symlinks too? What filesystems are you using: btrfs, or something else?

information type: Public → Public Security
tags: added: regression-security regression-update
Revision history for this message
David (dasoto) wrote :

@Dimitri John Ledkov (xnox)

nvidia@tegra-ubuntu:~$ mount
/dev/mmcblk0p1 on / type ext4 (rw,relatime,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,size=7976720k,nr_inodes=1994180,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/debug type cgroup (rw,nosuid,nodev,noexec,relatime,debug)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on /media type ext4 (rw,relatime,data=ordered)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=804352k,mode=700,uid=1001,gid=1001)
/dev/sda1 on /media/docker/overlay2 type ext4 (rw,relatime,data=ordered)

nvidia@tegra-ubuntu:/$ ls -la
total 108
drwxrwxr-x 23 ubuntu ubuntu 4096 Oct 30 17:36 .
drwxrwxr-x 23 ubuntu ubuntu 4096 Oct 30 17:36 ..
drwxr-xr-x 2 root root 4096 Jan 12 00:17 bin
drwxr-xr-x 5 root root 4096 Oct 13 00:40 boot
drwxr-xr-x 4 root root 4096 Jan 3 23:00 data
drwxr-xr-x 14 root root 5480 Jan 15 01:14 dev
drwxr-xr-x 141 root root 12288 Jan 15 17:41 etc
drwxr-xr-x 4 root root 4096 Jan 6 2017 home
drwxrwxr-x 22 ubuntu ubuntu 4096 Oct 29 20:31 lib
drwx------ 2 root root 16384 Oct 12 20:30 lost+found
drwxr-xr-x 5 nvidia nvidia 4096 Oct 29 20:36 media
drwxr-xr-x 2 root root 4096 Apr 20 2016 mnt
drwxr-xr-x 3 root root 4096 Oct 12 20:28 opt
dr-xr-xr-x 313 root root 0 Jan 1 1970 proc
-rw-r--r-- 1 ubuntu ubuntu 62 May 17 2018 README.txt
drwx------ 4 root root 4096 Jan 2 23:47 root
drwxr-xr-x 23 root root 820 Jan 15 17:41 run
drwxr-xr-x 2 root root 12288 Jan 12 00:17 sbin
drwxr-xr-x 2 root root 4096 Apr 19 2016 snap
drwxr...


Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@ David (dasoto)

nvidia@tegra-ubuntu:/$ ls -la
total 108
drwxrwxr-x 23 ubuntu ubuntu 4096 Oct 30 17:36 .
drwxrwxr-x 23 ubuntu ubuntu 4096 Oct 30 17:36 ..
drwxrwxr-x 22 ubuntu ubuntu 4096 Oct 29 20:31 lib
drwxr-xr-x 5 nvidia nvidia 4096 Oct 29 20:36 media
-rw-r--r-- 1 ubuntu ubuntu 62 May 17 2018 README.txt

The above looks very bad to me.

IMHO /README.txt shouldn't be there at all.
Not sure why /media is owned by nvidia:nvidia; usually /media is owned by root.
Likewise, / and /lib should be owned by root. It is a security vulnerability for an unprivileged user to own these top-level directories.

Can you please try this:
$ sudo chown root:root / /lib /media

And then check whether systemd-tmpfiles works?

Revision history for this message
David (dasoto) wrote :

This solved the issue:

sudo chown root:root /

I think this will affect a lot of NVIDIA Jetson TX2 users, because the JetPack version they ship with Ubuntu has the / folder owned by ubuntu:ubuntu.

Revision history for this message
Andreas Kar (thexmanxyz) wrote :

@Dimitri John Ledkov (xnox) It's Armbian (Xenial) for the OrangePI One. Here is the df -Th output for my filesystems:

udev devtmpfs 165M 0 165M 0% /dev
tmpfs tmpfs 50M 3,9M 46M 8% /run
/dev/mmcblk0p1 ext4 7,2G 4,1G 3,1G 57% /
tmpfs tmpfs 248M 0 248M 0% /dev/shm
tmpfs tmpfs 5,0M 4,0K 5,0M 1% /run/lock
tmpfs tmpfs 248M 0 248M 0% /sys/fs/cgroup
tmpfs tmpfs 248M 12K 248M 1% /tmp
/dev/zram0 ext4 49M 20M 27M 42% /var/log
tmpfs tmpfs 50M 0 50M 0% /run/user/999
tmpfs tmpfs 50M 0 50M 0% /run/user/1000

Revision history for this message
Andreas Kar (thexmanxyz) wrote :

@Dimitri John Ledkov (xnox) I think Armbian forces this kernel revision on me by default. TBH I don't know whether a manual upgrade will break the OS, and I haven't investigated how to upgrade it manually. I don't have permission issues like David (dasoto); I checked that before replying to this bug report. Everything in the root directory of my machine is owned by root:root.

Revision history for this message
Andreas Kar (thexmanxyz) wrote :

@Dimitri John Ledkov (xnox) I manually upgraded to
Linux pan 4.19.13-sunxi #5.70 SMP Sat Jan 12 15:43:21 CET 2019 armv7l armv7l armv7l GNU/Linux
and the issue now seems to be fixed for me as well.

Revision history for this message
Spas Spasov (spaszspasov) wrote :

I'm experiencing the same problem with Ubuntu 16.04.5 on OpenVZ (with kernel: 2.6.32-042stab127.2).

The workaround that I found is to edit `/usr/lib/tmpfiles.d/sshd.conf` in the following way:

    d /run/sshd 0755 root root

Here is the question on askubuntu.com, where my case is described in more detail: https://askubuntu.com/q/1109934/566421
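The tmpfiles.d line in the workaround above has the columns Type, Path, Mode, User and Group, optionally followed by Age and Argument. So `d /run/sshd 0755 root root` asks for a directory /run/sshd with mode 0755, owned by root:root. Below is a minimal C parser for just those five columns. The struct and function names are hypothetical, and "-" placeholders are deliberately not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for the first five tmpfiles.d columns:
 * Type Path Mode User Group (Age and Argument are ignored). */
struct tmpfiles_line {
    char type[8];
    char path[256];
    unsigned int mode;   /* parsed as octal, e.g. 0755 */
    char user[32];
    char group[32];
};

/* Hypothetical parser: returns 0 on success, -1 for blank lines, comments,
 * or lines without five columns and an octal mode ("-" is not handled). */
static int parse_tmpfiles_line(const char *line, struct tmpfiles_line *out)
{
    line += strspn(line, " \t");   /* skip leading whitespace */
    if (*line == '\0' || *line == '\n' || *line == '#')
        return -1;
    if (sscanf(line, "%7s %255s %o %31s %31s",
               out->type, out->path, &out->mode, out->user, out->group) != 5)
        return -1;
    return 0;
}
```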

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@ David (dasoto)
Please report the bad permissions to whoever distributes the installation you are using on your NVIDIA Jetson TX2 devices. Perhaps packages installed in that image could ship upgrade hooks to fix or mitigate this issue.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@ Spas Spasov (spaszspasov)

Your kernel is out of date. Please upgrade the kernel, or contact the host administrators to apply at least https://wiki.openvz.org/Download/kernel/rhel6/042stab134.7

Changed in systemd (Ubuntu):
status: Confirmed → Incomplete
description: updated
Revision history for this message
Andreas Kar (thexmanxyz) wrote :

@Dimitri John Ledkov (xnox)
I know that in my case the kernel is quite old; however, the issue still affects the version I reported above, and upgrading has some drawbacks in this case. What is causing the issue I described? Are there any options for fixing it without switching kernel versions? Thanks in advance!

Revision history for this message
Adrian Kaegi (cadirol) wrote :

Good day
Same issue here! After upgrading systemd from 229-4ubuntu21.10 to 229-4ubuntu21.15 and rebooting, ssh did not start anymore.
As an ugly workaround I ran: mkdir -p -m0755 /var/run/sshd && systemctl restart ssh.service

$ uname -a
Linux aws01.nts.ch 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Ubuntu 16.04.5 LTS

$ dpkg -l | grep systemd
ii libpam-systemd:amd64 229-4ubuntu21.15 system and service manager - PAM module
rc libsystemd-daemon0:amd64 204-5ubuntu20.19 systemd utility library
rc libsystemd-login0:amd64 204-5ubuntu20.19 systemd login utility library
ii libsystemd0:amd64 229-4ubuntu21.15 systemd utility library
ii python3-systemd 231-2build1 Python 3 bindings for systemd
ii systemd 229-4ubuntu21.15

Will you provide a fix? What is the plan?

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@cadirol

Most likely your system is broken, insecure and vulnerable to the relevant CVE. Please provide the output of the journal logs and the permissions of the root directory:

$ ls -latr /

$ journalctl -b

Revision history for this message
Adrian Kaegi (cadirol) wrote :

Hi xnox
Thank you for your fast reply!

Indeed, I found the problem: a self-compiled package had installed a file in /usr/lib/tmpfiles.d and run the command "chown nrpe.nrpe /var".
After deleting the offending file from /usr/lib/tmpfiles.d and running "chown root.root /var" manually, everything worked well!

BR Cadirol

Revision history for this message
Matt P (matp) wrote :

Same situation. Ubuntu 16.04 OpenVZ VPS image of unknown origin.

I minimized the image, ran security updates and rebooted. The OpenSSH server failed to start because systemd-tmpfiles failed with:

    Failed to validate path /var/run/sshd: Too many levels of symbolic links

which in turn causes the SSH server to fail to start with the error:

    Missing privilege separation directory: /var/run/sshd

#
# pre breaking update
#

# uname -a
Linux NJ01 2.6.32-openvz-042stab120.18-amd64 #1 SMP Fri Jan 13 10:33:34 MSK 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /usr/lib/tmpfiles.d/sshd.conf
d /var/run/sshd 0755 root root

# systemd-tmpfiles --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

# systemd-tmpfiles --create /usr/lib/tmpfiles.d/sshd.conf
# # success

# ls -ld /
drwxr-xr-x 23 root root 4096 Feb 26 09:35 /
# ls -ld /var
drwxr-xr-x 12 root root 4096 Nov 26 2016 /var
# ls -ld /var/run
lrwxrwxrwx 1 root root 4 Nov 26 2016 /var/run -> /run
# ls -ld /var/run/sshd
drwxr-xr-x 2 root root 40 Feb 26 09:35 /var/run/sshd

# apt-cache policy systemd
systemd:
  Installed: 229-4ubuntu12
  Candidate: 229-4ubuntu12
  Version table:
 *** 229-4ubuntu12 100
        100 /var/lib/dpkg/status

#---BREAKING UPDATE START----

apt-get update

# "minimize" the system
export DEBIAN_FRONTEND=noninteractive
apt-get --assume-yes install aptitude ubuntu-minimal
aptitude --assume-yes markauto '~i!?name(ubuntu-minimal~|linux-generic~|openssh-server~|systemd)'
aptitude --assume-yes purge '~c'

# apply security updates
apt-get --assume-yes install unattended-upgrades
unattended-upgrade

# reboot
shutdown -r now

#---BREAKING UPDATE END----

# post update (pre-reboot).
# apt-cache policy systemd
systemd:
  Installed: 229-4ubuntu21.16
  Candidate: 229-4ubuntu21.16
  Version table:
 *** 229-4ubuntu21.16 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
     229-4ubuntu4 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
# ls -ld /
drwxr-xr-x 23 root root 4096 Feb 26 09:03 /
# ls -ld /var
drwxr-xr-x 12 root root 4096 Nov 26 2016 /var
# ls -ld /var/run
lrwxrwxrwx 1 root root 4 Nov 26 2016 /var/run -> /run
# ls -ld /var/run/sshd
drwxr-xr-x 2 root root 40 Feb 26 09:03 /var/run/sshd
# systemd-tmpfiles --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN
# systemd-tmpfiles --create /usr/lib/tmpfiles.d/sshd.conf
Failed to validate path /var/run/sshd: Too many levels of symbolic links

Anyway, the root cause seems to be this systemd-tmpfiles error: the directory is purged at reboot and doesn't get recreated.

It seems pretty major that applying security updates can lock you out of your server. If I didn't happen to have a serial console with this particular VPS provider (some others I use don't provide one), I would have no idea what was going on.

I get that this might be due to a weird OpenVZ image or an older kernel... but these Ubuntu OpenVZ images are very common.


Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1811580] Re: systemd fails to start sshd at reboot

On Tue, Feb 26, 2019 at 02:39:55PM -0000, Matt P wrote:

> Anyway, root cause seems to be this systemd-tmpfiles error. Tmpfile gets
> purged at reboot and doesn't get recreated.

> Seems pretty major that applying security updates would lock you out of
> your server. If I didn't happen to have a serial console with this
> particular VPS provider (some others I use don't provide one)...I would
> have no idea what was going on.

> I get this might be due to weird openvz image or older kernel...but
> these ubuntu openvz images are very common.

As per
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1811580/comments/14
you must have at least 042stab134.7 installed. Your comment shows that you
have 042stab120.18 installed. You will need to contact your hosting provider
about updating.

Given that an updated kernel exists, we do not intend to reduce security for
all other Ubuntu users on account of hosting providers who are both running
Ubuntu container guests on top of an unsupported non-Ubuntu kernel, *and*
are not keeping their kernel up to date.

Revision history for this message
Matt P (matp) wrote :

Okay. I guess I would have expected that, if there were a dependency on a specific kernel version, I wouldn't be able to install an incompatible package that breaks the system as part of a security update. It would be preferable to be told that there is a security update but that I can't install it because I am running an out-of-date kernel; then I would know that I am insecure and that the kernel is the issue. But I guess that is a topic for the package management guys. The error message from systemd-tmpfiles about too many symlinks isn't particularly helpful either, since in this case the problem (apparently) has nothing to do with symlinks but rather with filesystem APIs in the old kernel (I guess?).

Yes, of course I can contact the hosting provider and ask them to provide an updated kernel, and the likely result is that I'll just have to use another provider if I want this to work. Perhaps I should anyway, since a hosting provider running such old kernels isn't a good sign.

I also saw this commit: https://github.com/systemd/systemd/commit/6a89d671dfdd92c0b1b703d7fcb5b0551cafb570

For now I have worked around this issue by just updating the paths to point to /run instead of /var/run so systemd-tmpfiles doesn't barf on the symlinks.

    sed -i -e 's;/var/run;/run;g' /usr/lib/tmpfiles.d/*.conf
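The sed one-liner rewrites every occurrence of /var/run in every tmpfiles.d file, so that systemd-tmpfiles no longer has to traverse the /var/run -> /run symlink. Below is a more conservative variant, sketched in C. It assumes the rewrite is applied only to the Path column of each line, and it rewrites only a leading /var/run component. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `path` into `out` (size n), replacing a
 * leading "/var/run" component with "/run". Paths such as "/var/runner"
 * or "/srv/var/run" are copied unchanged.
 * Returns 1 if rewritten, 0 if copied as-is, -1 if `out` is too small. */
static int var_run_to_run(const char *path, char *out, size_t n)
{
    static const char legacy[] = "/var/run";
    size_t llen = sizeof legacy - 1;
    int rewritten = 0;
    int w;

    if (strncmp(path, legacy, llen) == 0 &&
        (path[llen] == '\0' || path[llen] == '/')) {
        w = snprintf(out, n, "/run%s", path + llen);
        rewritten = 1;
    } else {
        w = snprintf(out, n, "%s", path);
    }
    if (w < 0 || (size_t)w >= n)
        return -1;
    return rewritten;
}
```

Matching only a whole leading component avoids touching unrelated paths that merely contain the string /var/run.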

Revision history for this message
Julian Alarcon (julian-alarcon) wrote :

There is something that I can't understand.

I get that the new systemd update changes some things and needs a specific path.

But why are the owners of the / directory and other directories being changed?

I hit this issue with ami-059eeca93cf09eebd on AWS, but not on every server that uses it.

As AWS has no serial console access, the only way to restore those servers was to create a new instance, attach the old disk, fix the permissions and reattach the disk to the old instance.

Revision history for this message
Steve Langasek (vorlon) wrote :

On Tue, Apr 23, 2019 at 04:44:07PM -0000, Julian Alarcon wrote:
> There is something that I cant understand.

> I get that systemd new update changes some stuff and needs an specific
> path.

> But, why are the owners of / directory or others directories being
> changed?

> I got this issue with this ami-059eeca93cf09eebd on AWS, but not in all
> the servers that have this error.

From http://cloud-images.ubuntu.com/query/xenial/server/released.txt, this
is a genuine Ubuntu AMI

  xenial server release 20180912 ebs-ssd amd64 us-east-1 ami-059eeca93cf09eebd hvm

Our cloud team has confirmed that the contents of that image are correct,
the / directory is owned by root.

So I don't know what is changing the permissions of / for you.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

On Tue, 26 Feb 2019 at 18:05, Matt P <email address hidden> wrote:
>
> Okay. I guess I would have expected that if there was a dependency on a
> specific kernel version, that I wouldn't be able to install a package
> that wasn't compatible and breaks the system by installing a security
> update. It would be preferable to be informed there is a security

The kernel you are running is not an Ubuntu one. There is no package
for it known to either apt or dpkg, thus there is no possible
dependency we could express.
How can we express a dependency on something that is unknown to us?
After all, one can prepare installs using chroots, to then later run
the system on an incompatible kernel.

> update but that I can't install it because I am running an out of date
> kernel...then I know I am insecure and that the kernel is the issue.

Escalate to your provider. Who is your provider? Maybe Canonical can
get in touch with them?

> But I guess that is a topic for the package management guys. The error
> message from systemd-tmpfiles about too many symlinks isn't particularly
> helpful either since in this case the problem (apparently) has nothing
> to do with symlinks but rather filesystem apis in the old kernel (I
> guess?).
>
> Yes of course I can contact the hosting provider and ask them to provide
> an updated kernel and the likely result may be that I just have to use
> an alternate provider if I want this to work. Perhaps I should anyway
> since the hosting provider having such old kernels isn't a good sign.
>
> I also saw this comment:
> https://github.com/systemd/systemd/commit/6a89d671dfdd92c0b1b703d7fcb5b0551cafb570
>
> For now I have worked around this issue by just updating the paths to
> point to /run instead of /var/run so systemd-tmpfiles doesn't barf on
> the symlinks.
>
> sed -i -e 's;/var/run;/run;g' /usr/lib/tmpfiles.d/*.conf

I wonder if we should SRU that. It might not help in all instances
of this issue, but maybe in at least some.

--
Regards,

Dimitri.

Revision history for this message
Julian Alarcon (julian-alarcon) wrote :

Thank you @vorlon. I ran some tests with that image on a clean server and it works with no issues, but I had issues on different servers launched from the same image, while others worked fine with the same AMI.

I use a proxy repository; could this be related to different or incompatible systemd and kernel versions installed from that repo?

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Darxus (darxus) wrote :

I think I had the same problem. I think it was fixed with "chown root:root /var".

Related to this: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1891394

I think it also caused screen to be unusable until it was run by root.
