PostgreSQL does not start in lx-brand container

Bug #1608953 reported by Christopher Horrell
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
systemd (Ubuntu)     Fix Released  Undecided   Unassigned
  Xenial             Fix Released  Undecided   Martin Pitt

Bug Description

We have a 16.04 Ubuntu lx-brand container image available in our public cloud and recently discovered a systemd bug that's related to running in a container environment.

I've forwarded below what one of our engineers discovered:

----

After installing postgres (apt-get install -y -q postgresql), systemd does not actually start any of the postgres services. We tracked this down to a sed failure inside the /lib/systemd/system-generators/postgresql-generator script: the sed command tries to close stderr (fd 2), which fails, so sed returns an error code, and that causes the entire postgres generator to fail.

The root cause of the problem lies in the systemd code. Because we are running inside a container (see detect_container()), we don't execute the following block of code in systemd's main():

        if (getpid() == 1 && detect_container() <= 0) {

                /* Running outside of a container as PID 1 */
                arg_running_as = MANAGER_SYSTEM;
                make_null_stdio();
                /* ... */
        }

The make_null_stdio() function is what sets up fds 0-2 as /dev/null in systemd on bare metal. Having those fds set up is what allows the postgres system generator to work properly, since sed expects to be able to close stderr.

Because we never call make_null_stdio() inside any container, the low fds end up being set up later using /dev/console with O_CLOEXEC. So when we actually run the system generator script, the low fds are not open at all, contrary to what sed expects.

Interestingly, on the master branch of systemd (src/core/main.c) this bug no longer appears to exist. The relevant code block has been moved so that it is no longer conditional on being in a container, although the commit was not intended to fix this problem; it was apparently related to console color handling:

commit 3a18b60489504056f9b0b1a139439cbfa60a87e1

It would be great if this fix could be pulled in to an update for Ubuntu 16.04.

SRU INFORMATION
===============
Fix: https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?h=ubuntu-xenial&id=6df46531727baa

Regression potential: very low. This does not affect lxc and lxd (our officially supported container engines) or nspawn, as they already set up pid1's stdout/stderr. And it's hard to imagine anything depending on pid1's stdout/err *not* being existent file descriptors, as in pretty much all cases they already are.

Test case: Specific to lx-brand, must be verified by reporter. However, we need to verify that LXC, LXD, and nspawn containers still boot with this version.

Martin Pitt (pitti) wrote :

Thanks for the initial analysis! This is very helpful. I'm marking this as fixed in yakkety and adding a xenial task.

I tried to reproduce this. I created standard xenial and yakkety containers:

  lxc launch images:ubuntu/xenial/amd64 x1
  lxc launch images:ubuntu/yakkety/amd64 y1

In both of them pid1's low fds look okay:

$ lxc exec x1 -- ls -l /proc/1/fd/{0,1,2}
lrwx------ 1 root root 64 Aug 18 05:10 /proc/1/fd/0 -> /dev/null
lrwx------ 1 root root 64 Aug 18 05:10 /proc/1/fd/1 -> /dev/null
lrwx------ 1 root root 64 Aug 18 05:10 /proc/1/fd/2 -> /dev/null

(same for y1)

PostgreSQL starts fine after installation:

$ lxc exec x1 -- apt install -y postgresql
$ lxc exec x1 -- pg_lsclusters
Ver Cluster Port Status Owner    Data directory               Log file
9.5 main    5432 online postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log

(again, same for y1)

The generator ran:

$ lxc exec x1 -- ls -lR /run/systemd/generator/postgresql.service.wants
/run/systemd/generator/postgresql.service.wants:
total 0
lrwxrwxrwx 1 root root 39 Aug 18 05:13 postgresql@9.5-main.service -> /lib/systemd/system/postgresql@.service

So I'm afraid I cannot reproduce this, which I need to do in order to test the fix; a reproducer is a requirement for SRUs. Can you please describe how this can be reproduced?

Changed in systemd (Ubuntu):
status: New → Fix Released
Changed in systemd (Ubuntu Xenial):
status: New → Incomplete
Martin Pitt (pitti) wrote :

I tried again with "classic" LXC (not LXD), and there the FDs look different: in xenial they point to /dev/pts/4, in yakkety to /dev/pts/1. But installing postgresql works fine in both xenial and yakkety.

I also tried with lxc-start-ephemeral, but in that case the low FDs again point to /dev/null and things work fine (though I suppose you don't use ephemeral containers).

So, I'm happy to backport the patch, but I'm unable to create an SRU test case or verify the fix myself.

Martin Pitt (pitti) wrote :

FTR, there is a similar bug, bug 1611973, about postgresql not starting in a cloud instance, but that one is specific to cloud-init, not to LXC. How sure are you that this is really due to the low FDs of pid 1, as opposed to that bug? Do you have some useful logs to look at, like journal output from that container? (Generators don't log to the journal, though, since they run before journald starts; on a "real" system their error messages land in dmesg, and I'm not sure whether containers are allowed to write there.)

summary: - Issue with systemd issue when run inside a container
+ PostgreSQL does not start in container
Changed in systemd (Ubuntu):
status: Fix Released → Incomplete
Christopher Horrell (chorrell) wrote : Re: PostgreSQL does not start in container

We have not observed this issue under LXC, only under lx-brand. Some background information on lx-brand:

http://www.slideshare.net/bcantrill/illumos-lx

Based on my discussion with the engineer who discovered this, it's probably an issue you'll hit in a Docker container as well as an lx-brand environment.

Martin Pitt (pitti) wrote :

OK, thanks. Marking this as "hw-specific" then, which is not entirely accurate, but it essentially means "only the reporter can test a stable release update". I'll pull the patch into the xenial branch and will ask you to test once it gets into xenial-proposed (which might still take a while, as this is low-priority). Thanks!

summary: - PostgreSQL does not start in container
+ PostgreSQL does not start in lx-brand container
tags: added: hw-specific
Changed in systemd (Ubuntu):
status: Incomplete → Fix Released
Martin Pitt (pitti)
Changed in systemd (Ubuntu Xenial):
status: Incomplete → New
Martin Pitt (pitti) wrote :

https://github.com/systemd/systemd/commit/6edefe0b06 actually introduces a regression (disabling color mode for containers) and does not backport at all.

I think https://github.com/systemd/systemd/commit/3a18b6048950405 is much closer to what you actually need here: Calling make_null_stdio() for containers as well.

I backported that one change, verified that LXC and LXD still work fine, and put a test package into https://launchpad.net/~pitti/+archive/ubuntu/sru-test (systemd 229-4ubuntu7pitti1). I would appreciate it if you could test this and confirm that it fixes the problem. There are no other packages in that PPA, so dist-upgrading to it is safe.

Thanks!

Changed in systemd (Ubuntu Xenial):
assignee: nobody → Martin Pitt (pitti)
Christopher Horrell (chorrell) wrote :

Great, thanks! We'll take a look at this today.

Christopher Horrell (chorrell) wrote :

So far it looks good. I'm awaiting feedback from a couple of people and will let you know.

Andy Whitcroft (apw) wrote : Please test proposed package

Hello Christopher, or anyone else affected,

Accepted systemd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/229-4ubuntu8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti)
description: updated
Christopher Horrell (chorrell) wrote :

I installed the systemd 229-4ubuntu8 package from xenial-proposed and it looks like the package fixes this bug.

Christopher Horrell (chorrell) wrote :

TEST CASE:

- Added xenial-proposed to /etc/apt/sources.list
- Installed systemd: apt install systemd/xenial-proposed
- Installed postgres: apt-get install -y -q postgresql

Verified the postgresql service is running:

- su postgres
- psql

and also:

- systemctl status postgresql
- systemctl status postgresql@9.5-main.service

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 229-4ubuntu8

---------------
systemd (229-4ubuntu8) xenial-proposed; urgency=medium

  * Queue loading transient units after setting their properties. Fixes
    starting VMs with libvirt. (LP: #1529079)
  * Connect pid1's stdin/out/err fds to /dev/null also for containers. This
    fixes generators which expect a valid stdout/err fd in some container
    technologies. (LP: #1608953)
  * 73-usb-net-by-mac.rules: Do not run readlink for *every* uevent, and
    merely check if /etc/udev/rules.d/80-net-setup-link.rules exists.
    A common way to disable a udev rule is to just "touch" it in
    /etc/udev/rules.d/ (i. e. empty file), and if the rule is customized we
    cannot really predict anyway if the user wants MAC-based USB net names or
    not. (LP: #1615021)
  * systemd-networkd-resolvconf-update.service: Also pick up DNS servers from
    individual link leases, as they sometimes don't appear in the global
    ifstate. (LP: #1620559)

 -- Martin Pitt <email address hidden> Tue, 06 Sep 2016 14:16:29 +0200

Changed in systemd (Ubuntu Xenial):
status: Fix Committed → Fix Released