snapd 2.26.14 on ubuntu-core won't start in containers anymore

Bug #1709536 reported by Stéphane Graber on 2017-08-09
This bug affects 8 people
Affects               Importance   Assigned to
Snap Layer            Critical     Unassigned
snapd                 Undecided    Michael Vogt
systemd (Ubuntu)      (status tracked in Artful)
  Xenial              Medium       Dimitri John Ledkov
  Artful              High         Dimitri John Ledkov

Bug Description

[Impact]

Systemd treats a failure to apply the requested Nice value as critical to unit startup.

Unprivileged LXD containers do not allow the use of negative nice values. snapd will fail to start inside containers now that snapd uses a negative Nice value.

Aug 09 05:54:37 core systemd[1]: snapd.service: Main process exited, code=exited, status=201/NICE
Aug 09 05:54:37 core systemd[1]: snapd.service: Unit entered failed state.
Aug 09 05:54:37 core systemd[1]: snapd.service: Failed with result 'exit-code'.

The fix is for systemd to ignore permission errors when attempting to set up such custom nice values in containers.

I have confirmed that setting up a unit override by hand which sets Nice = 0 does resolve the problem.
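The restriction described above can be probed from a shell inside the container. The following is a minimal sketch, assuming util-linux `renice` is installed; `can_set_negative_nice` is a hypothetical helper name, not part of snapd or systemd. It only reports whether lowering the nice value below 0 is permitted in the current environment and does not reproduce systemd's own code path.

```shell
# Hypothetical helper: report whether a negative nice value can be set here.
can_set_negative_nice() {
    # renice targets a throwaway shell ($$ expands inside the child sh),
    # so the caller's own priority is left untouched.
    if sh -c 'renice -n -1 -p $$' >/dev/null 2>&1; then
        echo allowed
    else
        echo denied
    fi
}
```

In an unprivileged LXD container this would be expected to print "denied", which is consistent with the status=201/NICE failure in the log above.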

[Test Case]

Boot a Xenial image in lxd:

$ lxc launch xenial x1
$ lxc exec x1 -- systemctl --state=failed

Observe failures for snapd:

● snapd.service loaded failed failed Snappy daemon
● snapd.socket loaded failed failed Socket activation for snapp

Install the updated systemd from -proposed, reboot the container, and check the status (lxc exec <container> reboot; lxc exec <container> systemctl status):

State: running
Jobs: 0 queued
Failed: 0 units
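For scripted verification, the "Failed:" line of the status output above can be extracted and compared against 0. This is a rough sketch only; `failed_units` is a hypothetical helper name, and it assumes the status header keeps the "Failed: N units" form shown in this test case.

```shell
# Hypothetical helper: read `systemctl status` output on stdin and print
# the number from the "Failed: N units" header line.
failed_units() {
    # awk splits on whitespace, so leading indentation in the header is ignored.
    awk '$1 == "Failed:" { print $2; exit }'
}
```

A possible use, with x1 being the container from the test case: `lxc exec x1 -- systemctl status | failed_units`.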

[Regression Potential]

Services will now run with a Nice value other than what was specified in the unit if it cannot be changed for some reason.

Stéphane Graber (stgraber) wrote :

Added an Ubuntu systemd task.

Stéphane Graber (stgraber) wrote :

This bug affects anyone currently running ubuntu-core inside a LXD container as the current stable core snap is affected by this problem.

tags: added: lxd
Dimitri John Ledkov (xnox) wrote :

commit 5b8e457f8d883fc6f55d33d46b3474926a495d29
Author: Dimitri John Ledkov <email address hidden>
Date: Tue Aug 1 18:51:20 2017 +0100

    Ignore failures to set Nice priority on services in containers.

Is in artful-proposed. Also please see - https://github.com/systemd/systemd/pull/6503 which needs further work to get merged upstream.

Dimitri John Ledkov (xnox) wrote :

Is SRU of this fix to e.g. xenial's systemd desired?

Changed in systemd (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: none → ubuntu-17.08
importance: Undecided → High
status: New → Fix Committed
Stéphane Graber (stgraber) wrote :

Yeah, xenial is where most of the snapd users are for us, so that'd certainly be desired.
We wouldn't need trusty though as snapd doesn't work inside trusty containers.

Oliver Grawert (ogra) wrote :

this has been in snapd since May (https://github.com/snapcore/snapd/pull/3270), why did this break all of a sudden?

Michael Vogt (mvo) wrote :
Dimitri John Ledkov (xnox) wrote :

That was not useful at all @ogra @mvo
As on the 18th systemd migrated that can set Nice in artful.... yet you disabled it on the 18th.

Please back out your comment, and re-enable setting Nice in artful.

Manually uncommenting and rebooting artful container results in:

Aug 21 12:35:23 noinit systemd[329]: snapd.service: Failed to adjust OOM setting, assuming containerized execution, ignoring: Permission denied
Aug 21 12:35:23 noinit systemd[329]: snapd.service: Failed to adjust Nice setting, assuming containerized execution, ignoring: Operation not permitted
Aug 21 12:35:23 noinit systemd[329]: snapd.service: Executing: /usr/lib/snapd/snapd

Changed in systemd (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in systemd (Ubuntu Artful):
status: Fix Committed → Fix Released
Zygmunt Krynicki (zyga) on 2017-08-22
Changed in snapd:
assignee: nobody → Michael Vogt (mvo)
Stuart Bishop (stub) on 2017-08-28
Changed in layer-snap:
importance: Undecided → Critical
Stéphane Graber (stgraber) wrote :

So I'm confused, wasn't the SRU supposed to have been fixed for this?

We're still getting reports of users that have a broken snapd because of this issue, some of whom then decided to switch to privileged containers just to avoid this problem, therefore losing a lot of LXD's security features and potentially exposing their hosts to attacks...

Jacek Nykis (jacekn) wrote :

Is there any workaround available other than switching to privileged containers?

Oliver Grawert (ogra) wrote :

@xnox

"As on the 18th systemd migrated that can set Nice in artful.... yet you disabled it on the 18th."

our development focus is 16.04 and we do not have release specific systemd units for the forward ported snapd packages so the comment will have to stay in until xenial has a fixed systemd ...

Stéphane Graber (stgraber) wrote :

As a workaround, you can override the snapd systemd unit with:

systemctl edit snapd

Then add:
  [Service]
  Nice=0

After saving the override, run "systemctl daemon-reload" and "systemctl start snapd"
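For provisioning scripts where the interactive editor is inconvenient, the same override can be written as a drop-in file. This is a sketch, assuming the usual location that "systemctl edit snapd" writes to (/etc/systemd/system/snapd.service.d/override.conf). `write_nice_override` and its optional root-prefix argument are hypothetical and exist only for illustration and dry runs.

```shell
# Hypothetical helper: write the Nice=0 drop-in for snapd.service.
# An optional first argument prefixes the path (e.g. a scratch directory).
write_nice_override() {
    root="${1:-}"
    dir="$root/etc/systemd/system/snapd.service.d"
    mkdir -p "$dir" || return 1
    # Same content as the override described in the comment above.
    printf '[Service]\nNice=0\n' > "$dir/override.conf"
}
```

Inside the container, calling write_nice_override with no argument and then running "systemctl daemon-reload" and "systemctl start snapd" should be equivalent to the manual steps.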

Stuart Bishop (stub) wrote :

All charms using snaps are currently failing, so I'm looking forward to a snapd release with the commented out Nice. The alternative is adding the systemd override workaround to the snap layer and making everyone rebuild and republish their charms.

Changed in systemd (Ubuntu Xenial):
status: Confirmed → In Progress

Thanks for uploading the fix for this bug report to -proposed. However, when reviewing the package in -proposed and the details of this bug report I noticed that the bug description is missing information required for the SRU process. You can find full details at http://wiki.ubuntu.com/StableReleaseUpdates#Procedure but essentially this bug is missing some of the following: a statement of impact, a test case and details regarding the regression potential. Thanks in advance!

description: updated
Mathew Hodson (mathew-hodson) wrote :

Artful was fixed in systemd 234-2ubuntu2 - Ignore failures to set Nice priority on services in containers.

---
systemd (234-2ubuntu2) artful; urgency=medium

  * Ignore failures to set Nice priority on services in containers.
  * Disable execute test on armhf.
  * units: set ConditionVirtualization=!private-users on journald audit socket.
    It fails to start in unprivileged containers.
  * boot-smoke: refactor ADT test.
    Wait for system to settle down and get to either running or degraded state,
    then collect all metrics, and exit with an error if any of the tests failed.

 -- Dimitri John Ledkov <email address hidden> Wed, 02 Aug 2017 03:02:03 +0100

Hello Stéphane, or anyone else affected,

Accepted systemd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/229-4ubuntu20 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Stéphane Graber (stgraber) wrote :

I've confirmed that snapd with Nice=-5 will start with the updated systemd.

tags: added: verification-done-xenial
removed: verification-needed-xenial