systemd in degraded state on startup in LXD containers

Bug #1576341 reported by Scott Moser on 2016-04-28
This bug affects 5 people
Affects: lvm2 (Ubuntu), Importance: High, Assigned to: Unassigned
Affects: open-iscsi (Ubuntu), Importance: High, Assigned to: Unassigned

Bug Description

The ubuntu:xenial image shows 'degraded' state in lxd on initial boot.

$ lxc launch xenial x1
$ sleep 10
$ lxc file pull x1/etc/cloud/build.info -
build_name: server
serial: 20160420-145324

$ lxc exec x1 systemctl is-system-running
degraded

$ lxc exec x1 -- systemctl --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● dev-hugepages.mount loaded failed failed Huge Pages File System
● iscsid.service loaded failed failed iSCSI initiator daemon (iscsid)
● open-iscsi.service loaded failed failed Login to default iSCSI targets
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems
● systemd-sysctl.service loaded failed failed Apply Kernel Variables
● lvm2-lvmetad.socket loaded failed failed LVM2 metadata daemon socket
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

7 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: open-iscsi 2.0.873+git0.3b4b4500-14ubuntu3
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic x86_64
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
Date: Thu Apr 28 17:28:04 2016
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
SourcePackage: open-iscsi
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
description: updated
Scott Moser (smoser) on 2016-04-28
description: updated
Changed in lvm2 (Ubuntu):
status: New → Confirmed
Changed in open-iscsi (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed
Ryan Harper (raharper) wrote :

iscsid.service: Failed to read PID from file /run/iscsid.pid: Invalid argument

When running iscsid -f -d 7, we see the issue:

root@x1:~# iscsid -f -d 7
iscsid: sysfs_init: sysfs_path='/sys'

iscsid: InitiatorName=iqn.1993-08.org.debian:01:32a765bb043
iscsid: InitiatorName=iqn.1993-08.org.debian:01:32a765bb043
iscsid: InitiatorAlias=x1
iscsid: Max file limits 65536 65536

iscsid: Could not increase process priority: Operation not permitted
iscsid: Could not set oom score to -16: Permission denied
iscsid: Could not set oom score to -17: Permission denied
iscsid: failed to mlockall, exiting...

It doesn't handle a non-root user namespace; it expects to be able to write to oom_score_adj, similar to tgtd.

Ryan Harper (raharper) wrote :

Actually, the oom score failures are non-fatal, but the mlockall failure is.

strace shows:

[pid 521] mlockall(MCL_CURRENT|MCL_FUTURE <unfinished ...>
[pid 522] <... getdents resumed> /* 2 entries */, 32768) = 48
[pid 522] getdents(5, /* 0 entries */, 32768) = 0
[pid 522] close(5) = 0
[pid 522] exit_group(0) = ?
[pid 521] <... mlockall resumed> ) = -1 ENOMEM (Cannot allocate memory)

Ryan Harper (raharper) wrote :

Unpriv containers don't have CAP_IPC_LOCK at this time; we need to determine whether that's a requirement, or whether the failure is actually non-fatal.
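Whether the current environment actually holds CAP_IPC_LOCK can be checked from userspace; a minimal shell sketch (not from the thread; CAP_IPC_LOCK is bit 14 per linux/capability.h):

```shell
# Decode the effective capability mask from /proc/self/status and test
# bit 14 (CAP_IPC_LOCK), which mlockall() needs once RLIMIT_MEMLOCK is hit.
capeff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
if [ $(( 0x$capeff >> 14 & 1 )) -eq 1 ]; then
    echo "CAP_IPC_LOCK present"
else
    echo "CAP_IPC_LOCK absent: mlockall() will likely fail"
fi
```

Note that in an unprivileged LXD container the bit can show as set yet mlockall() still fails, which is part of why the capability bits alone are misleading, as discussed later in the thread.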

Martin Pitt (pitti) wrote :

These four units belong to the systemd package itself:

> dev-hugepages.mount loaded failed failed Huge Pages File System
> systemd-journald-audit.socket loaded failed failed Journal Audit Socket

These units attempt to not start in containers with less privileges with ConditionCapability=CAP_SYS_ADMIN and CAP_AUDIT_READ. This does work in nspawn, but it seems the LXD unprivileged containers pretend to have all these caps:

Capabilities for `1': = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_syslog,cap_wake_alarm,cap_block_suspend,37+ep

This is misleading. Can we start containers with only those capabilities that are actually namespace-aware and available to the container, and hide the rest?

> systemd-sysctl.service loaded failed failed Apply Kernel Variables

This is supposed to not start in containers (via ConditionPathIsReadWrite=/proc/sys/), but it tries anyway, and with debug logging I get

  systemd-sysctl.service: ConditionPathIsReadWrite=/proc/sys/ succeeded.

This is wrong as both "touch /proc/sys/foo" and "test -w /proc/sys" fail. I'll look into this.

> systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems

This has "ConditionPathExists=/etc/fstab", but that's true for lxd containers because they ship a dummy /etc/fstab with no entries, just a comment (thus ConditionFileNotEmpty= would not work either). Checking for the CAP_SYS_ADMIN capability (which is required for mounting) would be appropriate, but that wouldn't work because of the above issue.

This service does succeed in a container without apparmor restrictions (--config raw.lxc=lxc.aa_profile=unconfined).

Adding ConditionPathIsReadWrite=!/ may be the simplest and most straightforward solution here.

Martin Pitt (pitti) wrote :

> ● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems

Actually, I cannot reproduce this bit. I launched a xenial lxd container with the default lxd config on xenial host, and this unit succeeded. It's also supposed to be a no-op as there are no actual fstab entries. Scott, can you please run

    strace /lib/systemd/systemd-remount-fs

in your container and paste the output here? Thanks!

> Adding ConditionPathIsReadWrite=!/ may be the simplest and most straightforward solution here.

It's actually not: you *may* have an fstab, and systemd-remount-fs is then necessary to (re)mount file systems with the options in fstab.

Stéphane Graber (stgraber) wrote :

LXC doesn't drop many capabilities, we only really drop mac_admin, mac_override, sys_time, sys_module and sys_rawio.

That's because we do run workloads which do need the other capabilities, including cap_sys_admin.

Now in an unprivileged container, having those capabilities will only do you good against resources owned by the container and will (obviously) not let you gain any more rights than you had as the owning uid prior to entering the container.

So you absolutely do have cap_sys_admin and it will let you do a bunch of things against the network devices owned by your container or mount entries owned by the container, ... but it will not let you mess with things that aren't namespaced and that you wouldn't be allowed to touch as a normal unprivileged user.

The kernel has a nice ns_capable(ns, CAP) function which lets you check whether you do have the named capability against a given resource, I'm not aware of a userspace equivalent though.

Having us drop a bunch of capabilities is the wrong answer though and we won't be doing that.

Changed in lxd (Ubuntu):
status: New → Invalid
Stéphane Graber (stgraber) wrote :

I closed the lxd task as our current behavior wrt capabilities is correct. But I also subscribed the ubuntu-lxc team to this bug so we can keep an eye on it.

Serge Hallyn (serge-hallyn) wrote :

Right, you can check whether you have CAP_X targeted at your own user ns, and you can check whether you are in the init_user_ns (by checking /proc/self/uid_map). The manpages are currently rarely clear, when they say you need CAP_X, about which namespace that capability must be targeted against. (I just corrected one instance in a branch.) And as you can see, even if the manpages were clear, they would quickly go out of date, since the process of (a) deducing which capability checks can be namespaced, (b) converting those, or (c) improving the target's namespaces so that the checks can be namespaced (if possible) is ongoing, and will be for a long time.
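The /proc/self/uid_map check mentioned above can be sketched in shell (illustrative, not from the thread): the initial user namespace is the identity map over the full uid range.

```shell
# In the initial user namespace, /proc/self/uid_map is exactly
# "0 0 4294967295"; anything else means we are inside a user namespace.
map=$(awk '{print $1, $2, $3}' /proc/self/uid_map)
if [ "$map" = "0 0 4294967295" ]; then
    echo "in the initial user namespace"
else
    echo "inside a user namespace: capability checks are namespaced"
fi
```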

Martin Pitt (pitti) wrote :

> systemd-sysctl.service loaded failed failed Apply Kernel Variables

I filed this as https://github.com/lxc/lxcfs/issues/111 . I'll stop treating this here now, as there are already too many unrelated issues here for one bug report.

Martin Pitt (pitti) wrote :

So would a namespace aware check for CAP_SYS_AUDIT say "no" then? (The audit subsystem isn't namespace aware right now.) What would such a check look like in userspace?

CAP_SYS_ADMIN is a different beast, as it bundles a lot of different and unrelated privileges. It's also not fine-grained enough for the above purpose of "can we mount", as it can't/doesn't consider MACs. So with the statement above (keeping all caps in a container), the failing dev-hugepages.mount is not easily fixable. It's also mostly cosmetic, so not urgent for now. I guess the same goes for iscsi/lvm2 etc.

Quoting Martin Pitt (<email address hidden>):
> So would a namespace aware check for CAP_SYS_AUDIT say "no" then? (The
> audit subsystem isn't namespace aware right now). How would such a check
> look like in userspace?

I suppose a namespace aware check for CAP_SYS_AUDIT would look like an
fcntl or something funky against an nsfs inode for a user namespace.
Going from an instantiated or abstract object (like an fd, a pathname,
a process id) to the relevant nsfs inode would be interesting. I.e.
if one day we allow unpriv users to mknod /dev/null, then a check
for CAP_MKNOD against /dev/null might return true, while a check for
CAP_MKNOD against /dev/sda might return false.

This is interesting, but not likely to be ever implemented :)

Changed in systemd (Ubuntu):
importance: Undecided → High
Changed in open-iscsi (Ubuntu):
importance: Undecided → High
Changed in lxd (Ubuntu):
importance: Undecided → High
Changed in lvm2 (Ubuntu):
importance: Undecided → High

Any progress with regard to this bug?

Luis Felipe Marzagao (dulinux) wrote :

I can confirm this on a recently installed system.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

$ lxc launch ubuntu:xenial testct
Creating testct
Starting testct

$ lxc exec testct -- systemctl --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● dev-hugepages.mount loaded failed failed Huge Pages File System
● iscsid.service loaded failed failed iSCSI initiator daemon (iscsid)
● open-iscsi.service loaded failed failed Login to default iSCSI targets
● setvtrgb.service loaded failed failed Set console scheme
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems
● systemd-sysctl.service loaded failed failed Apply Kernel Variables
● lvm2-lvmetad.socket loaded failed failed LVM2 metadata daemon socket
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

8 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

Hamy (hamy-public1) wrote :

I can also confirm this. I noticed it when an update for open-iscsi came along and I tried to update the container:

...
...
...
Setting up open-iscsi (2.0.873+git0.3b4b4500-14ubuntu8.2) ...
Job for open-iscsi.service failed because the control process exited with error code.
See "systemctl status open-iscsi.service" and "journalctl -xe" for details.
invoke-rc.d: initscript open-iscsi, action "start" failed.
● open-iscsi.service - Login to default iSCSI targets
   Loaded: loaded (/lib/systemd/system/open-iscsi.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2017-01-13 11:35:10 UTC; 16ms ago
     Docs: man:iscsiadm(8)
           man:iscsid(8)
  Process: 4334 ExecStartPre=/bin/systemctl --quiet is-active iscsid.service (code=exited, status=3)

Jan 13 11:35:10 testi systemd[1]: open-iscsi.service: Failed to reset devices.list: Operation not permitted
Jan 13 11:35:10 testi systemd[1]: Starting Login to default iSCSI targets...
Jan 13 11:35:10 testi systemd[1]: open-iscsi.service: Control process exited, code=exited status=3
Jan 13 11:35:10 testi systemd[1]: Failed to start Login to default iSCSI targets.
Jan 13 11:35:10 testi systemd[1]: open-iscsi.service: Unit entered failed state.
Jan 13 11:35:10 testi systemd[1]: open-iscsi.service: Failed with result 'exit-code'.

Serge Hallyn (serge-hallyn) wrote :

Seems like just adding

ConditionVirtualization=!container

to debian/open-iscsi.service should fix it.
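The effect of such a condition can be previewed with systemd-detect-virt, which is what systemd consults for ConditionVirtualization= (a sketch; assumes systemd-detect-virt is available):

```shell
# ConditionVirtualization=!container skips the unit whenever a container
# technology (lxc, systemd-nspawn, docker, ...) is detected.
if systemd-detect-virt --container --quiet 2>/dev/null; then
    echo "container detected: the unit would be skipped"
else
    echo "no container detected: the unit would start normally"
fi
```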

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package open-iscsi - 2.0.873+git0.3b4b4500-14ubuntu14

---------------
open-iscsi (2.0.873+git0.3b4b4500-14ubuntu14) zesty; urgency=medium

  * Make systemd job not run in containers (LP: #1576341)

 -- Serge Hallyn <email address hidden> Sun, 15 Jan 2017 23:08:29 -0600

Changed in open-iscsi (Ubuntu):
status: Confirmed → Fix Released
Nish Aravamudan (nacc) on 2017-03-15
description: updated
description: updated
Nish Aravamudan (nacc) wrote :

16.04:

$ lxc launch xenial x1
$ lxc file pull x1/etc/cloud/build.info -
build_name: server
serial: 20160211-034510
$ lxc exec x1 systemctl is-system-running
degraded
$ lxc exec x1 -- systemctl --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● dev-hugepages.mount loaded failed failed Huge Pages File System
● iscsid.service loaded failed failed iSCSI initiator daemon (iscs
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File
● systemd-sysctl.service loaded failed failed Apply Kernel Variables
● lvm2-lvmetad.socket loaded failed failed LVM2 metadata daemon socket
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

16.10:

$ lxc launch ubuntu:yakkety y1
$ lxc file pull y1/etc/cloud/build.info -
build_name: server
serial: 20170307
$ lxc exec y1 systemctl is-system-running
degraded
$ lxc exec y1 -- systemctl --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● dev-hugepages.mount loaded failed failed Huge Pages File System
● iscsid.service loaded failed failed iSCSI initiator daemon (iscs
● open-iscsi.service loaded failed failed Login to default iSCSI targe
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File
● systemd-sysctl.service loaded failed failed Apply Kernel Variables
● lvm2-lvmetad.socket loaded failed failed LVM2 metadata daemon socket
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

17.04:

$ lxc launch ubuntu-daily:zesty z1
$ lxc file pull z1/etc/cloud/build.info -
build_name: server
serial: 20170315.1
$ lxc exec z1 systemctl is-system-running
degraded
$ lxc exec enormous-quetzal -- systemctl --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● iscsid.service loaded failed failed iSCSI initiator daemon (iscsid)
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems
● lvm2-lvmetad.socket loaded failed failed LVM2 metadata daemon socket
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

4 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

$ lxc exec enormous-quetzal apt policy open-iscsi
open-iscsi:
  Installed: 2.0.873+git0.3b4b4500-14ubuntu17

So things are generally better. I'm not sure whether this is high enough priority that we should consider SRUing some of the changes back to 16.04?

--

Also, Serge, I was looking at your change from c#17 -- does that mean even privileged containers would no longer get their iSCSI devices to start? (iscsiadm mentions that iscsid is often needed) And shouldn't we do the same (to be consistent) for iscsid.service? Actually, it seems like if we had done it in iscsid.service, the followi...


Nish Aravamudan (nacc) wrote :

Did some digging on the mlockall failure:

        /* we don't want our active sessions to be paged out... */
        if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
                log_error("failed to mlockall, exiting...");
                log_close(log_pid);
                exit(ISCSI_ERR);
        }

so I think it's a real issue for iscsid (and I'm not sure we want to debug random failures in the code if it can't ensure its 'active sessions' stay in memory).

That change, fwiw, was originally introduced in 2005:

https://github.com/open-iscsi/open-iscsi/commit/6f37c861162157f4a6e28c2fa3cf50e61726c8f3

so it's unlikely to have been tested anytime recently without it :)

Nish Aravamudan (nacc) on 2017-03-16
summary: - fails in lxd container
+ systemd in degraded state on startup in LXD containers
Nish Aravamudan (nacc) wrote :

Wanted to level-set (and subscribing pitti and hallyn for their advice):

1) LXD unprivileged containers:

4 services in the Zesty daily are failed at start:

1.a) iscsid.service

  This is because iscsid needs CAP_IPC_LOCK to run mlockall(); in unprivileged containers the call fails in the host kernel.

  I believe the right way about this is to make the change hallyn did to open-iscsi.service in iscsid.service and make open-iscsi.service properly depend on iscsid.service. But I also think the change made by hallyn is too broad and means even privileged containers cannot use iscsi, which does not seem to be strictly true.

1.a.1) Proposed solution: http://paste.ubuntu.com/24196051/

  Effectively, only run iscsid if not in a user namespace (which is where the capabilities get dropped, aiui). And the open-iscsi service adds conditions (adapted from Fedora's service file) to check that nodes are defined (which would imply some configuration has been done) and that a session exists (which I think means that /etc/iscsi/iscsid.conf contains node.startup=automatic and iscsid has therefore started up a session).

  If we are worried about the potential breakage (I need to of course test all this in the various iSCSI configurations), we might consider just making the first change (ConditionVirtualization=!private-users) to both .service files, but I feel like that is mostly a workaround for not being able to express cleanly the dependency between the two services: open-iscsi.service can only run if iscsid.service is running; but if iscsid.service is not running because of a Condition, then open-iscsi.service should not be in a failed state.
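The paste link above has since expired; as a hedged reconstruction, the shape of 1.a.1 (matching the conditions that eventually landed per the 2.0.874-4ubuntu1 changelog later in this bug; exact lines assumed) would be:

```ini
# In iscsid.service: run in privileged containers, but skip when in a
# user namespace, where CAP_IPC_LOCK does not let mlockall() succeed.
[Unit]
ConditionVirtualization=!private-users

# In open-iscsi.service: start only when iSCSI is actually configured;
# the "|" prefix marks triggering conditions, any one of which suffices.
[Unit]
ConditionDirectoryNotEmpty=|/etc/iscsi/nodes
ConditionDirectoryNotEmpty=|/sys/class/iscsi_session
```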

1.b) systemd-remount-fs.service

  z2 systemd-remount-fs[50]: mount: can't find LABEL=cloudimg-rootfs

  /etc/fstab:
    LABEL=cloudimg-rootfs / ext4 defaults 0 0

  This doesn't really make sense in the context of LXD containers afaict, because they don't have a /dev/disk/by-label necessarily? Also, the / is all configured by LXD in practice, not by how the cloud-image is configured?

1.b.1) Proposed solution, comment out the entry in /etc/fstab in the LXD images.

1.c) lvm2-lvmetad.socket

  lvm[61]: Daemon lvmetad returned error 104
  lvm[61]: WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  ...
  lvm2-lvmetad.socket: Trigger limit hit, refusing further activation.

  But manually running `systemctl start lvm2-lvmetad.socket` in `lxc exec z1 bash` works. That seems confusing and implies some sort of ordering issue? (Note that, confusingly, `systemctl restart lvm2-lvmetad.socket` does *not* work.)

1.c.1) I don't have a proposed solution for this.

1.d) systemd-journald-audit.socket

  I found this older thread: https://lists.freedesktop.org/archives/systemd-devel/2015-May/032113.html on this topic. Specifically, https://lists.freedesktop.org/archives/systemd-devel/2015-May/032126.html.

  Looking at the socket file, though, I see:

  ConditionCapability=CAP_AUDIT_READ

  which I do not believe is the same as CAP_ADMIN_READ. I don't know if the ML post or the change are incorrect, but I did verify that using CAP_ADMIN_READ in the container instead of CAP_AUDIT_READ did correctly conditionalize ...


Nish Aravamudan (nacc) wrote :

Heh, after a few more sips of coffee and actually reading the manpage, my 1.d.1 is obviously incorrect because CAP_ADMIN_READ is not a capability. So in effect it's masking out the audit socket :)

Nish Aravamudan (nacc) wrote :

Ok, the audit stuff was 'resolved' in LP: #1457054, where I think everyone decided to agree that unprivileged containers didn't matter...
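For reference, the fix that eventually shipped for this socket (per the systemd 234-2ubuntu2 changelog quoted later in this bug) swaps the capability test for a user namespace test; as a drop-in sketch:

```ini
# systemd-journald-audit.socket: the audit subsystem is not namespaced,
# and unprivileged containers appear to hold CAP_AUDIT_READ anyway, so
# gate on the user namespace instead of a capability.
[Unit]
ConditionVirtualization=!private-users
```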

Serge Hallyn (serge-hallyn) wrote :

Thanks, Nish. My thoughts:

1.a sounds good

1.b i'd like another way to do that, but not sure what a better way would
be.

1.c does lvm also fail in privileged containers? I can see no use to
running it (for now) in an unprivileged container, so the same solution
as 1.a seems reasonable.

1.d
CAP_ADMIN_READ is not a real capability. So if 1.d is fixed by that,
then something else is wrong.

On 29.03.2017 [03:19:16 -0000], Serge Hallyn wrote:
> Thanks, Nish. My thoughts:
>
> 1.a sounds good

Ack.

> 1.b i'd like another way to do that, but not sure what a better way would
> be.

Yeah, I spent some time looking at the CPC generator and it seems like
this is pretty hard-coded:

999-cpc-fixes.chroot:
## --------------
# for maverick and newer, use LABEL= for the '/' entry in fstab
if [ -n "${root_fs_label}" ]; then
   bl="[:blank:]"
   lstr="LABEL=${root_fs_label}"
   sed -i "s,^[^#${bl}]*\([${bl}]*/[${bl}].*\),${lstr}\1," "${rootd}/etc/fstab"
fi
cat > /etc/fstab << EOM
LABEL=cloudimg-rootfs / ext4 defaults 0 0
EOM

> 1.c does lvm also fail in privileged containers? I can see no use to
> running it (for now) in an unprivileged container, so the same solution
> as 1.a seems reasonable.

It also fails in privileged containers in the same way (see 2.b in
comment 20). Note that it works if I manually start the socket after
boot.

> 1.d
> CAP_ADMIN_READ is not a real capability. So if 1.d is fixed by that,
> then something else is wrong.

Right, follow-on comments indicated it was a thinko on my part. I think
it makes sense, based upon the context in the audit bug, that perhaps we
just don't do auditing in unprivileged containers (similar to the 1.a
change)?
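The quoted 999-cpc-fixes.chroot sed can be replayed on a sample fstab to see exactly what the image build produces (a standalone sketch using a temp file):

```shell
# Replay the cloud-image build's fstab rewrite: the device field of the
# '/' entry is replaced with LABEL=cloudimg-rootfs.
root_fs_label=cloudimg-rootfs
bl="[:blank:]"
lstr="LABEL=${root_fs_label}"
tmp=$(mktemp)
printf '/dev/sda1\t/\text4\tdefaults\t0 0\n' > "$tmp"
sed -i "s,^[^#${bl}]*\([${bl}]*/[${bl}].*\),${lstr}\1," "$tmp"
cat "$tmp"   # LABEL=cloudimg-rootfs  /  ext4  defaults  0 0
rm -f "$tmp"
```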

Balint Reczey (rbalint) wrote :

>> 1.c does lvm also fail in privileged containers? I can see no use to
>> running it (for now) in an unprivileged container, so the same solution
>> as 1.a seems reasonable.
>
> It also fails in privileged containers in the same way (see 2.b in
> comment 20). Note that it works if I manually start the socket after
> boot.

It seems /lib/systemd/system/lvm2-monitor.service also needs the "ConditionVirtualization=!container" line.

Balint Reczey (rbalint) wrote :

Patch for lvm2, tested in a zesty lxc container and a VM (for regressions).

The attachment "lvm2_2.02.167-1ubuntu7.patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Balint Reczey (rbalint) wrote :

Fixed the format of the open-iscsi conditions; it works nicely in (privileged and unprivileged) artful containers.

Balint Reczey (rbalint) wrote :

>> 1.b i'd like another way to do that, but not sure what a better way would
>> be.
>
> Yeah, I spent some time looking at the CPC generator and it seems like
> this is pretty hard-coded:
>
> 999-cpc-fixes.chroot:
> ## --------------
> # for maverick and newer, use LABEL= for the '/' entry in fstab
> if [ -n "${root_fs_label}" ]; then
> bl="[:blank:]"
> lstr="LABEL=${root_fs_label}"
> sed -i "s,^[^#${bl}]*\([${bl}]*/[${bl}].*\),${lstr}\1," "${rootd}/etc/fstab"
> fi
> cat > /etc/fstab << EOM
> LABEL=cloudimg-rootfs / ext4 defaults 0 0
> EOM

I think the cleanest solution would be providing images for containers without this invalid fstab entry.
The second cleanest seems to be not starting systemd-remount-fs.service in containers, or at least not in lxc.

Nish Aravamudan (nacc) wrote :

On 08.05.2017 [11:25:03 -0000], Balint Reczey wrote:
> >> 1.b i'd like another way to do that, but not sure what a better way would
> >> be.
> >
> > Yeah, I spent some time looking at the CPC generator and it seems like
> > this is pretty hard-coded:
> >
> > 999-cpc-fixes.chroot:
> > ## --------------
> > # for maverick and newer, use LABEL= for the '/' entry in fstab
> > if [ -n "${root_fs_label}" ]; then
> > bl="[:blank:]"
> > lstr="LABEL=${root_fs_label}"
> > sed -i "s,^[^#${bl}]*\([${bl}]*/[${bl}].*\),${lstr}\1," "${rootd}/etc/fstab"
> > fi
> > cat > /etc/fstab << EOM
> > LABEL=cloudimg-rootfs / ext4 defaults 0 0
> > EOM
>
> I think the cleanest solution would be providing images for containers
> without this invalid fstab entry.

I *think* containers and VMs use the same cloud image, so I don't think
bifurcating for this one change is reasonable.

> The second cleanest seems to be not starting
> systemd-remount-fs.service in containers, or at least not in lxc.

Except it's possible that a user might have other entries that should be
remounted, say, when passing real disks into the container?

Nish Aravamudan (nacc) wrote :

I'm going to upload rbalint's fixes in the merges for open-iscsi and lvm2 I plan on doing this week.

Changed in open-iscsi (Ubuntu):
status: Fix Released → Triaged
status: Triaged → In Progress
assignee: nobody → Nish Aravamudan (nacc)
Changed in lvm2 (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Nish Aravamudan (nacc)
Balint Reczey (rbalint) wrote :

Adding patch for systemd to skip starting systemd-remount-fs.service in containers.

This is the last piece of the puzzle to see systemd in running state in an Artful container, comments are welcome! :-)

Balint Reczey (rbalint) wrote :

Regarding the systemd patch: there can be configurations where systemd-remount-fs is needed and does useful work. In those configurations the .service file can be overridden by a local one that does start.

One other (not too clean) option is to locally divert /lib/systemd/systemd-remount-fs in image generation to check "mount -f /" before actually doing the remount, and to report success when "mount -f /" fails.
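That diversion could be sketched as a wrapper like the following (hypothetical; paths and behavior assumed, and printing instead of exec'ing for illustration). "mount -f /" fakes the mount without performing any syscall, so it fails only when fstab's / entry cannot be resolved:

```shell
# Hypothetical wrapper for a dpkg-diverted systemd-remount-fs: probe with
# a fake mount; if the fstab '/' entry is unresolvable (e.g. a LABEL= that
# does not exist inside the container), treat the remount as a no-op.
if mount -f / 2>/dev/null; then
    echo "would exec the real systemd-remount-fs"
else
    echo "fstab '/' entry not resolvable here: reporting success without remounting"
fi
```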

Nish Aravamudan (nacc) on 2017-05-09
Changed in lvm2 (Ubuntu):
status: In Progress → Fix Committed
Nish Aravamudan (nacc) wrote :

lvm2 fix is in 2.02.168-2ubuntu1 in artful.

Changed in lvm2 (Ubuntu):
status: Fix Committed → Fix Released
Changed in open-iscsi (Ubuntu):
status: In Progress → Fix Committed
Changed in systemd (Ubuntu):
status: Confirmed → In Progress
status: In Progress → Fix Committed
milestone: none → ubuntu-17.05
assignee: nobody → Dimitri John Ledkov (xnox)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 233-6ubuntu2

---------------
systemd (233-6ubuntu2) artful; urgency=medium

  [ Michael Biebl ]
  * basic/journal-importer: Fix unaligned access in get_data_size()
    (Closes: #862062)

  [ Dimitri John Ledkov ]
  * ubuntu: disable dnssec on any ubuntu releases (LP: #1690605)
  * Cherrypick upstream patch for vio predictable interface names.
  * Cherrypick upstream patch for platform predictable interface names.
    (LP: #1686784)

  [ Balint Reczey ]
  * Skip starting systemd-remount-fs.service in containers
    even when /etc/fstab is present.
    This allows entering fully running state even when /etc/fstab
    lists / to be mounted from a device which is not present in the
    container. (LP: #1576341)

 -- Dimitri John Ledkov <email address hidden> Wed, 17 May 2017 19:24:03 +0100

Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
Nish Aravamudan (nacc) wrote :

In Artful, we have (running with -proposed for open-iscsi and lvm2):

● snapd.service loaded failed failed Snappy daemon
● snapd.socket loaded failed failed Socket activation for snapp
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

The snapd team probably needs to fix the first two.

Christian Reis (kiko) wrote :

I've also noticed that nfs-common triggers a failure:

root@sendmail:~# systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● run-rpc_pipefs.mount loaded failed failed RPC Pipe File System
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems
● lvm2-lvmetad.socket loaded failed failed LVM2 metadata daemon socket

So I've added a task for nfs-common. The remaining failures I believe have been handled by this bug and are just unfixed in Xenial (and will likely remain unfixed, right?)

Christian Reis (kiko) wrote :

And added a snapd task based on Nish's last comment.

no longer affects: lxd (Ubuntu)
Dimitri John Ledkov (xnox) wrote :

snapd is invalid, and will be fixed with https://github.com/systemd/systemd/pull/6503/files

Basically, systemd did not ignore a failure to set Nice on the service in a container.

Changed in systemd (Ubuntu):
status: Fix Released → In Progress
Changed in snapd (Ubuntu):
status: New → Invalid
no longer affects: snapd (Ubuntu)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 234-2ubuntu6

---------------
systemd (234-2ubuntu6) artful; urgency=medium

  * Disable KillUserProcesses, yet again, with meson this time.
  * Re-enable reboot tests.

 -- Dimitri John Ledkov <email address hidden> Thu, 17 Aug 2017 15:22:35 +0100

Changed in systemd (Ubuntu):
status: In Progress → Fix Released
Mathew Hodson (mathew-hodson) wrote :

open-iscsi (2.0.874-4ubuntu1) artful; urgency=medium

  * Merge with Debian unstable. Remaining changes:
    - debian/tests: Add Ubuntu autopkgtests.
    - debian/iscsi-network-interface.rules, debian/net-interface-handler,
      debian/open-iscsi.install:
      Prevent network interface that contains iscsi root from bouncing
      during boot or going down during shutdown.
      Integrates with resolvconf and initramfs code that writes
      /run/initramfs/open-iscsi.interface
    - debian/open-iscsi.maintscript: clean up the obsolete
      iscsi-network-interface upstart job, file on upgrade.
    - Let iscsid systemd job run in privileged containers but not in
      unprivileged ones
    - Start open-iscsi systemd job when either /etc/iscsi/nodes or
      /sys/class/iscsi_session have content
      Based on patch by Nish Aravamudan, thanks! (LP #1576341)
    - add IPv6 support
      + add support for IPV6{DOMAINSEARCH,DNS0,DNS1} to net-interface-handler
        LP #1621507
      + Source /run/net6-*.conf when needed.
      + debian/extra/initramfs.local-top: handle IPv6 configs being
        shipped in DEVICE6 or /run/net6-*.conf in the initramfs, so we
        can fill in /run/initramfs/open-iscsi.interface (LP #1621507)
  * Drop:
    - d/extra/initramfs.local-top: When booting from iBFT,
      set the PROTO= entry in /run/net-*.conf accordingly,
      so that other tools, such as cloud-init, can use that
      information. (cloud-init fails if the current PROTO=none
      is used.) (LP: #1684039) (Closes: #866213)
      [ Fixed in Debian 2.0.874-4 ]
  * d/t/test-open-iscsi.py: drop test_daemon test
    - With the updates to the systemd units, the services do not run
      unless iSCSI is configured.

 -- Nishanth Aravamudan <email address hidden> Tue, 08 Aug 2017 16:16:27 -0700

Changed in open-iscsi (Ubuntu):
status: Fix Committed → Fix Released
Mathew Hodson (mathew-hodson) wrote :

This bug was fixed in systemd 234-2ubuntu6, which includes the change (introduced in 234-2ubuntu2, quoted below) to ignore failures to set Nice priority on services in containers.

---
systemd (234-2ubuntu2) artful; urgency=medium

  * Ignore failures to set Nice priority on services in containers.
  * Disable execute test on armhf.
  * units: set ConditionVirtualization=!private-users on journald audit socket.
    It fails to start in unprivileged containers.
  * boot-smoke: refactor ADT test.
    Wait for system to settle down and get to either running or degraded state,
    then collect all metrics, and exit with an error if any of the tests failed.

 -- Dimitri John Ledkov <email address hidden> Wed, 02 Aug 2017 03:02:03 +0100

Nish Aravamudan (nacc) wrote :

With a fully updated daily image (lxc launch ubuntu-daily:artful; lxc exec <container> apt update; lxc exec <container> apt full-upgrade; lxc exec <container> reboot; lxc exec <container> systemctl status):

    State: running
     Jobs: 0 queued
   Failed: 0 units

Nice work everyone!

Now, ideally, any package that shows up in the container image by default is checked that this doesn't regress going forward :)

Changed in lvm2 (Ubuntu):
assignee: Nish Aravamudan (nacc) → nobody
Changed in open-iscsi (Ubuntu):
assignee: Nish Aravamudan (nacc) → nobody
Mathew Hodson (mathew-hodson) wrote :

I assume the failures in comment #37 for nfs-common were also caused by systemd failing to set the Nice priority.

no longer affects: nfs-utils (Ubuntu)
Mathew Hodson (mathew-hodson) wrote :

systemd SRU to ignore failures to set Nice priority on services in containers for Xenial in bug 1709536.

no longer affects: systemd (Ubuntu)