These four units belong to the systemd package itself:
> dev-hugepages.mount loaded failed failed Huge Pages File System
> systemd-journald-audit.socket loaded failed failed Journal Audit Socket
These units try to avoid starting in containers with fewer privileges via ConditionCapability=CAP_SYS_ADMIN and ConditionCapability=CAP_AUDIT_READ, respectively. This works in nspawn, but LXD unprivileged containers seem to pretend to have all of these capabilities:
Capabilities for `1': = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_syslog,cap_wake_alarm,cap_block_suspend,37+ep
This is misleading. Could we start containers with only those capabilities that are actually namespace-aware and available to the container, and hide the rest?
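For context, the systemd.unit documentation describes ConditionCapability= as checking whether a capability exists in the capability bounding set of the service manager, not whether it is actually usable (effective or permitted). A minimal Python sketch of that kind of check, assuming it is approximated by reading the `CapBnd:` line of /proc/1/status (all function names here are hypothetical, not systemd's code), shows why a container whose bounding set still has bit 21 (CAP_SYS_ADMIN) and bit 37 (CAP_AUDIT_READ) set passes both conditions:

```python
import re

# Capability numbers from <linux/capability.h>.
CAP_SYS_ADMIN = 21
CAP_AUDIT_READ = 37


def parse_capbnd(status_text: str) -> int:
    """Return the bounding set from /proc/<pid>/status text as an integer bitmask."""
    match = re.search(r"^CapBnd:\s*([0-9a-fA-F]+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no CapBnd line found")
    return int(match.group(1), 16)


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def condition_capability(cap: int, pid: int = 1) -> bool:
    """Hypothetical approximation of ConditionCapability=: looks only at the
    bounding set of `pid`, so it cannot tell whether the capability is
    merely namespaced or really grants host-level privileges."""
    with open(f"/proc/{pid}/status") as f:
        return has_cap(parse_capbnd(f.read()), cap)
```

In an affected container, `condition_capability(CAP_SYS_ADMIN)` would be expected to return True, since the bounding set alone says nothing about the user namespace the capability is scoped to.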
> systemd-sysctl.service loaded failed failed Apply Kernel Variables
This unit is supposed to be skipped via ConditionPathIsReadWrite=/proc/sys/, but it starts anyway; with debug logging I get:
systemd-sysctl.service: ConditionPathIsReadWrite=/proc/sys/ succeeded.
This is wrong, as both "touch /proc/sys/foo" and "test -w /proc/sys" fail. I'll look into this.
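One possible explanation, assuming ConditionPathIsReadWrite= works as the systemd.unit documentation describes it (verifying that the underlying file system is not mounted read-only), is that it inspects mount flags, whereas `touch` and `test -w` are subject to permission checks and AppArmor denials. The Python sketch below contrasts the two kinds of check; the function names are hypothetical and this is not systemd's implementation:

```python
import os


def fs_mounted_read_write(path: str) -> bool:
    """Mount-flag check: True unless the file system holding `path` has ST_RDONLY set."""
    return not (os.statvfs(path).f_flag & os.ST_RDONLY)


def caller_can_write(path: str) -> bool:
    """Permission check via access(2), roughly what `test -w` does."""
    return os.access(path, os.W_OK)


def compare(path: str) -> dict:
    """Report both checks side by side; they can disagree when a file system is
    mounted read-write but writes are refused by permissions or an LSM."""
    return {
        "path": path,
        "mounted_read_write": fs_mounted_read_write(path),
        "caller_can_write": caller_can_write(path),
    }
```

If /proc/sys is mounted read-write inside the container but writes are denied, `compare("/proc/sys")` would report `mounted_read_write=True` and `caller_can_write=False`, which would be consistent with the debug log line above.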
> systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems
This unit has "ConditionPathExists=/etc/fstab", but that condition is true in LXD containers because they ship a dummy /etc/fstab with no entries, just a comment (so ConditionFileNotEmpty= would not work either). Checking for the CAP_SYS_ADMIN capability, which is required for mounting, would be appropriate, but that would not work because of the capability issue above.
This service does succeed in a container without AppArmor restrictions (--config raw.lxc=lxc.aa_profile=unconfined).
Adding ConditionPathIsReadWrite=!/ may be the simplest and most straightforward solution here.
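To illustrate why neither existing file-based condition helps here, the following Python sketch approximates ConditionPathExists= and ConditionFileNotEmpty= with plain file checks and adds a hypothetical `fstab_has_entries()` helper (not a systemd feature) to show that a comment-only fstab is non-empty yet has no mount entries. The proposed ConditionPathIsReadWrite=!/ is modelled as a boolean parameter saying whether / is already mounted read-write; all names are illustrative assumptions:

```python
import os


def file_not_empty(path: str) -> bool:
    """Approximation of ConditionFileNotEmpty=: a regular file with size > 0."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def fstab_has_entries(text: str) -> bool:
    """Hypothetical helper: True if any line is neither blank nor a '#' comment."""
    return any(
        line.strip() and not line.lstrip().startswith("#")
        for line in text.splitlines()
    )


def remount_fs_would_run(fstab_path: str, root_is_read_write: bool) -> bool:
    """Approximate ConditionPathExists=/etc/fstab plus the proposed
    ConditionPathIsReadWrite=!/ (the '!' negates: run only if / is NOT read-write)."""
    return os.path.exists(fstab_path) and not root_is_read_write
```

With a comment-only fstab, `file_not_empty()` is True while `fstab_has_entries()` is False, and the unit would only be skipped once the negated read-write condition on / is added.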