5. In #10 I referred to the ability of a process with CAP_IPC_LOCK to bypass RLIMIT_MEMLOCK:
https://elixir.bootlin.com/linux/v4.15.18/source/mm/mmap.c#L1300 (mlock_future_check)
if (locked > lock_limit && !capable(CAP_IPC_LOCK))
        return -EAGAIN;
This raises the question of whether the capability needs to be effective (see man 7 capabilities) in the user namespace of the unprivileged container or in the initial user namespace.
Based on what I see, CAP_IPC_LOCK is not dropped for unprivileged containers (also based on a comment from Stephane here https://discuss.linuxcontainers.org/t/how-to-add-cap-ipc-lock-capabilities-to-container/484/2):
$ ps 17228
PID TTY STAT TIME COMMAND
17228 ? Ss 0:00 /sbin/init
$ grep Cap /proc/17228/status
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
$ capsh --decode=0000003fffffffff
0x0000003fffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
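To check a single capability without capsh, the corresponding bit can be tested directly in the hex mask. This is only a minimal sketch, assuming a POSIX shell with 64-bit arithmetic; has_cap is a hypothetical helper name, and 14 is the number assigned to CAP_IPC_LOCK in include/uapi/linux/capability.h:

```shell
# Hypothetical helper: succeeds if bit number $2 is set in the hex mask $1
# (the mask is given without a 0x prefix, as printed in /proc/<pid>/status).
has_cap() {
    mask=$1
    bit=$2
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

CAP_IPC_LOCK=14   # from include/uapi/linux/capability.h

# CapEff value taken from the /proc/17228/status output above.
if has_cap 0000003fffffffff "$CAP_IPC_LOCK"; then
    echo "cap_ipc_lock present"
else
    echo "cap_ipc_lock absent"
fi
```

Note that this only shows the bit is set in the mask of the container's init process; it says nothing about which user namespace the capability is valid in.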
The "capable" function in the kernel checks the presence of a capability for the **initial** user namespace (the function comments seem to refer to that as having a "superior capability")
https://elixir.bootlin.com/linux/v4.15.18/source/kernel/capability.c#L429 (capable)
https://elixir.bootlin.com/linux/v4.15.18/source/kernel/user.c#L26
struct user_namespace init_user_ns = {
As opposed to the ns_capable function, for example:
https://elixir.bootlin.com/linux/v4.15.18/source/kernel/capability.c#L395 (ns_capable)
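Therefore, we will not be able to use CAP_IPC_LOCK for users in unprivileged LXD containers to bypass RLIMIT_MEMLOCK: the capability is held only in the container's user namespace, while capable() checks it against init_user_ns.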
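Since the result depends on which user namespace a process runs in, a quick heuristic check may be useful. The sketch below assumes that the initial user namespace shows the full identity mapping "0 0 4294967295" in /proc/<pid>/uid_map (as described in user_namespaces(7)); in_initial_userns is a hypothetical helper name. A child namespace created by real root with the same full mapping would be misreported, so treat the answer as a hint only:

```shell
# Hypothetical helper: succeeds if the uid_map file at $1 (default: our own)
# contains the full identity mapping typical of the initial user namespace.
in_initial_userns() {
    read -r inside outside count < "${1:-/proc/self/uid_map}" || return 1
    [ "$inside" = 0 ] && [ "$outside" = 0 ] && [ "$count" = 4294967295 ]
}

if in_initial_userns; then
    echo "looks like the initial user namespace: capable() checks apply to our capabilities"
else
    echo "not the initial user namespace: capable(CAP_IPC_LOCK) is expected to fail"
fi
```

In an unprivileged LXD container the map starts at a shifted host uid rather than 0, so the second branch is taken.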