seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Christian Brauner |
Disco | Fix Released | Medium | Unassigned |
Eoan | Fix Released | Medium | Unassigned |
Bug Description
SRU Justification
Impact: Recently we landed seccomp support for SECCOMP_
This feature is heavily used in some userspace workloads. For example, it is currently used to intercept mknod() syscalls in user namespaces, i.e. in containers. The mknod() syscall can easily be filtered based on its dev_t argument, which allows us to intercept only a very specific subset of mknod() syscalls. Furthermore, mknod() is not possible at all in user namespaces, so accidentally intercepting and denying syscalls that are not on the whitelist is not a big deal: the watchee won't notice a difference.
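As a rough illustration of filtering on dev_t, the classic BPF program below hands only mknodat() calls for one whitelisted device number to the user notifier and allows everything else. It is a sketch under several assumptions: it targets x86_64 (little-endian, so the low 32 bits of args[3] sit at the start of that field), it only looks at mknodat() (a complete filter would also need to cover mknod() where that syscall exists), and 0x103, the kernel's 32-bit encoding of device 1:3 (/dev/null), stands in for a hypothetical whitelist entry. `install_mknod_filter()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical whitelist entry: device 1:3 (/dev/null) in the kernel's
 * 32-bit dev encoding (minor & 0xff) | (major << 8). */
#define WHITELISTED_DEV 0x103u

static struct sock_filter mknod_filter[] = {
    /* 0-2: kill the process on an unexpected architecture (assumes x86_64). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* 3-4: anything other than mknodat() jumps to ALLOW at index 7. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mknodat, 0, 2),
    /* 5-6: low 32 bits of the dev argument (little-endian assumption). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[3])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, WHITELISTED_DEV, 1, 0),
    /* 7: not whitelisted; the kernel applies its normal checks. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* 8: whitelisted device; notify the supervisor. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
};

/* Installs the filter and returns the notification listener fd, or -1. */
int install_mknod_filter(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(mknod_filter) / sizeof(mknod_filter[0])),
        .filter = mknod_filter,
    };

    /* Unprivileged tasks must set no_new_privs before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return (int)syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}
```

The listener fd returned by `install_mknod_filter()` would then be passed to the more privileged watcher, which only ever sees notifications for the whitelisted device.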
In contrast to mknod(), many other syscalls we intercept (e.g. setxattr()) cannot be filtered as easily, because they take pointer arguments and a seccomp filter can only inspect the raw argument values, not the memory they point to. Additionally, some of them might actually succeed in user namespaces (e.g. setxattr() for all "user.*" xattrs). Since we currently cannot tell seccomp to continue a syscall from a user notifier, we are stuck with performing all of these syscalls on behalf of the container. This is a huge security liability, since it is extremely difficult to correctly assume all of the necessary privileges of the calling task such that the syscall can be emulated successfully without bypassing other security restrictions (think of a missing CAP_MKNOD for mknod(), or MS_NODEV on a filesystem, etc.). This can be solved by telling seccomp to resume the syscall.
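Because the filter only sees the pointer value, a supervisor that wants to let some setxattr() calls proceed has to read the attribute name out of the watchee's memory itself. The sketch below does this through /proc/<pid>/mem (for setxattr(path, name, value, size, flags) the name pointer is args[1]) and applies a hypothetical policy: "user.*" names continue, everything else is handled by the supervisor. The helper names and the policy are illustrative assumptions, and the code assumes a 64-bit off_t. The watchee can change its memory after the supervisor has read it, so this check is not a security boundary by itself; continuing is tolerable because the kernel then runs its usual permission checks with the caller's own credentials.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy outcome for a trapped setxattr(). */
enum xattr_action {
    XATTR_CONTINUE, /* let the kernel run the syscall as the caller */
    XATTR_HANDLE    /* supervisor emulates or denies it */
};

/* Hypothetical helper: copy the string at addr in process pid into buf,
 * reading at most len - 1 bytes. Returns 0 on success, -1 on error. */
int read_remote_string(pid_t pid, uint64_t addr, char *buf, size_t len)
{
    char path[64];
    ssize_t n;
    int fd;

    if (len < 2)
        return -1;
    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    /* The file offset in /proc/<pid>/mem is the virtual address. */
    n = pread(fd, buf, len - 1, (off_t)addr);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0'; /* always terminate, even if the name was truncated */
    return 0;
}

/* Hypothetical policy: "user.*" xattrs may succeed in a user namespace,
 * so let the kernel decide; everything else goes to the supervisor. */
enum xattr_action classify_xattr(const char *name)
{
    return strncmp(name, "user.", 5) == 0 ? XATTR_CONTINUE : XATTR_HANDLE;
}
```

A real supervisor would also confirm with the SECCOMP_IOCTL_NOTIF_ID_VALID ioctl, after reading, that the notification is still pending, so that it has not read the memory of an unrelated process that reused the pid.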
Fix: Allow the seccomp notifier to continue a syscall. A positive discussion about this feature was triggered by a post to the ksummit-discuss mailing list (cf. [3]) and took place during KSummit (cf. [1]) and again at the containers/
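A minimal supervisor-side sketch of what the new flag enables, assuming a listener fd obtained via SECCOMP_FILTER_FLAG_NEW_LISTENER and kernel headers that declare the notification ioctls: it receives one notification and answers it with SECCOMP_USER_NOTIF_FLAG_CONTINUE, so the kernel executes the syscall as the watchee instead of the supervisor emulating it. `make_response()` and `continue_one()` are hypothetical helper names; the fallback `#define` mirrors the flag's value in the upstream UAPI header for older headers. When the flag is set, the kernel requires `val` and `error` to be zero, and recent kernels also reject a receive buffer that is not zeroed.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
#endif

/* Fill resp so the syscall either continues in the kernel (cont != 0)
 * or fails in the watchee with the positive errno value err. */
void make_response(struct seccomp_notif_resp *resp, __u64 id, int cont, int err)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = id;
    if (cont)
        /* val and error must stay 0, otherwise the kernel returns EINVAL. */
        resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
    else
        resp->error = -err; /* the watchee sees -1 with errno == err */
}

/* Receive one notification from listener and let the syscall continue.
 * Returns 0 on success, -1 with errno set on failure. */
int continue_one(int listener)
{
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;

    memset(&req, 0, sizeof(req)); /* the kernel expects a zeroed buffer */
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) < 0)
        return -1;
    make_response(&resp, req.id, 1, 0);
    return ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp);
}
```

In a real supervisor, `continue_one()` would sit in a poll() loop on the listener fd, and the choice between continuing and failing would come from whatever policy the supervisor applies to the trapped syscall.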
Regression Potential: Limited to seccomp. The patchset also comes with dedicated selftests, in addition to the large set of existing seccomp selftests, which further reduces the regression potential.
Test Case:
Compile a kernel with the patch applied and either run the seccomp selftests, or trap a syscall via the notifier fd and respond with the newly introduced SECCOMP_USER_NOTIF_FLAG_CONTINUE flag. The syscall should then continue and be executed by the kernel.
Target Kernels: All current LTS kernels.
Patches:
https:/
https:/
/* References */
[1]: https:/
[2]: https:/
[3]: https://<email address hidden>
[4]: commit 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Changed in linux (Ubuntu):
assignee: nobody → Christian Brauner (cbrauner)
status: New → In Progress
Changed in linux (Ubuntu Disco):
importance: Undecided → Medium
Changed in linux (Ubuntu Eoan):
importance: Undecided → Medium
Changed in linux (Ubuntu Disco):
status: New → Fix Committed
Changed in linux (Ubuntu Eoan):
status: New → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
tags: added: verification-done-disco verification-done-eoan; removed: verification-needed-disco verification-needed-eoan
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!