seccomp argument filtering not working properly on any distro that uses upstream golang-seccomp 0.9.0

Bug #1825052 reported by Jamie Strandboge
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Fix Released
Low
Jamie Strandboge

Bug Description

In diagnosing another issue, I discovered that on Fedora 28, seccomp works for simple syscalls but not with argument filtering.

On Ubuntu, notice:
$ snap version
snap 2.38+19.04
snapd 2.38+19.04
series 16
ubuntu 19.04
kernel 5.0.0-8-generic

$ sudo snap install test-snapd-tools

$ sudo snap run --strace test-snapd-tools.cmd nice -n -10 uptime 2>&1 | grep setpriority
[pid 25015] setpriority(PRIO_PROCESS, 0, -10) = -1 EPERM (Operation not permitted)

On Fedora 28:

$ snap version
snap 2.38-2.fc28
snapd 2.38-2.fc28
series 16
fedora 28
kernel 5.0.7-100.fc28.x86_64

$ sudo snap install test-snapd-tools

$ sudo snap run --strace test-snapd-tools.cmd nice -n -10 uptime 2>&1 | grep setpriority
[pid 11788] setpriority(PRIO_PROCESS, 0, -10) = 0

If we adjust /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.src to have:
#setpriority PRIO_PROCESS 0 <=19

then do, we correctly see it is denied:
$ sudo /usr/libexec/snapd/snap-seccomp compile /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.src /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.bin && sudo snap run --strace test-snapd-tools.cmd nice -n -10 uptime 2>&1 | grep setpriority
[pid 12348] setpriority(PRIO_PROCESS, 0, -10) = -1 EPERM (Operation not permitted)

then if we add back just the syscall without argument filtering, it succeeds, as expected:
$ sudo /usr/libexec/snapd/snap-seccomp compile /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.src /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.bin && sudo snap run --strace test-snapd-tools.cmd nice -n -10 uptime 2>&1 | grep setpriority
[pid 13127] setpriority(PRIO_PROCESS, 0, -10) = 0

If we try to allow a specific value (eg, '9'), then we can see it correctly allows a nice of '9' but incorrectly also allows both '8' and '10':

$ sudo /usr/libexec/snapd/snap-seccomp compile /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.src /var/lib/snapd/seccomp/bpf/snap.test-snapd-tools.cmd.bin && sudo snap run --strace test-snapd-tools.cmd nice -n 9 uptime 2>&1 | grep setpriority
[pid 14272] setpriority(PRIO_PROCESS, 0, 9) = 0

$ sudo snap run --strace test-snapd-tools.cmd nice -n 8 uptime 2>&1 | grep setpriority
[pid 15096] setpriority(PRIO_PROCESS, 0, 8) = 0

$ sudo snap run --strace test-snapd-tools.cmd nice -n 10 uptime 2>&1 | grep setpriority
[pid 15422] setpriority(PRIO_PROCESS, 0, 10) = 0

It seems that fedora-28 is not using mvo's seccomp-golang. Not sure if this has anything to do with this.

The same behavior happens on Debian:

$ SNAP_REEXEC=0 snap version
snap 2.38
snapd 2.38
series 16
debian -
kernel 4.19.0-4-amd64

$ SNAP_REEXEC=0 sudo snap run --strace test-snapd-tools.cmd nice -n -10 uptime 2>&1 | grep setpriority
[pid 1484] setpriority(PRIO_PROCESS, 0, -10) = 0

Interestingly, re-enabling reexec it starts to work on sid with 2.38:
$ snap version
snap 2.38
snapd 2.38
series 16
debian -
kernel 4.19.0-4-amd64

$ sudo snap run --strace test-snapd-tools.cmd nice -n -10 uptime 2>&1 | grep setpriority
[pid 1643] setpriority(PRIO_PROCESS, 0, -10) = -1 EPERM (Operation not permitted)

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

This is technically a security bug but not marking it as such because without file mediation the seccomp policy provides no real protection (the security technologies are meant to work together).

description: updated
summary: - seccomp argument filtering not working on Fedora
+ seccomp argument filtering not working on Fedora and Debian
description: updated
summary: - seccomp argument filtering not working on Fedora and Debian
+ seccomp argument filtering not working on Fedora with 2.38 and Debian
+ with 2.37.4
Revision history for this message
Jamie Strandboge (jdstrand) wrote : Re: seccomp argument filtering not working on Fedora with 2.38 and Debian with 2.37.4

The difference between Debian 2.37.4 and 2.38 under reexec probably has to do with the reexec'd 2.38 are from Ubuntu, which doesn't have this problem.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I looked at this a little more and the pfc (generate with 'SNAP_SECCOMP_DEBUG=1 /usr/libexec/snapd/snap-seccomp compile /var/lib/snapd/seccomp/bpf/snap.hello-world.sh.src /tmp/snap.hello-world.sh.bin > /tmp/snap.hello-world.sh.pfc) on Fedora (libseccomp 2.3.3) is simply wrong for our 'setpriority PRIO_PROCESS 0 <=19' rule (note, on amd64, sepriority is '141' as seen with scmp_sys_resolver) since it isn't ANDing the arguments:

# filter for arch x86 (1073741827)
if ($arch == 1073741827)
...
  # filter for syscall "setpriority" (141) [priority: 65529]
  if ($syscall == 141)
    if ($a0.hi32 == 0)
      if ($a0.lo32 == 0)
        action ALLOW;
    if ($a1.hi32 == 0)
      if ($a1.lo32 == 0)
        action ALLOW;
    if ($a2.hi32 >= 0)
      if ($a2.lo32 > 19)
      else
        action ALLOW;
    else
      action ALLOW;

Debian unstable (libseccomp 2.3.3) is also wrong:

# filter for arch x86_64 (3221225534)
if ($arch == 3221225534)
...
  # filter for syscall "setpriority" (141) [priority: 65529]
  if ($syscall == 141)
    if ($a0.hi32 == 0)
      if ($a0.lo32 == 0)
        action ALLOW;
    if ($a1.hi32 == 0)
      if ($a1.lo32 == 0)
        action ALLOW;
    if ($a2.hi32 >= 0)
      if ($a2.lo32 > 19)
      else
        action ALLOW;
    else
      action ALLOW;

on Ubuntu 16.04 (libseccomp 2.3.1), for example, this rule is:
# filter for arch x86_64 (3221225534)
if ($arch == 3221225534)
...
  # filter for syscall "setpriority" (141) [priority: 65529]
  if ($syscall == 141)
    if ($a0.hi32 == 0)
      if ($a0.lo32 == 0)
        if ($a1.hi32 == 0)
          if ($a1.lo32 == 0)
            if ($a2.hi32 >= 0)
              if ($a2.lo32 > 19)
              else
                action ALLOW;
            else
              action ALLOW;

which is correctly ANDing the args (though we should be masking the high bits since a high negative number will be allowed; will fix in a PR). On Ubuntu 18.10 (libseccomp 2.3.3), it is the same as on Ubuntu 16.04:

# filter for arch x86_64 (3221225534)
if ($arch == 3221225534)
...
  # filter for syscall "setpriority" (141) [priority: 65529]
  if ($syscall == 141)
    if ($a0.hi32 == 0)
      if ($a0.lo32 == 0)
        if ($a1.hi32 == 0)
          if ($a1.lo32 == 0)
            if ($a2.hi32 >= 0)
              if ($a2.lo32 > 19)
              else
                action ALLOW;
            else
              action ALLOW;

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I looked at the differences between Debian unstable and 18.10 and found that Ubuntu is carrying the ACTLOG patch. I didn't think that this would actually make a difference in the resulting pfc, but just to be sure, I rebuilt libseccomp without that patch on 18.10, then install those packages, then built snapd with those installed (since the snapd packages will statically link libseccomp). Indeed, the pfc was the same as the distro libseccomp. This makes me suspect the difference is between the golang-seccomp that Ubuntu uses and the one everyone else uses.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Ok, as it happens, upstream golang-seccomp 0.9.0 has a bug where it will incorrectly OR. The following commit is in upstream master but not a new release of golang-seccomp:

https://github.com/seccomp/libseccomp-golang/commit/06e7a29f36a34b8cf419aeb87b979ee508e58f9e

In the upstream library, when added with a single API call,
multiple syscall argument rules should be matched with AND
logic - if all of them match, the rule matches.

At present, the Golang bindings apply OR logic to this case.
This commit resolves this and reverts to the behavior of the
main library.

Signed-off-by: Matthew Heon <email address hidden>

Ubuntu uses a git snapshot: https://github.com/mvo5/libseccomp-golang/commits/master which has this fix. Any distro that doesn't use snapd's vendored golang-seccomp is not correctly mediating arg filtered rules, which explains why Ubuntu has proper ANDing while Debian and Fedora do not.

Changed in snapd:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

FYI, at this point the seccomp sandbox should be disabled on any distro that doesn't have the upstream https://github.com/seccomp/libseccomp-golang/commit/06e7a29f36a34b8cf419aeb87b979ee508e58f9e (either via distro patch to libseccomp or using snapd's vendored golang-seccomp) because otherwise snap-seccomp generates invalid bpf.

summary: - seccomp argument filtering not working on Fedora with 2.38 and Debian
- with 2.37.4
+ seccomp argument filtering not working on and distro that uses upstream
+ golang-seccomp 0.9.0
Revision history for this message
Jamie Strandboge (jdstrand) wrote :
summary: - seccomp argument filtering not working on and distro that uses upstream
+ seccomp argument filtering not working on any distro that uses upstream
golang-seccomp 0.9.0
summary: - seccomp argument filtering not working on any distro that uses upstream
- golang-seccomp 0.9.0
+ seccomp argument filtering not working properly on any distro that uses
+ upstream golang-seccomp 0.9.0
Revision history for this message
Samuele Pedroni (pedronis) wrote :

@jdstrand what is the status of this one?

Changed in snapd:
assignee: nobody → Jamie Strandboge (jdstrand)
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

@pedronis - this was filed when working on system-usernames. Older golang-seccomp will not generate proper bpfs when using argument filtering in various ways. Ie, in non-system-username policy, using '>=' and '<=' comparisons, but this only affects one rule: setpriority (there is also a corresponding 'nice' rule, but this maps to setpriority in the kernel).

On systems with apparmor, this is not a problem because you also need CAP_SYS_NICE for more favorable prioritization which is not granted unless a snap plugs 'process-control' or slots audio-playback or pulseaudio, but when doing that, we grant a bare 'setpriority' rule anyway (this is an argument in support of our current philosophy of security in depth with the various subsystems ;)

On systems without apparmor, as mentioned in comment #1, the seccomp sandbox alone provides no meaningful defense since a snap can, for example, simply change its security policy on disk to have full setpriority. While the bug advises disabling the entire sandbox on these systems, we've more recently taken the positions that snapd will enable the security systems that are available rather than forcing devmode to simplify the implementation, and in that spirit I don't think that adding complexity for the sake of the setpriority rule that can readily be bypassed justifies disabling the seccomp sandbox (though I'll add a policy comment).

For context, since this bug was filed, we added more extensive argument filtering with the system-usernames feature, but we also now have checks for 'robust argument filtering' in snapd that will fail the install of a snap that uses system-usernames on systems that don't meet the requirements for robust argument filtering (eg, 'require a snapd built against (libseccomp >= 2.4|golang-seccomp >= 0.9.1)'). Furthermore, as seen in tests/main/system-usernames/task.yaml only a few systems from our CI test systems these days don't meet the requirements: centos-7-64 debian-9-64 fedora-29-64 fedora-30-64 opensuse-15.0-64 ubuntu-14.

Altogether, I'm going to mark this as Fix Released since snapd is doing all we plan to do (there is an argument for Won't Fix I suppose, but it is not as strong as Fix Released). Adjusting the priority to 'Low' since the system-usernames feature is not blocked on it and due to the impact of the setpriority rule.

Changed in snapd:
importance: High → Low
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.