> I disagree. This, in itself, is working normally. If you pass
> a file descriptor to some file, the recipient can use the
> file descriptor just like the sender could; and if you pass
> a file descriptor to a directory, the recipient gains the
> ability to look up surrounding paths.
> 
> I think the difference here is that I expect systemd's
> controls on this to be enforced in a way that is kind of
> like DAC - systemd prevents the service from directly
> accessing the filesystem, but allows it to gain access if
> that access is granted by another process with access -
> while you expect Mandatory Access Control that can not be
> overridden by other unprivileged userspace processes.

I understand what you mean, and you are right: everything is working as
designed. But is that what we really want? As you mention in the initial report,
"DynamicUser doesn't isolate the service from the rest of the system in terms of
UNIX domain sockets". Assuming the attack scenario of a vulnerable and
compromised local service, this would mean that all mount-related protections
are effectively useless against local exploitation, since they can be easily
bypassed.

If problem 1) is not really a problem, another PoC could be:

$ cat > service.sh
#!/bin/sh

cp /usr/bin/id /var/lib/accessible/mysuid
chmod u+s /var/lib/accessible/mysuid
$ sudo systemd-run --property=DynamicUser=yes --property=ReadWritePaths=/var/lib/accessible ./service.sh

This does not even require another process to help the service, as long as there
is a writable path (not mounted with the nosuid flag) somewhere accessible to
both the local attacker and the service (including via UNIX domain sockets,
which are, at this point, outside the flaw's scope, being just a means to an
end).
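
For anyone who wants to verify that a file left behind in such a path really
carries the setuid bit, here is a minimal sketch. It assumes a throwaway
directory from mktemp as a stand-in for /var/lib/accessible and a dummy script
instead of a copy of /usr/bin/id; both are placeholders of mine, not part of the
PoC above.

```shell
#!/bin/sh
# Sketch only: "$dir" is a hypothetical stand-in for the shared writable path.
dir=$(mktemp -d)
file="$dir/mysuid"

# Dummy payload; the PoC copies /usr/bin/id instead.
printf '#!/bin/sh\nid\n' > "$file"

# Plain rwxr-xr-x: the file is executable but has no setuid bit.
chmod 755 "$file"
if [ -u "$file" ]; then before=set; else before=unset; fi

# Add the set-user-ID bit, as the owner of the file is allowed to do.
chmod u+s "$file"
if [ -u "$file" ]; then after=set; else after=unset; fi

echo "setuid bit after chmod 755: $before"
echo "setuid bit after chmod u+s: $after"

rm -rf "$dir"
```

Note that "test -u" only tells you the bit is present on the file. Whether the
kernel honours it at exec time still depends on the mount not having nosuid.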

> 
> > 2) a service can create a setuid/setgid executable file
> > that can be used to get the temporary service UID even
> > after the service is terminated (thus with all the
> > problems of the UID recycling and accessing resources
> > that may be owned by a completely different service in
> > the future).
> 
> Yes. (But note that I pointed out that setgid files can
> also be created by processes that don't belong to the
> service if the service has set up a namespace's GID map
> appropriately.)

I was finally able to look at the sgid PoC you provided, and I think we could
treat it as a separate flaw. Do you agree? It seems to me that it stems from a
separate issue and needs a separate fix.

> 
> > I think blocking "chmod()/fchmod() calls with modes that
> > include setuid/setgid bits" is just going to prevent the 2)
> > flaw, but not 1).
> 
> (Actually, in case someone's going to turn this into a
> syscall list, I think you'd want to filter all the
> following syscalls: open(), openat(), creat(), chmod(),
> fchmod(), fchmodat().)
> 
> Yes, blocking such syscalls will block most of the
> setuid/setgid creation problem, and it won't prevent
> accessing the filesystem through directory FDs that have
> been received over unix sockets. But I don't think that's
> a problem. If someone sends you a directory FD over a
> unix socket, they're just giving you access that is
> mostly equivalent to what you'd get if that service
> offered APIs for reading and writing arbitrary files
> (except that your ephemeral service doesn't have
> special access to anything, whereas the service or user
> giving you the FD might).
> 

However, the service may have additional permissions that the service providing
the FD does not have (e.g. the service belongs to the sshd group).