I had a bit of a stare at the kernel source and suspected that the downgrade of uid is happening here: https://github.com/torvalds/linux/blob/v4.4/security/commoncap.c#L547-L548
I added a "WARN(1, "downgrading in subprocess %d %d\n", bprm->unsafe, (int)capable(CAP_SETUID))" which revealed that bprm->unsafe is 1 aka LSM_UNSAFE_SHARE.
The only place (I can find) that bprm->unsafe is set to LSM_UNSAFE_SHARE is this check in check_unsafe_exec (from https://github.com/torvalds/linux/blob/v4.4/fs/exec.c#L1281):
    t = p;
    n_fs = 1;
    spin_lock(&p->fs->lock);
    rcu_read_lock();
    while_each_thread(p, t) {
        if (t->fs == p->fs)
            n_fs++;
    }
    rcu_read_unlock();
    if (p->fs->users > n_fs)
        bprm->unsafe |= LSM_UNSAFE_SHARE;
    else
        p->fs->in_exec = 1;
    spin_unlock(&p->fs->lock);
So I think (and here it gets a bit sketchy) we're racing with copy_process in kernel/fork.c: that calls copy_fs (which is what increments p->fs->users) some way before it does the stuff necessary to make the new thread be included in the while_each_thread(p, t) loop. So n_fs is too low, the check triggers and the setuid bits get ignored.
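To make the suspected window concrete, here is a tiny userspace model of it. This is only a sketch of my theory, not kernel code: the struct and function names (fs_model, task_model, check_unsafe_exec_model, simulate_exec_vs_clone and friends) are all made up for illustration, the locking is left out, and the fork is split into just two steps, "bump fs->users" and "become visible in the thread list". It assumes the exec'ing thread has one sibling thread which is in the middle of creating a third thread sharing the same fs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8
#define LSM_UNSAFE_SHARE 1 /* the value observed in bprm->unsafe */

/* Hypothetical stand-ins for fs_struct, task_struct and the thread list. */
struct fs_model { int users; };
struct task_model { struct fs_model *fs; };
struct group_model { struct task_model *tasks[MAX_TASKS]; size_t count; };

/* Same counting logic as the quoted check: p counts once, plus every
 * other *visible* thread that shares p's fs. */
int check_unsafe_exec_model(const struct task_model *p,
                            const struct group_model *g)
{
    int n_fs = 1;
    for (size_t i = 0; i < g->count; i++) {
        const struct task_model *t = g->tasks[i];
        if (t != p && t->fs == p->fs)
            n_fs++;
    }
    return p->fs->users > n_fs ? LSM_UNSAFE_SHARE : 0;
}

/* Early step of the modelled fork: share the fs and bump the user count. */
void fork_step_copy_fs(struct task_model *parent, struct task_model *child)
{
    child->fs = parent->fs;
    child->fs->users++;
}

/* Late step of the modelled fork: the new thread becomes visible. */
void fork_step_link_thread(struct group_model *g, struct task_model *child)
{
    assert(g->count < MAX_TASKS);
    g->tasks[g->count++] = child;
}

/* Run the check either inside the window between the two fork steps,
 * or after the fork has completed; return the resulting unsafe flags. */
int simulate_exec_vs_clone(bool exec_in_window)
{
    struct fs_model fs = { .users = 2 };        /* shared by p and sibling */
    struct task_model p = { .fs = &fs };        /* the thread calling exec */
    struct task_model sibling = { .fs = &fs };  /* the thread creating a thread */
    struct task_model child = { .fs = NULL };
    struct group_model g = { .tasks = { &p, &sibling }, .count = 2 };
    int unsafe = 0;

    fork_step_copy_fs(&sibling, &child);
    if (exec_in_window)
        unsafe = check_unsafe_exec_model(&p, &g);
    fork_step_link_thread(&g, &child);
    if (!exec_in_window)
        unsafe = check_unsafe_exec_model(&p, &g);
    return unsafe;
}
```

In this model simulate_exec_vs_clone(true) returns LSM_UNSAFE_SHARE, because users is already 3 while only 2 threads are visible, and simulate_exec_vs_clone(false) returns 0. That matches what the WARN showed, but it says nothing about whether the real kernel interleaving is actually possible.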
No idea at all how to fix this of course.