There is a race condition on the fs->users counter and fs->in_exec flag that impacts exec() on a suid program when there are multiple threads being created and destroyed by a parent process. This in_exec flag was introduced way back in 2009 with commit:
commit 498052bba55ecaff58db6a1436b0e25bfd75a7ff
Author: Al Viro <email address hidden>
Date: Mon Mar 30 07:20:30 2009 -0400
When performing an exec() on a suid program, check_unsafe_exec() checks to see we have more f->fs->users than the count of child threads that share the same fs with the parent.
However, there are two race conditions that this seems to fail on:
1. The parent may be creating a new thread and copy_fs in kernel/fork.c bumps the fs->users count before the thread is attached to the parent hence causing the check p->fs->users > n_fs in check_unsafe_exec() to be true in check_unsafe_exec() and cause the execution a the suid program by the parent to fail to marked as unsafe and so it executes as a non-suid executable. The crux of the matter is that the fs->user count temporarily ahead by 1 with the number of threads linked to the process and we don't have any locking mechanism to protect us from this in the exec phase.
2. A thread my be dying and the associated fs pointer is nullified before it is removed from the parent and this also causes issues in the check_unsafe_exec() fs sanity check. I believe this can be fixed checking for t->fs being NULL, e.g.:
You will see "Failed, got euid 1000 (expecting 0)" errors whenever the suid program being exec'd fails to exec as a suid process because of the race and failure in check_unsafe_exec()
There is a race condition on the fs->users counter and fs->in_exec flag that impacts exec() on a suid program when there are multiple threads being created and destroyed by a parent process. This in_exec flag was introduced way back in 2009 with commit:
commit 498052bba55ecaf f58db6a1436b0e2 5bfd75a7ff
Author: Al Viro <email address hidden>
Date: Mon Mar 30 07:20:30 2009 -0400
When performing an exec() on a suid program, check_unsafe_exec() checks to see we have more f->fs->users than the count of child threads that share the same fs with the parent.
However, there are two race conditions that this seems to fail on:
1. The parent may be creating a new thread and copy_fs in kernel/fork.c bumps the fs->users count before the thread is attached to the parent hence causing the check p->fs->users > n_fs in check_unsafe_exec() to be true in check_unsafe_exec() and cause the execution a the suid program by the parent to fail to marked as unsafe and so it executes as a non-suid executable. The crux of the matter is that the fs->user count temporarily ahead by 1 with the number of threads linked to the process and we don't have any locking mechanism to protect us from this in the exec phase.
2. A thread my be dying and the associated fs pointer is nullified before it is removed from the parent and this also causes issues in the check_unsafe_exec() fs sanity check. I believe this can be fixed checking for t->fs being NULL, e.g.:
diff --git a/fs/exec.c b/fs/exec.c .ebfd9b76b69f 100644 exec(struct linux_binprm *bprm)
spin_lock( &p->fs- >lock);
rcu_read_ lock();
while_ each_thread( p, t) {
n_fs+ +;
rcu_read_ unlock( );
index 72934df68471.
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1447,7 +1447,7 @@ static void check_unsafe_
- if (t->fs == p->fs)
+ if (t->fs == p->fs || !t->fs)
}
The reproducer is quite simple and always easy to reproduce:
$ cat Makefile
ALL=a b
all: $(ALL)
a: LDFLAGS=-pthread
b: b.c
$(CC) b.c -o b
sudo chown root:root b
sudo chmod u+s b
test:
for I in $$(seq 1000); do echo $I; ./a ; done
clean:
rm -vf $(ALL)
$ cat a.c
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>
#include <time.h>
void *nothing(void *p)
{
return NULL;
}
void *target(void *p) {
for (;;) {
pthread_t t;
if (pthread_create(&t, NULL, nothing, NULL) == 0)
pthread_join(t, NULL);
}
return NULL;
}
int main(void)
{
struct timespec tv;
int i;
for (i = 0; i < 10; i++) { create( &t, NULL, target, NULL);
pthread_t t;
pthread_
}
tv.tv_sec = 0;
tv.tv_nsec = 100000;
nanosleep(&tv, NULL);
if (execl("./b", "./b", NULL) < 0)
perror("execl");
return 0;
}
$ cat b.c
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
int main(void)
{
const uid_t euid = geteuid();
if (euid != 0) {
printf("Failed, got euid %d (expecting 0)\n", euid);
return 1;
}
return 0;
}
To reproduce:
make
make test
You will see "Failed, got euid 1000 (expecting 0)" errors whenever the suid program being exec'd fails to exec as a suid process because of the race and failure in check_unsafe_exec()