Comment 35 for bug 1672819

Revision history for this message
In , colin.king (colin.king-linux-kernel-bugs) wrote :

There is a race condition on the fs->users counter and fs->in_exec flag that impacts exec() on a suid program when there are multiple threads being created and destroyed by a parent process. This in_exec flag was introduced way back in 2009 with commit:

commit 498052bba55ecaff58db6a1436b0e25bfd75a7ff
Author: Al Viro <email address hidden>
Date: Mon Mar 30 07:20:30 2009 -0400

When performing an exec() on a suid program, check_unsafe_exec() checks to see we have more f->fs->users than the count of child threads that share the same fs with the parent.

However, there are two race conditions that this seems to fail on:

1. The parent may be creating a new thread and copy_fs in kernel/fork.c bumps the fs->users count before the thread is attached to the parent hence causing the check p->fs->users > n_fs in check_unsafe_exec() to be true in check_unsafe_exec() and cause the execution a the suid program by the parent to fail to marked as unsafe and so it executes as a non-suid executable. The crux of the matter is that the fs->user count temporarily ahead by 1 with the number of threads linked to the process and we don't have any locking mechanism to protect us from this in the exec phase.

2. A thread my be dying and the associated fs pointer is nullified before it is removed from the parent and this also causes issues in the check_unsafe_exec() fs sanity check. I believe this can be fixed checking for t->fs being NULL, e.g.:

diff --git a/fs/exec.c b/fs/exec.c
index 72934df68471..ebfd9b76b69f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1447,7 +1447,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
        spin_lock(&p->fs->lock);
        rcu_read_lock();
        while_each_thread(p, t) {
- if (t->fs == p->fs)
+ if (t->fs == p->fs || !t->fs)
                        n_fs++;
        }
        rcu_read_unlock();

The reproducer is quite simple and always easy to reproduce:

$ cat Makefile
ALL=a b
all: $(ALL)

a: LDFLAGS=-pthread

b: b.c
 $(CC) b.c -o b
 sudo chown root:root b
 sudo chmod u+s b

test:
 for I in $$(seq 1000); do echo $I; ./a ; done

clean:
 rm -vf $(ALL)

$ cat a.c
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>
#include <time.h>

void *nothing(void *p)
{
 return NULL;
}

void *target(void *p) {
 for (;;) {
  pthread_t t;
  if (pthread_create(&t, NULL, nothing, NULL) == 0)
   pthread_join(t, NULL);
     }
 return NULL;
}

int main(void)
{
 struct timespec tv;
 int i;

 for (i = 0; i < 10; i++) {
  pthread_t t;
  pthread_create(&t, NULL, target, NULL);
 }
 tv.tv_sec = 0;
 tv.tv_nsec = 100000;
 nanosleep(&tv, NULL);
 if (execl("./b", "./b", NULL) < 0)
  perror("execl");
 return 0;
}

$ cat b.c
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
 const uid_t euid = geteuid();
 if (euid != 0) {
  printf("Failed, got euid %d (expecting 0)\n", euid);
         return 1;
 }
 return 0;
}

To reproduce:

make
make test

You will see "Failed, got euid 1000 (expecting 0)" errors whenever the suid program being exec'd fails to exec as a suid process because of the race and failure in check_unsafe_exec()