I can reproduce this with the simple pthreads-only reproducer (a loop of ./a_p, which runs the setuid binary ./b) on bare metal running kernel 4.4.0-57-generic.
$ for i in `seq 10`; do ./a_p; done
GOT 1000
GOT 1000
$ for i in `seq 1000`; do ./a_p; done | wc -l
117