posix_spawn usage in gnu make causes failures on s390x

Bug #1886814 reported by Dimitri John Ledkov on 2020-07-08
Affects                   Importance   Assigned to
Ubuntu on IBM z Systems   Medium       Unassigned
flatpak (Ubuntu)          Undecided    Unassigned
glibc (Ubuntu)            Undecided    Unassigned
linux (Ubuntu)            Undecided    Unassigned
make-dfsg (Ubuntu)        Undecided    Unassigned

Bug Description

posix_spawn usage in gnu make causes failures on s390x

GNU make v4.3 (https://paste.ubuntu.com/p/tYhbJFKN76/) recently started to use posix_spawn() instead of fork()/exec().

This has caused the autopkgtests of an unrelated package, flatpak-builder, to fail on s390x only, like so:

  echo Building
  make: echo: Operation not permitted
  make: *** [Makefile:2: all] Error 127

Julian Klode investigated this in depth. His earlier research also indicated that this is a heisenbug: if one prints to stderr before printing to stdout, no issue occurs.

We are configuring GNU make to be built with --disable-posix-spawn on s390x only. We have also passed these details to Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964541

But I do wonder whether there is something different or incorrect about the posix_spawn() implementation in either glibc or the Linux kernel on s390x, or about gnu-make's usage of posix_spawn().

Otherwise, using posix_spawn() in gnu-make works on other architectures, and the flatpak-builder autopkgtests pass there too.

It seems very weird that stdout does not appear to be functional unless stderr was opened/written to first, when gnu-make is compiled with the posix-spawn feature.

Frank Heimes (fheimes) on 2020-07-08
tags: added: reverse-proxy-bugzilla s390x
Changed in ubuntu-z-systems:
importance: Undecided → Medium

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1886814

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Florian Weimer (fweimer) wrote :

Is this a native s390x build, or something qemu-user? Thanks.

bugproxy (bugproxy) on 2020-07-08
tags: added: architecture-s39064 bugnameltc-186737 severity-high targetmilestone-inin2004
Dimitri John Ledkov (xnox) wrote :

> Is this a native s390x build, or something qemu-user? Thanks.

That's a very good question.

The failing autopkgtest was run on an LPAR running OpenStack Nova, which launched a qemu-system KVM guest with a v5.4 Ubuntu kernel, and make was then run inside that guest.

I will double-check whether those old builds of make and autopkgtest reproduced the issue directly on an LPAR, without qemu in between. I believe they did, but I don't have the logs anymore.

------- Comment From <email address hidden> 2020-07-16 01:28 EDT-------
Is it possible to turn this into a testcase we can run in isolation? I don't see what we can do here right now.

Julian Andres Klode (juliank) wrote :

Andreas, I've not managed to isolate it any further; the smallest reproducer I had was running debian/tests/gnome-desktop-testing in a flatpak-builder source tree (apt source/pull-lp-source flatpak-builder).

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-21 08:57 EDT-------
Stefan Liebler (our Glibc guy) is having a look. He managed to reproduce it on his system.

Frank Heimes (fheimes) on 2020-07-21
Changed in ubuntu-z-systems:
status: New → Triaged
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-23 11:06 EDT-------
Hi,

I was able to reproduce the "make: echo: Operation not permitted" on my Ubuntu 20.04 s390x machine.
I've built and installed the mentioned make-dfsg_4.3-4ubuntu1 package without the "--disable-posix-spawn" configure flag.
I've built flatpak-builder_1.0.11-1, which executes the test that triggers the "Operation not permitted".

Then I've adjusted the tests so that I can also run them without building the package itself.
This test runs flatpak-builder, which prepares some stuff (e.g. a root directory with all needed files / binaries / libraries).
flatpak-builder then creates a container with bwrap and calls a configure script, which generates a Makefile.
In a second invocation, make is invoked.

I've adjusted the configure script, which now executes a small program of my own.
This program first waits some time, which I use to determine its PID. Then I can attach either strace or gdb.
After the timeout, the program just execve's make. Thus in the end I have a process chain like:
flatpak-builder--bwrap---bwrap---configure---make

The strace output shows that the clone syscall is failing with EPERM:
4269 17:08:47.914142 stat("/usr/bin/echo", {st_mode=S_IFREG|0755, st_size=39136, ...}) = 0 <0.000003>
4270 17:08:47.914155 geteuid() = 1001 <0.000001>
4271 17:08:47.914167 getegid() = 1001 <0.000002>
4272 17:08:47.914175 getuid() = 1001 <0.000001>
4273 17:08:47.914182 getgid() = 1001 <0.000001>
4274 17:08:47.914189 access("/usr/bin/echo", X_OK) = 0 <0.000005>
4275 17:08:47.914203 mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x3ff9c86b000 <0.000002>
4276 17:08:47.914214 rt_sigprocmask(SIG_BLOCK, ~[], [HUP INT QUIT TERM CHLD XCPU XFSZ], 8) = 0 <0.000001>
4277 17:08:47.914224 clone(child_stack=0x3ff9c874000, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = -1 EPERM (Operation not permitted) <0.000001>
4278 17:08:47.914235 munmap(0x3ff9c86b000, 36864) = 0 <0.000004>
4279 17:08:47.914245 rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT TERM CHLD XCPU XFSZ], NULL, 8) = 0 <0.000001>

A gdb session showed that make calls posix_spawn as follows (note: make uses vfork() instead if configured with "--disable-posix-spawn"):
jobs.c:child_execute_job (struct childbase *child, int good_stdin, char **argv)
posix_spawnattr_t attr;
posix_spawn_file_actions_t fa;
short flags = 0;
posix_spawnattr_init (&attr)
posix_spawn_file_actions_init (&fa)
flags |= POSIX_SPAWN_SETSIGMASK; => 0x08
flags |= POSIX_SPAWN_USEVFORK; => 0x40
fdin=0, fdout=1, fderr=2
flags |= POSIX_SPAWN_RESETIDS; => 0x01
=> flags = 0x49
posix_spawnattr_setflags (&attr, flags)
/* Start the program. */
while ((r = posix_spawn (&pid, cmd, &fa, &attr, argv,
child->environment)) == EINTR)
;

In glibc, posix_spawn does this:
posix_spawn(...) -> __spawni(..., 0) -> __spawnix(..., __execve)
void *stack = __mmap (NULL, stack_size, prot, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
/* Disable asynchronous cancellation. */
__libc_signal_block_all (&args.oldmask);
# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
__clone (__fn, ...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-24 10:47 EDT-------
I've found the relevant code. It's in the flatpak package. For a test on my s390x machine, I've just changed the check from arg 0 to arg 1:
diff -uNr ./flatpak-1.6.3/common/flatpak-run.orig.c ./flatpak-1.6.3/common/flatpak-run.c
--- ./flatpak-1.6.3/common/flatpak-run.orig.c 2020-07-24 15:57:17.583312438 +0200
+++ ./flatpak-1.6.3/common/flatpak-run.c 2020-07-24 16:23:35.880965987 +0200
@@ -2632,7 +2632,7 @@
{SCMP_SYS (unshare)},
{SCMP_SYS (mount)},
{SCMP_SYS (pivot_root)},
- {SCMP_SYS (clone), &SCMP_A0 (SCMP_CMP_MASKED_EQ, CLONE_NEWUSER, CLONE_NEWUSER)},
+ {SCMP_SYS (clone), &SCMP_A1 (SCMP_CMP_MASKED_EQ, CLONE_NEWUSER, CLONE_NEWUSER)},

/* Don't allow faking input to the controlling tty (CVE-2017-5226) */
{SCMP_SYS (ioctl), &SCMP_A1 (SCMP_CMP_MASKED_EQ, 0xFFFFFFFFu, (int) TIOCSTI)},

Note:
I've also looked into the "groovy" flatpak (1.8.1-1) source-code. There the code looks the same.

Afterwards, the seccomp filter looks like:
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 A = arch
0001: 0x15 0x00 0x1f 0x80000016 if (A != ARCH_S390X) goto 0033
0002: 0x20 0x00 0x00 0x00000000 A = sys_number
0003: 0x15 0x1c 0x00 0x00000015 if (A == mount) goto 0032
0004: 0x15 0x1b 0x00 0x00000033 if (A == acct) goto 0032
0005: 0x15 0x1a 0x00 0x00000056 if (A == uselib) goto 0032
0006: 0x15 0x19 0x00 0x00000067 if (A == syslog) goto 0032
0007: 0x15 0x18 0x00 0x00000083 if (A == quotactl) goto 0032
0008: 0x15 0x17 0x00 0x000000d9 if (A == pivot_root) goto 0032
0009: 0x15 0x16 0x00 0x0000010c if (A == mbind) goto 0032
0010: 0x15 0x15 0x00 0x0000010d if (A == get_mempolicy) goto 0032
0011: 0x15 0x14 0x00 0x0000010e if (A == set_mempolicy) goto 0032
0012: 0x15 0x13 0x00 0x00000116 if (A == add_key) goto 0032
0013: 0x15 0x12 0x00 0x00000117 if (A == request_key) goto 0032
0014: 0x15 0x11 0x00 0x00000118 if (A == keyctl) goto 0032
0015: 0x15 0x10 0x00 0x0000011f if (A == migrate_pages) goto 0032
0016: 0x15 0x0f 0x00 0x0000012f if (A == unshare) goto 0032
0017: 0x15 0x0e 0x00 0x00000136 if (A == move_pages) goto 0032
0018: 0x15 0x00 0x05 0x00000036 if (A != ioctl) goto 0024
0019: 0x20 0x00 0x00 0x00000018 A = cmd # ioctl(fd, cmd, arg)
0020: 0x54 0x00 0x00 0x00000000 A &= 0x0
0021: 0x15 0x00 0x09 0x00000000 if (A != 0) goto 0031
0022: 0x20 0x00 0x00 0x0000001c A = cmd >> 32 # ioctl(fd, cmd, arg)
0023: 0x15 0x08 0x07 0x00005412 if (A == 0x5412) goto 0032 else goto 0031
0024: 0x15 0x00 0x06 0x00000078 if (A != clone) goto 0031
0025: 0x20 0x00 0x00 0x00000018 A = newsp # clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls)
0026: 0x54 0x00 0x00 0x00000000 A &= 0x0
0027: 0x15 0x00 0x03 0x00000000 if (A != 0) goto 0031
0028: 0x20 0x00 0x00 0x0000001c A = newsp >> 32 # clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls)
=> Now argument 1 (on s390x: flags; on x86_64: stack-pointer) is checked and clone works as expected.
0029: 0x54 0x00 0x00 0x10000000 A &= 0x10000000
0030: 0x15 0x01 0x00 0x10000000 if (A == 268435456) goto 0032
0031: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0032: 0x06 0x00 0x00 0x00050001 return ERRNO(...


bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-05 09:16 EDT-------
Hi xnox,
have you had the chance to retest flatpak-builder autopkgtests with an adjusted flatpak package like mentioned in my previous comment?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-26 09:11 EDT-------
@Canonical, any update available? Many thx

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-01 12:36 EDT-------
This is now fixed on flatpak master, see merged pull-request:
"Fix argument order of clone() for s390x in seccomp filter #3777"
https://github.com/flatpak/flatpak/pull/3777

Dimitri John Ledkov (xnox) wrote :

All tests run automatically, and yes, flatpak-builder was automatically tested and the new flatpak with this fix migrated fine.

See http://autopkgtest.ubuntu.com/packages/flatpak-builder/groovy/s390x

Changed in flatpak (Ubuntu):
status: New → Fix Released
Changed in glibc (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in make-dfsg (Ubuntu):
status: New → Invalid
Frank Heimes (fheimes) on 2020-09-03
Changed in ubuntu-z-systems:
status: Triaged → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-04 03:04 EDT-------
IBM Bugzilla status->closed, Fix Released
