s390x autopkgtest regression of libflame vs glibc in Jammy

Bug #2024207 reported by Simon Chopin
This bug affects 1 person
Affects          Status    Importance   Assigned to    Milestone
glibc (Ubuntu)   Invalid   Undecided    Simon Chopin
  Jammy          Invalid   Critical     Simon Chopin

Bug Description

The libflame autopkgtests on Jammy are now failing on s390x against glibc 2.35-0ubuntu3.2.

It's triggering a timeout in the numpy-with-libflame test suite. To reproduce, you need python3-numpy, libflame1 and libflame-dev installed.

The issue seems to be in numpy/f2py/tests/test_compile_function.py::test_f2py_init_compile. To be able to investigate this, I had to change /usr/lib/python3/dist-packages/numpy/_pytesttester.py, line 183:

- pytest_args += ["-m", label]
+ pytest_args += ["-k", label]

and then I used the following Python script to reproduce:

#!/usr/bin/python3

import numpy as np
# With the -k change above, the label selects tests by name rather than by marker.
np.test("test_f2py_init_compile", verbose=3)

I haven't managed to dig any further yet. The only other thing I know is that the bug doesn't seem to trigger when running under strace.

Simon Chopin (schopin)
Changed in glibc (Ubuntu Jammy):
importance: Undecided → Critical
Changed in glibc (Ubuntu):
importance: Critical → Undecided
Changed in glibc (Ubuntu Jammy):
status: New → Triaged
Changed in glibc (Ubuntu):
status: Triaged → Fix Released
tags: added: regression-proposed update-excuse
Simon Chopin (schopin)
Changed in glibc (Ubuntu Jammy):
assignee: nobody → Simon Chopin (schopin)
Simon Chopin (schopin)
description: updated
Simon Chopin (schopin) wrote :

TL;DR: Now the tests pass, but I didn't do a thing.

Long follow up on this: I was investigating this on a fairly beefy VM (8 cores, 16G RAM), and managed to reproduce the issue quickly with a ~60% hit rate.
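A hit rate like this can be estimated by running the reproducer in a loop with a timeout. The following is only a minimal sketch, not what was actually used here: the helper name hang_rate, the number of runs and the timeout value are all assumptions.

```python
#!/usr/bin/python3
import subprocess
import sys


def hang_rate(cmd, runs=20, timeout=120):
    """Run cmd `runs` times and return the fraction of runs that timed out.

    The defaults are illustrative assumptions, not values from this report.
    """
    hangs = 0
    for _ in range(runs):
        try:
            # Output is discarded so that no pipe keeps the call blocked
            # once the direct child has been killed on timeout.
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs / runs
```

For example, hang_rate([sys.executable, "repro.py"]) would give the fraction of timed-out runs, where repro.py is a hypothetical file name for the reproducer script from the bug description. Note that subprocess.run only kills the direct child on timeout, so stuck grandchildren may need to be cleaned up by hand between runs.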

The test that times out is basically a thin wrapper around a subprocess invocation (via subprocess.run) of a Python interpreter, which itself uses the Python multiprocessing system to execute the Fortran compiler.

When the issue occurs, the entire pool of the mp subprocess is waiting for new tasks, except for a single thread that waits on a kernel semaphore. Since the Python stack for that thread is entirely in the CPython codebase and is in a finalizer, I would guess there's a race condition on freeing up a lock on a shared resource, which I'd wager is stdout or similar.
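One way to get per-thread Python stacks out of a hung interpreter without attaching strace is the standard faulthandler module. This is a sketch of a different technique from whatever was used for the observation above: it assumes the handler can be installed ahead of time in the process that ends up hanging, and the helper name install_stack_dump is made up for illustration. Whether delivering a signal perturbs the timing the way strace does is unknown.

```python
import faulthandler
import signal
import sys


def install_stack_dump(signum=signal.SIGUSR1, file=sys.stderr):
    """On `signum`, write the Python traceback of every thread to `file`.

    `file` must have a usable fileno(); SIGUSR1 is an arbitrary choice.
    """
    faulthandler.register(signum, file=file, all_threads=True)
```

Once this is installed, sending the chosen signal to the stuck process (for instance with kill -USR1 and its PID) prints the tracebacks of all its threads.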

Removing the pthread-related patch from the glibc SRU (bug 2007796), which was the most likely culprit, didn't improve the situation, so I figured I'd try to reproduce on a VM with capabilities similar to those on the autopkgtest infra (4c/8G, as libflame is marked as big) before trying anything else.

Lo and behold, on that new VM I was unable to reproduce the issue. Puzzled, I asked the nice folks in the QA team if by any chance the doc for the VM sizing was out-of-date. It's not, and they even kindly gave me access to a VM directly on the infra. I still was unable to reproduce.

Finally I just re-ran the tests, and now they pass. Comparing the logs, the only difference I could spot is the linux-libc-dev upgrade from 5.15.0-73.80 to 5.15.0-75.82.

Also of note, it turns out those tests have been disabled in subsequent versions in Debian as they're flaky and don't provide much value since numpy isn't compiled with libflame support, so, if the issue comes back, I'll probably ask for them to be hinted.

Changed in glibc (Ubuntu):
status: Fix Released → Invalid
Changed in glibc (Ubuntu Jammy):
status: Triaged → Invalid