openjdk-17-jre-headless 17.0.10+7-1~22.04.1: segfault in jspawnhelper

Bug #2055280 reported by Dimitry Andric
This bug affects 2 people
Affects                       Status     Importance  Assigned to  Milestone
openjdk-17 (Ubuntu)           Confirmed  Undecided   Unassigned
unattended-upgrades (Ubuntu)  New        Undecided   Unassigned

Bug Description

We recently upgraded a number of Jenkins build machines running Ubuntu 22.04.4 LTS to openjdk-17-jre-headless_17.0.10+7-1~22.04.1. Shortly afterwards, all the Jenkins agents running on these machines started getting segfaults in jspawnhelper whenever the JRE tried to spawn an external shell to run build jobs:

$ /bin/sh -xe /tmp/jenkins12814566742512325555.sh
FATAL: command execution failed
java.io.IOException: error=0, Failed to exec spawn helper: pid: 1291715, signal: 11
 at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
 at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
 at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
 at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to ubuntu22-amd64-docker-3
  at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1787)
  at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
  at hudson.remoting.Channel.call(Channel.java:1003)
  at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1121)
  at hudson.Launcher$ProcStarter.start(Launcher.java:506)
  at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:144)
  at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
  at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
  at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
  at hudson.model.Build$BuildExecution.build(Build.java:199)
  at hudson.model.Build$BuildExecution.doRun(Build.java:164)
  at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
  at hudson.model.Run.execute(Run.java:1895)
  at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
  at hudson.model.ResourceController.execute(ResourceController.java:101)
  at hudson.model.Executor.run(Executor.java:442)

I'm unsure exactly how the Jenkins agent invokes jspawnhelper, but I assume it just uses Java's regular API for running external processes, as the stack trace above shows. That is, it uses java.lang.ProcessBuilder, which ends up in the native java.lang.ProcessImpl.forkAndExec method. Finally, I assume that forkAndExec somehow forks jspawnhelper to do its thing.

In any case, with 17.0.10+7-1~22.04.1 this almost always results in a segfault now, which the previous version of openjdk-17-jre-headless never did. So it is some sort of regression.

For now, I have downgraded to the release version, 17.0.2+8-1, since the previous security update version has disappeared from the Ubuntu mirrors.

I have not dug any deeper yet for lack of time, but I wanted to file this bug report so that other people running into this can find it on Launchpad.

Vladimir Petko (vpa1977) wrote :

Hi,

Process spawning works for me (both the jtreg tests and the small reproducer below):
---
public class Test {
    public static void main(String[] args) throws Throwable {
        Process p = new ProcessBuilder("ls", "-alrt", "/tmp").start();
        p.waitFor();
    }
}
---
but there were changes in jspawnhelper that might be triggering the crash.
Now it expects the child data to be passed through argv[1] - see [1].

I will try to set up a Jenkins environment to see if I can reproduce it.

[1] https://github.com/openjdk/jdk17u/commit/cd6cb730c934d8e16d4bd8e3342e59e806f158f9

Dimitry Andric (dimitry.unified-streaming.com) wrote :

Possibly related: https://issues.jenkins.io/browse/JENKINS-72665

So the question is what the Jenkins agent does. I'm not sure it invokes jspawnhelper directly; I assume it goes via the Java API, but there could be some other bug that causes it to pass incorrect arguments to jspawnhelper. It looks like the command line interface of jspawnhelper is not very bulletproof, which is understandable since it's not meant to be run directly. But still, segfaulting is bad :)

Vladimir Petko (vpa1977) wrote :

I agree - they do not check argc there, causing the startup segfault.

I ran the test in an LXC container:

$ lxc launch ubuntu-daily:noble

There I installed the Jenkins weekly release:
---
sudo wget -O /usr/share/keyrings/jenkins-keyring.asc \
  https://pkg.jenkins.io/debian/jenkins.io-2023.key
echo deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] \
  https://pkg.jenkins.io/debian binary/ | sudo tee \
  /etc/apt/sources.list.d/jenkins.list > /dev/null
sudo apt-get update
sudo apt-get install jenkins
---

and configured a pipeline to build a Maven project [1].

I configured an agent to connect and ran it as follows:

JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/ java -jar agent.jar -url http://<ip>:8080/ -secret @secret-file -name "self-server" -workDir "/home/test/work"

The build successfully clones the project, starts maven and builds it.

I wonder if there is something specific in your setup (e.g. a docker container used as an agent) that may contribute to the issue?

[1] https://github.com/vpa1977/spring-petclinic/tree/spring-boot-2.7.3

Dimitry Andric (dimitry.unified-streaming.com) wrote :

There isn't anything special as far as I know: it's just a plain Ubuntu 22.04 VM which is accessed by Jenkins over SSH. Note that we're using the stable branch of Jenkins, which is at 2.440.1, so it is possible that only that version is buggy.

I noticed a `_usr_lib_jvm_java-17-openjdk-amd64_lib_jspawnhelper.1007.crash` file in `/var/crash`, so I could unpack that and throw it in gdb:

```
Core was generated by `41:44'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __rawmemchr_evex () at ../sysdeps/x86_64/multiarch/memchr-evex.S:111
Download failed: Invalid argument. Continuing without source file ./string/../sysdeps/x86_64/multiarch/memchr-evex.S.
111 ../sysdeps/x86_64/multiarch/memchr-evex.S: No such file or directory.
(gdb) bt
#0 __rawmemchr_evex () at ../sysdeps/x86_64/multiarch/memchr-evex.S:111
#1 0x00007f21c298d9e8 in _IO_str_init_static_internal (sf=sf@entry=0x7fffde73e550, ptr=ptr@entry=0x0, size=size@entry=0, pstart=pstart@entry=0x0) at ./libio/strops.c:41
#2 0x00007f21c2960323 in _IO_strfile_read (string=0x0, sf=0x7fffde73e550) at ../libio/strfile.h:95
#3 __GI___isoc99_sscanf (s=0x0, format=format@entry=0x55f74431f0a1 "%d:%d") at ./stdio-common/isoc99_sscanf.c:28
#4 0x000055f74431d391 in main (argc=<optimized out>, argv=<optimized out>) at src/java.base/unix/native/jspawnhelper/jspawnhelper.c:140
```

So for some reason, it looks like `argv[0]` is actually "41:44", which results in sscanf() being called on argv[1] which is NULL. I have no idea yet whether this is a Jenkins bug or a Java bug.

Dimitry Andric (dimitry.unified-streaming.com) wrote :

Okay, I think the mystery might be solved.

The root cause is that unattended-upgrades (or some other apt upgrade) updates the openjdk-17 package while a java process is still running. With this minor upgrade, the protocol between the JRE's forkAndExec JNI function and the jspawnhelper tool has changed! The jspawnhelper tool now expects argv[0] to be its own executable name, argv[1] to be a string in "%d:%d" format containing two file descriptors, and argv[2] to be NULL.

However, any already-running java process will still use the old protocol, which invokes jspawnhelper with the "%d:%d" string in argv[0] and argv[1] set to NULL. This is what makes the new jspawnhelper executable segfault.
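
To make the mismatch concrete, here is a rough Java model of the two argument layouts described above. It is only an illustration: the real helper is C code in jspawnhelper.c, the names `SpawnHelperArgs` and `readFdPair` are made up for this sketch, and the regex is a simplification of the "%d:%d" sscanf() format. A missing argv[1] is modelled as an empty result, where the C helper instead hands NULL to sscanf() and crashes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the argv layouts discussed in this report.
// The real helper is written in C; nothing here is JDK code.
final class SpawnHelperArgs {
    // Simplified stand-in for the "%d:%d" format seen in the backtrace.
    private static final Pattern FD_PAIR = Pattern.compile("(\\d{1,9}):(\\d{1,9})");

    // New protocol: argv[0] = helper path, argv[1] = "<fd>:<fd>".
    // Returns empty in the case where the C code would read a NULL argv[1].
    static Optional<int[]> readFdPair(String[] argv) {
        if (argv.length < 2) {
            return Optional.empty();
        }
        Matcher m = FD_PAIR.matcher(argv[1]);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) });
    }

    public static void main(String[] args) {
        // Layout used by a JVM started after the upgrade.
        String[] newStyle = { "/usr/lib/jvm/java-17-openjdk-amd64/lib/jspawnhelper", "41:44" };
        // Layout used by a JVM that was already running before the upgrade.
        String[] oldStyle = { "41:44" };
        System.out.println("new layout parsed: " + readFdPair(newStyle).isPresent());
        System.out.println("old layout parsed: " + readFdPair(oldStyle).isPresent());
    }
}
```

The old layout has nothing in the argv[1] slot, which matches the `s=0x0` argument to sscanf() in the gdb backtrace.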

Therefore, with this particular openjdk-17 upgrade, even though it is a minor 'patch' upgrade, it is vital that _ALL_ java processes that intend to spawn external processes are immediately terminated and restarted.
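
As a rough aid for finding such processes, the following Java sketch lists visible processes whose executable is named `java` and which started before a given time, such as the time of the package upgrade. It relies only on the standard `ProcessHandle` API; the class name `StaleJavaProcesses`, its methods, and the handling of a " (deleted)" suffix (which Linux may append to the path of a replaced binary) are assumptions of this sketch. Processes owned by other users may not expose their command or start time and are then skipped.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the JDK or of any Ubuntu tooling.
final class StaleJavaProcesses {
    // True if the reported executable path looks like a "java" launcher.
    // Assumption: a replaced binary may be reported with " (deleted)" appended.
    static boolean isJavaBinary(String command) {
        return command.endsWith("/java") || command.endsWith("/java (deleted)");
    }

    // Visible java processes that started before the given instant.
    static List<ProcessHandle> startedBefore(Instant upgradeTime) {
        return ProcessHandle.allProcesses()
            .filter(p -> p.info().command()
                .map(StaleJavaProcesses::isJavaBinary)
                .orElse(false))
            .filter(p -> p.info().startInstant()
                .map(t -> t.isBefore(upgradeTime))
                .orElse(false))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Optional argument: the upgrade time as an ISO-8601 instant.
        // Without it, every visible java process started before "now" is listed.
        Instant upgradeTime = args.length > 0 ? Instant.parse(args[0]) : Instant.now();
        for (ProcessHandle p : startedBefore(upgradeTime)) {
            System.out.println(p.pid() + " " + p.info().commandLine().orElse("?"));
        }
    }
}
```

The program only reports candidates; deciding when and how to restart them (for example Jenkins agents) is left to the administrator.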

I would suggest a BIG PROMINENT note in the upgrade message for this particular update, since it is likely to bite a lot of people...

Some references:

https://bugs.openjdk.org/browse/JDK-8310265 ("(process) jspawnhelper should not use argv[0]") is the bug that eventually changed the JRE <--> jspawnhelper protocol

https://github.com/openjdk/jdk17u-dev/commit/cd6cb730c934d8e16d4bd8e3342e59e806f158f9 is the corresponding commit for OpenJDK 17.

https://bugs.openjdk.org/browse/JDK-8325567 ("jspawnhelper without args fails with segfault") is a related upstream bug. I also noticed the same after the Ubuntu 17.0.10+7-1~22.04.1 package upgrade, because I tried running jspawnhelper myself, and the very first invocation (without arguments) segfaulted. :)

In that bug, Aleksey Shipilev notes:
> So this would only affect whoever is invoking jspawnhelper directly. But that would also run into problems when jspawnhelper protocol changes like in JDK-8310265.

I.e., it is clear that the jspawnhelper protocol was changed without taking into account that any "old" JRE process would now run the helper tool in a way that makes it segfault. I don't think they thought this through correctly, even though it is an internal JRE implementation detail...

Bottom line, this is not really an Ubuntu bug in the package, so feel free to close this ticket, but I would still suggest adding a visible notice that any running OpenJDK processes should be restarted!

Vladimir Petko (vpa1977) wrote :

Thank you for investigating this !!!!

I think we need to discuss the best way to fix it, e.g. maybe offer a compatibility patch that checks whether argc == 1 and, in that case, tries to interpret the contents of argv[0].
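
One way to picture such a check, as a rough Java model only: the actual patch would be C code in jspawnhelper.c, and `SpawnHelperCompat` and `resolveFdPairArg` are hypothetical names. The sketch accepts the new layout when argv[1] holds a descriptor pair, falls back to argv[0] when it is the only argument and looks like a descriptor pair, and otherwise reports an error instead of crashing. Whether the rest of the exchange between an old JVM and a new helper would stay compatible is not established in this report.

```java
import java.util.regex.Pattern;

// Hypothetical model of the proposed argc/argv[0] compatibility check.
final class SpawnHelperCompat {
    // Simplified stand-in for the "%d:%d" format used by the helper.
    private static final Pattern FD_PAIR = Pattern.compile("\\d{1,9}:\\d{1,9}");

    // Returns the argument holding "<fd>:<fd>", or throws with a readable message.
    static String resolveFdPairArg(String[] argv) {
        if (argv.length >= 2 && FD_PAIR.matcher(argv[1]).matches()) {
            return argv[1]; // new protocol: argv[0] is the helper path
        }
        if (argv.length == 1 && FD_PAIR.matcher(argv[0]).matches()) {
            return argv[0]; // old protocol, sent by a JVM that was not restarted
        }
        throw new IllegalArgumentException(
            "unexpected arguments; restart running Java processes after a JDK upgrade");
    }

    public static void main(String[] args) {
        System.out.println(resolveFdPairArg(new String[] { "jspawnhelper", "41:44" }));
        System.out.println(resolveFdPairArg(new String[] { "41:44" }));
        try {
            resolveFdPairArg(new String[] { "jspawnhelper" });
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```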

Vladimir Petko (vpa1977) wrote :

The security team advised that the standard USN text says:
---
"This update uses a new upstream release, which includes additional bug
fixes. After a standard system update you need to restart any Java
applications to make all the necessary changes."
--- [1]

I will submit an MR upstream so that the helper prints a warning rather than segfaulting in this case.

[1] https://ubuntu.com/security/notices/USN-6660-1

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openjdk-17 (Ubuntu):
status: New → Confirmed