ksh segfault on job_chksave () after it receive a SIGCHLD (Signal 17)

Bug #1697501 reported by Eric Desrochers on 2017-06-12
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ksh (Debian)
New
Unknown
ksh (Ubuntu)
Medium
Eric Desrochers
Trusty
Medium
Eric Desrochers
Xenial
Medium
Eric Desrochers
Yakkety
Medium
Eric Desrochers
Zesty
Medium
Eric Desrochers
Artful
Medium
Eric Desrochers

Bug Description

[Impact]

 * The compiler optimization dropped parts from the ksh job
locking mechanism from the binary code. As a consequence, ksh could terminate
unexpectedly with a segmentation fault after it received the SIGCHLD signal.

[Test Case]

 Unfortunately, there is no clear and easy way to reproduce the segfault.

 * But the original reporter of this bug can randomly reproduce the problem using an in-house ksh script that only works inside his infrastructure as follow : "ksh <in-house-script.ksh>" and then once in a while ksh will segfault as follow :

(gdb) bt
#0 job_chksave (pid=pid@entry=19003) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
#1 0x00000000004282ab in job_reap (sig=17) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:428
#2 <signal handler called>
...

[Regression Potential]

* Regression risk : low/none expected, the package has been highly/intensively tested by a user who run over 18M ksh scripts a day on each of their clusters.

+

* Secondly, I doubt ksh has much traction nowadays, so if a regression occurs... It will most likely be limited to a small amount of users IMHO.
For instance, the bug has been reported 3 years ago for Red Hat, and we, Ubuntu, only heard about this same situation for the first time a few weeks ago.

+

* The fix has been written by RH and has been proven to work for them for the last 3 years.

Note that the RH fix has never been merged upstream (ksh is a unmaintained project) and/or possibly never been proposed to upstream (to be verified).

+

* A test package including the RH fix has been intensively tested and verified (pre-SRU) by an affected user with positive feedbacks using a
reproducer that segfault without the RH patch.

+

* Test package (pre-SRU) feedbacks :
https://bugs.launchpad.net/ubuntu/xenial/+source/ksh/+bug/1697501/comments/7

[Other Info]

 * ksh project is unmaintained nowadays [https://github.com/att/ast], thus no new development is made upstream nor in debian upstream.

 * Details about the RH bug :
--
   - https://bugzilla.redhat.com/show_bug.cgi?id=1123467
   - https://bugzilla.redhat.com/show_bug.cgi?id=1112306
   - https://access.redhat.com/solutions/1253243
   - http://rhn.redhat.com/errata/RHBA-2014-1015.html

  # ksh.spec
      Fri Jul 25 2014 Michal Hlavinka <email address hidden> - 20120801-10.8
    - job locking mechanism did not survive compiler optimization (#1123467)

  # patch
    - ksh-20120801-locking.patch
--

 * Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181

[Original Description]

# gdb
[New LWP 3882]
Core was generated by `/bin/ksh <KSH_SCRIPT>.ksh'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 job_chksave (pid=pid@entry=19385) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
1948 if(jp->pid==pid)

(gdb) p *jp
Cannot access memory at address 0xb

(gdb) p *jp->pid
Cannot access memory at address 0x13

(gdb) p pid
$2 = 19385

(gdb) p *jpold
$1 = {next = 0xb, pid = -604008960, exitval = 11124}

The struct is corrupted at some point looking at the next,pid and exitval struct members values which isn't valid data.

# assembly code
=> 0x0000000000427159 <+41>: cmp %edi,0x8(%rdx)

(gdb) p $edi ## pid variable
$1 = 19385

(gdb) p *($rdx + 8) ## jp->pid struct
Cannot access memory at address 0x13
--

ksh is segfaulting because it can't access struct "jp" ($rdx) thus cannot de-reference the struct member "jp>pid" ($rdx + 8) at line : src/cmd/ksh93/sh/jobs.c:1948 when looking if jp->pid is equal to pid ($edi) variable.

I have looked at the github project "att/ast" upstream repo and some patches here and there, and nothing seems to apply.

Note that the project seems unmaintained nowadays.

Eric Desrochers (slashd) on 2017-06-12
Changed in ksh (Ubuntu):
importance: Undecided → Low
description: updated
Eric Desrochers (slashd) wrote :

I found a similar RH bugfix[1], not upstream[2].

Both backtrace from what I have in Ubuntu[3] and RH bug[4] correlate.

ksh could terminate unexpectedly with a segfault after it received the SIGCHLD signal.
See (sig=17) at frame 1 in job_reap.

[1] https://access.redhat.com/solutions/1253243

[2] - https://github.com/att/ast

[3] (gdb) bt
(gdb) bt
#0 job_chksave (pid=pid@entry=19003) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
#1 0x00000000004282ab in job_reap (sig=17) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:428
#2 <signal handler called>

[4] - (gdb) bt
#0 job_chksave (pid=5066) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/jobs.c:1949
#1 0x0000000000429240 in job_reap (sig=17) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/jobs.c:428
#2 <signal handler called>

Eric Desrochers (slashd) wrote :

Red hat fix :

ksh.spec:
--
- job locking mechanism did not survive compiler optimization (#1123467)
--

ksh-20120801-locking.patch
--
diff -up ksh-20120801/src/cmd/ksh93/include/jobs.h.locking ksh-20120801/src/cmd/ksh93/include/jobs.h
--- ksh-20120801/src/cmd/ksh93/include/jobs.h.locking 2014-06-27 15:51:07.144923719 +0200
+++ ksh-20120801/src/cmd/ksh93/include/jobs.h 2014-06-27 15:52:56.463272276 +0200
@@ -149,15 +149,18 @@ extern struct jobs job;
 #define vmbusy() 0
 #endif

-#define job_lock() (job.in_critical++)
+#define asoincint(p) __sync_fetch_and_add(p,1)
+#define asodecint(p) __sync_fetch_and_sub(p,1)
+
+#define job_lock() asoincint(&job.in_critical)
 #define job_unlock() \
  do { \
   int sig; \
- if (!--job.in_critical && (sig = job.savesig)) \
+ if (asodecint(&job.in_critical)==1 && (sig = job.savesig)) \
   { \
- if (!job.in_critical++ && !vmbusy()) \
+ if (!asoincint(&job.in_critical) && !vmbusy()) \
     job_reap(sig); \
- job.in_critical--; \
+ asodecint(&job.in_critical); \
   } \
  } while(0)
--

tags: added: sts-sru-needed
Changed in ksh (Ubuntu Artful):
importance: Low → Medium
Changed in ksh (Ubuntu Zesty):
importance: Undecided → Medium
Changed in ksh (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in ksh (Ubuntu Xenial):
importance: Undecided → Medium
Changed in ksh (Ubuntu Trusty):
importance: Undecided → Medium
Eric Desrochers (slashd) on 2017-06-23
summary: - ksh segfault on job_chksave ()
+ ksh segfault on job_chksave () after it receive a SIGCHLD (Signal 17)
Eric Desrochers (slashd) wrote :

A test package is now available for testing (pre-sru) to determine if this fixes the problem in Ubuntu as well.

# Instructions
sudo add-apt-repository ppa:slashd/sf143225ksh
sudo apt-get update
sudo apt-get install ksh

# Validation
dpkg -l | grep -i ksh should reveal version : ksh - 93u+20120801-1+

Please provide feedbacks.

- Eric

description: updated
Eric Desrochers (slashd) wrote :

^^ should reveal version: ksh - 93u+20120801-1+sf143225b2

I installed the test package this morning, and we will try to reproduce for the coming week.

Thanks.
--
Walter

Eric Desrochers (slashd) on 2017-06-29
description: updated
Eric Desrochers (slashd) on 2017-07-04
Changed in ksh (Ubuntu Artful):
assignee: nobody → Eric Desrochers (slashd)
Changed in ksh (Ubuntu Zesty):
assignee: nobody → Eric Desrochers (slashd)
Changed in ksh (Ubuntu Yakkety):
assignee: nobody → Eric Desrochers (slashd)
Changed in ksh (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
Changed in ksh (Ubuntu Trusty):
assignee: nobody → Eric Desrochers (slashd)
Changed in ksh (Ubuntu Artful):
status: New → In Progress
Changed in ksh (Ubuntu Zesty):
status: New → In Progress
Changed in ksh (Ubuntu Yakkety):
status: New → In Progress
Changed in ksh (Ubuntu Xenial):
status: New → In Progress
Changed in ksh (Ubuntu Trusty):
status: New → In Progress
Eric Desrochers (slashd) wrote :

I have also reported a bug / submitted the patch to Debian upstream via :

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181

- Eric

Eric Desrochers (slashd) on 2017-07-04
description: updated

The suite of scripts that had the issue were run over 3000 times over the last week.

No error have been reported.

Looks good to me!

Thanks
--
Walter

Changed in ksh (Debian):
status: Unknown → New
Eric Desrochers (slashd) on 2017-07-04
description: updated
description: updated
Eric Desrochers (slashd) wrote :

Artful (Development Release) debdiff

tags: added: patch
tags: added: sts
description: updated
Eric Desrochers (slashd) wrote :

For "Ubuntu Sponsors Team",

Could you please sponsor the Devel Release - Artful debdiff ?

I have the privileges to sponsor in Stable Release so once Artful is completed, I'll do the sponsor the SRU myself.

- Eric

description: updated
Eric Desrochers (slashd) on 2017-07-04
description: updated
description: updated
Eric Desrochers (slashd) wrote :

Zesty debdiff

Eric Desrochers (slashd) wrote :

Yakkety debdiff

Eric Desrochers (slashd) wrote :

Xenial debdiff

Eric Desrochers (slashd) wrote :

Trusty debdiff

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ksh - 93u+20120801-3.1ubuntu1

---------------
ksh (93u+20120801-3.1ubuntu1) artful; urgency=medium

  * d/p/locking.patch: Fix job locking mechanism to prevent
    ksh to segfaults in job_chksave after receiving SIGCHLD. (LP: #1697501)

 -- Eric Desrochers <email address hidden> Tue, 04 Jul 2017 12:42:47 -0400

Changed in ksh (Ubuntu Artful):
status: In Progress → Fix Released
Eric Desrochers (slashd) wrote :

Patch uploaded for the following releases : T/X/Y/Z.

Now waiting for the SRU verification team to approve the uploads for the new ksh packages to start building in $RELEASE-proposed.

- Eric

tags: removed: patch

Hello Eric, or anyone else affected,

Accepted ksh into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ksh/93u+20120801-1ubuntu0.14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ksh (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-trusty
Łukasz Zemczak (sil2100) wrote :

Hello Eric, or anyone else affected,

Accepted ksh into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ksh/93u+20120801-2ubuntu0.16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-yakkety to verification-done-yakkety. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-yakkety. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ksh (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-needed-yakkety
Changed in ksh (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Łukasz Zemczak (sil2100) wrote :

Hello Eric, or anyone else affected,

Accepted ksh into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ksh/93u+20120801-2ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Hello Eric, or anyone else affected,

Accepted ksh into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ksh/93u+20120801-2ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ksh (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed-zesty

Bonjour,

preliminary tests were started yesterday with trusty-proposed.

We will start running in loop over the weekend with the script that mostly had the issue.

On Monday morning, this will be installed on our dev cluster which usually run 18M ksh a day.

I will try to start a few tests on xenial too some time next week.

Thanks
--
Walter

Eric Desrochers (slashd) wrote :

Thanks Walter.

Looking forward to hearing back from you with the outcome of your test.

- Eric

Bonjour,

we have been running tests for over a week on trusty. No issue to report.

I did not manage to get much tests going on xenial though.
--
Walter

Eric Desrochers (slashd) wrote :

# VERIFICATION TRUSTY #

I've been in direct communication with Walter, the reporter of this bug.
Walter is intensively using ksh using Trusty lxc container.

Before the proposed package, when running their in-house ksh scripts, ksh could unexpectedly segfaulting, but since they are using this proposed package there is no new occurrence of segfaults for approximately 2 weeks of intensive testing.

We can then conclude the proposed package is ready to be release in -updates.

To see Walter's comment :
https://bugs.launchpad.net/ubuntu/+source/ksh/+bug/1697501/comments/22

- Eric

tags: added: verification-done-trusty
removed: verification-needed-trusty
Eric Desrochers (slashd) wrote :

# VERIFICATION FOR XENIAL/YAKKETY/ZESTY #

For Xenial, Yakkety Zesty, there is no easy way for me to reproduce the situation.
Most of the troubleshooting has been done through coredump analysis and code inspection on my side.

It is clear for me that all supported stable releases (Xenial/Yakkety/Zesty) are affected by this bug at code inspection.

Walter, the reporter of this bug, has a in-house reproducer, but so far he only has tested on Trusty because it was his main interest.

Unless Walter is amenable to test on Xenial/Yakkety/Zesty as well, there is no clear reproducer as of today except his in-house ksh script, but again based on code inspection I'm confident that this need to be part of ksh package for Xenial/Yakkety/Zesty.

Note that there isn't much change between ksh on Trusty and Artful, meaning the package is pretty much the same for every stable releases. ksh is a unmaintained project with no new development nowadays.

- Eric

Eric Desrochers (slashd) on 2017-07-14
tags: added: verification-done-xenial verification-done-yakkety verification-done-zesty
removed: verification-needed verification-needed-xenial verification-needed-yakkety verification-needed-zesty
Łukasz Zemczak (sil2100) wrote :

I'll release the trusty part right now (since the target-of-interest for the issue is trusty). For the rest, as we discussed privately: please at least give a basic sanity-spin of the new ksh for each of the other stable series, at least to make sure those are not completely broken.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ksh - 93u+20120801-1ubuntu0.14.04.1

---------------
ksh (93u+20120801-1ubuntu0.14.04.1) trusty; urgency=medium

  * d/p/locking.patch: Fix job locking mechanism to prevent
    ksh to segfaults in job_chksave after receiving SIGCHLD. (LP: #1697501)

 -- Eric Desrochers <email address hidden> Fri, 23 Jun 2017 16:01:04 -0400

Changed in ksh (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ksh has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Eric Desrochers (slashd) wrote :

Hi Łukasz Zemczak (sil2100),

With regard to the basic sanity-spin of the new ksh for each of the other stable series to make sure those are not completely broken.

I have tested ksh against a number of regression tests offered in the ksh source package itself.

--
To run all these tests with the shell you just built, go to the tests directory and run the command:
SHELL=$dir/ksh $dir/ksh shtests
where dir is the directory of the ksh you want to test.
--

The output the regression tests are generating is very long, but to summarize it test the binary in depth. Testing alias, append, arrays, attributes, bracket, enum, exit, expand, functions, grep, io, leaks, locale, math, namespace, path, options, quoting sigchld, signal, variables, tilde, and many more.

Output example :

test sigchld passed at 2017-07-18+15:31:19 [ 14 tests 0 errors ]
test sigchld(C.UTF-8) begins at 2017-07-18+15:31:19
test sigchld(C.UTF-8) passed at 2017-07-18+15:31:35 [ 14 tests 0 errors ]
test sigchld(shcomp) begins at 2017-07-18+15:31:35
test sigchld(shcomp) passed at 2017-07-18+15:31:50 [ 14 tests 0 errors ]

I'm showing this above regression test output as an example and because the bug was about a segfault after receiving a sigchld.

I won't hide that the test is sometime generating errors here and there, even if the huge majority of the regression tests passed without errors. Amongst the reported errors I can't find anything obvious that could be related to the patch I have submitted.

I have strong believe these errors existed pre-SRU, and are probably there for a while, since there is no new development nowadays for ksh project.

Example of errors :
test bracket(C.UTF-8) begins at 2017-07-18+15:27:58
 bracket.sh[86]: -r: /tmp/tmp.HapRrWwjIk/original should not be readable
 bracket.sh[92]: -w: /tmp/tmp.HapRrWwjIk/original should not be writable
 bracket.sh[95]: -x: /tmp/tmp.HapRrWwjIk/original should not be executable
 bracket.sh[98]: -rw: /tmp/tmp.HapRrWwjIk/original should not be readable/writable
test bracket(C.UTF-8) failed at 2017-07-18+15:28:03 with exit code 1 [ 118 tests 1 error ]

Hope the above verification is enough for you to have confidence in the -proposed package for X/Y/Z.

- Eric

Robie Basak (racb) wrote :

@Eric

That sounds like that exceeds what sil2000 requested to me. But please can you state how are you sure these tests ran against the ksh binary in proposed, rather than from a build in a source tree, for example? Also please confirm the package version supplying the ksh binary that you tested.

Eric Desrochers (slashd) wrote :

@Robie,

>How are you sure these tests ran against the ksh binary in proposed, rather than from a build in a source tree, for example?

Yes, the test was ran against the ksh binaries in proposed.

I have test both binaries with the following :

#SHELL=/bin/ksh93 /bin/ksh93 shtests
#SHELL=/bin/ksh /bin/ksh shtests

>Also please confirm the package version supplying the ksh binary that you tested.

xenial | ksh | 93u+20120801-2ubuntu0.16.04.1
yakkety | ksh | 93u+20120801-2ubuntu0.16.10.1
zesty | ksh | 93u+20120801-2ubuntu1

- Eric

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ksh - 93u+20120801-2ubuntu1

---------------
ksh (93u+20120801-2ubuntu1) zesty; urgency=medium

  * d/p/locking.patch: Fix job locking mechanism to prevent
    ksh to segfaults in job_chksave after receiving SIGCHLD. (LP: #1697501)

 -- Eric Desrochers <email address hidden> Tue, 04 Jul 2017 15:08:52 -0400

Changed in ksh (Ubuntu Zesty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ksh - 93u+20120801-2ubuntu0.16.10.1

---------------
ksh (93u+20120801-2ubuntu0.16.10.1) yakkety; urgency=medium

  * d/p/locking.patch: Fix job locking mechanism to prevent
    ksh to segfaults in job_chksave after receiving SIGCHLD. (LP: #1697501)

 -- Eric Desrochers <email address hidden> Tue, 04 Jul 2017 15:08:52 -0400

Changed in ksh (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ksh - 93u+20120801-2ubuntu0.16.04.1

---------------
ksh (93u+20120801-2ubuntu0.16.04.1) xenial; urgency=medium

  * d/p/locking.patch: Fix job locking mechanism to prevent
    ksh to segfaults in job_chksave after receiving SIGCHLD. (LP: #1697501)

 -- Eric Desrochers <email address hidden> Tue, 04 Jul 2017 15:08:52 -0400

Changed in ksh (Ubuntu Xenial):
status: Fix Committed → Fix Released
tags: added: sts-sru-done
removed: sts-sru-needed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.