ksh segfault on job_chksave () after it receives a SIGCHLD (Signal 17)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
ksh (Debian) | Fix Released | Unknown | | |
ksh (Ubuntu) | Fix Released | Medium | Eric Desrochers | |
Trusty | Fix Released | Medium | Eric Desrochers | |
Xenial | Fix Released | Medium | Eric Desrochers | |
Yakkety | Fix Released | Medium | Eric Desrochers | |
Zesty | Fix Released | Medium | Eric Desrochers | |
Artful | Fix Released | Medium | Eric Desrochers | |
Bug Description
[Impact]
* Compiler optimization dropped parts of the ksh job locking mechanism from the binary code. As a consequence, ksh could terminate unexpectedly with a segmentation fault after receiving the SIGCHLD signal.
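To illustrate the class of bug described above, here is a minimal sketch. It is not the ksh source and not the RH patch; every name in it (`job_lock_depth`, `sigchld_pending`, `job_lock`, `job_unlock`, `on_sigchld`, `reap_children`, `install_sigchld_handler`) is hypothetical. It shows a SIGCHLD handler that defers its work while a lock counter is held. The `volatile` qualifier is the important part: without it, the compiler cannot see that a signal handler reads the counter, so it may legally drop or reorder the updates when optimizing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not the ksh source. */

/* Non-zero while the main code is editing the job list. "volatile" forces
 * every read and write to really happen, so the handler sees the lock. */
static volatile sig_atomic_t job_lock_depth = 0;
/* Set by the handler when a SIGCHLD arrives while the lock is held. */
static volatile sig_atomic_t sigchld_pending = 0;
/* Counts how many times the stand-in reaper ran. */
static volatile sig_atomic_t reap_count = 0;

/* Stand-in for the real work of collecting child exit statuses. */
static void reap_children(void)
{
    reap_count++;
}

/* SIGCHLD handler: never touch the job list while it is locked. */
static void on_sigchld(int sig)
{
    (void)sig;
    if (job_lock_depth > 0) {
        sigchld_pending = 1;   /* defer until job_unlock() */
        return;
    }
    reap_children();
}

static void job_lock(void)
{
    job_lock_depth++;
}

static void job_unlock(void)
{
    assert(job_lock_depth > 0);
    job_lock_depth--;
    /* Run the deferred reap once the outermost lock is released. */
    if (job_lock_depth == 0 && sigchld_pending) {
        sigchld_pending = 0;
        reap_children();
    }
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

In this sketch, a SIGCHLD delivered between `job_lock()` and `job_unlock()` only sets `sigchld_pending`, and the reap runs when the lock is released. If the lock updates were optimized away, the handler could run in the middle of a list update, which is the kind of failure the [Impact] section describes.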
[Test Case]
Unfortunately, there is no clear and easy way to reproduce the segfault.
* However, the original reporter of this bug can reproduce the problem intermittently using an in-house ksh script that only works inside his infrastructure, as follows: "ksh <in-house-
(gdb) bt
#0 job_chksave (pid=pid@
#1 0x00000000004282ab in job_reap (sig=17) at /build/
#2 <signal handler called>
...
[Regression Potential]
* Regression risk: low/none expected. The package has been intensively tested by a user who runs over 18M ksh scripts a day on each of their clusters.
* Secondly, I doubt ksh has much traction nowadays, so if a regression occurs, it will most likely be limited to a small number of users IMHO.
For instance, the bug was reported 3 years ago for Red Hat, and we, Ubuntu, only heard about this same situation for the first time a few weeks ago.
* The fix was written by RH and has been proven to work for them for the last 3 years.
Note that the RH fix has never been merged upstream (ksh is an unmaintained project) and/or has possibly never been proposed upstream (to be verified).
* A test package including the RH fix has been intensively tested and verified (pre-SRU) by an affected user, with positive feedback, using a reproducer that segfaults without the RH patch.
* Test package (pre-SRU) feedback:
https:/
[Other Info]
* ksh project is unmaintained nowadays [https:/
* Details about the RH bug :
--
- https:/
- https:/
- https:/
- http://
# ksh.spec
Fri Jul 25 2014 Michal Hlavinka <email address hidden> - 20120801-10.8
- job locking mechanism did not survive compiler optimization (#1123467)
# patch
- ksh-20120801-
--
* Debian bug:
https:/
[Original Description]
# gdb
[New LWP 3882]
Core was generated by `/bin/ksh <KSH_SCRIPT>.ksh'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 job_chksave (pid=pid@
1948 if(jp->pid==pid)
(gdb) p *jp
Cannot access memory at address 0xb
(gdb) p *jp->pid
Cannot access memory at address 0x13
(gdb) p pid
$2 = 19385
(gdb) p *jpold
$1 = {next = 0xb, pid = -604008960, exitval = 11124}
The struct appears to have been corrupted at some point: the values of its next, pid and exitval members are not valid data.
# assembly code
=> 0x0000000000427159 <+41>: cmp %edi,0x8(%rdx)
(gdb) p $edi ## pid variable
$1 = 19385
(gdb) p *($rdx + 8) ## jp->pid struct
Cannot access memory at address 0x13
--
ksh is segfaulting because it cannot access the struct "jp" ($rdx) and therefore cannot dereference the struct member "jp->pid" ($rdx + 8) at line: src/cmd/
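The address arithmetic above can be checked with a small sketch. The struct layout below is an assumption inferred from the gdb output (`next`, `pid`, `exitval`) and is not copied from the ksh source; the names `jobsave_sketch`, `pid_field_address` and `find_exitval` are hypothetical. On a typical LP64 platform such as x86-64, `pid` sits 8 bytes into the struct, so a garbage `next` value of 0xb makes `jp->pid` resolve to address 0x13, which matches the gdb error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumed layout, inferred from the gdb output; not the ksh definition. */
struct jobsave_sketch {
    struct jobsave_sketch *next;
    pid_t pid;
    unsigned short exitval;
};

/* The link pointer is the first member, so pid follows it directly. */
static_assert(offsetof(struct jobsave_sketch, next) == 0,
              "next is expected to be the first member");

/* Address the CPU would touch when evaluating jp->pid for a given jp value. */
static uintptr_t pid_field_address(uintptr_t jp_value)
{
    return jp_value + offsetof(struct jobsave_sketch, pid);
}

/* List search in the style of the faulting line, "if(jp->pid==pid)".
 * Returns the saved exit value, or -1 if pid is not in the list. */
static int find_exitval(const struct jobsave_sketch *head, pid_t pid)
{
    for (const struct jobsave_sketch *jp = head; jp != NULL; jp = jp->next) {
        if (jp->pid == pid)
            return jp->exitval;
    }
    return -1;
}
```

Note that the `jp != NULL` test in the loop cannot catch a garbage pointer such as 0xb. A walk like this is only safe if nothing modifies the list while it runs, which is why the corruption has to be prevented at its source rather than detected during the search.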
I have looked at the upstream "att/ast" GitHub repo and at various patches, and nothing seems to apply.
Note that the project seems unmaintained nowadays.
Changed in ksh (Ubuntu): | |
importance: | Undecided → Low |
description: | updated |
summary: |
- ksh segfault on job_chksave ()
+ ksh segfault on job_chksave () after it receive a SIGCHLD (Signal 17) |
description: | updated |
Changed in ksh (Ubuntu Artful): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in ksh (Ubuntu Zesty): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in ksh (Ubuntu Yakkety): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in ksh (Ubuntu Xenial): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in ksh (Ubuntu Trusty): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in ksh (Ubuntu Artful): | |
status: | New → In Progress |
Changed in ksh (Ubuntu Zesty): | |
status: | New → In Progress |
Changed in ksh (Ubuntu Yakkety): | |
status: | New → In Progress |
Changed in ksh (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in ksh (Ubuntu Trusty): | |
status: | New → In Progress |
description: | updated |
Changed in ksh (Debian): | |
status: | Unknown → New |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
tags: |
added: verification-done-xenial verification-done-yakkety verification-done-zesty
removed: verification-needed verification-needed-xenial verification-needed-yakkety verification-needed-zesty |
tags: |
added: sts-sru-done
removed: sts-sru-needed |
Changed in ksh (Debian): | |
status: | New → Fix Released |
I found a similar RH bugfix[1], which is not upstream[2].
The backtraces from Ubuntu[3] and from the RH bug[4] correlate.
ksh could terminate unexpectedly with a segfault after it received the SIGCHLD signal.
See (sig=17) at frame 1 in job_reap.
[1] https://access.redhat.com/solutions/1253243
[2] https://github.com/att/ast
[3]
(gdb) bt
#0 job_chksave (pid=pid@entry=19003) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
#1 0x00000000004282ab in job_reap (sig=17) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:428
#2 <signal handler called>
[4]
(gdb) bt
#0 job_chksave (pid=5066) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/jobs.c:1949
#1 0x0000000000429240 in job_reap (sig=17) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/jobs.c:428
#2 <signal handler called>