hadoop crash: /bin/kill in ubuntu16.04 has bug in killing process group
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
procps (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
When I run Hadoop on Ubuntu 16.04, my SSH session exits and every process belonging to the hadoop user is killed. While debugging, I found that /bin/kill in Ubuntu 16.04 has a bug when killing a process group.
Ubuntu version is:
Description: Ubuntu 16.04.1 LTS
Release: 16.04
(1) How to reproduce this bug
The bug is easy to reproduce: run “/bin/kill -15 -12345”, or any command of the form “/bin/kill -15 -1xxxx”, on Ubuntu 16.04, and it kills every process the calling user is allowed to signal.
(2) Cause analysis
The /bin/kill in Ubuntu 16.04 comes from procps-3.3.10. When I run “/bin/kill -15 -1xxxx”, it actually sends signal 15 to PID -1 instead of to process group 1xxxx.
A PID of -1 means the signal is sent to every process the caller has permission to signal.
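The cause analysis above can be illustrated with a small sketch. It assumes, as the observed behavior suggests, that the option parser treats the first digit after the dash as an unknown option character and derives the PID from that single character. The function names are hypothetical and this is not the procps source; nothing here sends a signal.

```c
#include <stdlib.h>

/* Hypothetical illustration: the argument "-1xxxx" is seen as the unknown
 * option '1', and the PID is built from that one character only. */
long pid_from_option_char(const char *arg)
{
    char opt_char = arg[1];          /* first character after the dash */
    return (long)('0' - opt_char);   /* '1' becomes -1; remaining digits are lost */
}

/* For comparison: parsing the whole argument keeps "-12345" as -12345,
 * which kill(2) interprets as process group 12345. */
long pid_from_whole_argument(const char *arg)
{
    return strtol(arg, NULL, 10);
}
```

Under that assumption, any argument of the form “-1xxxx” collapses to -1, which matches the reproduction in (1).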
(3)The bug in procps-
static void __attribute__ ((__noreturn__)) kill_main(int argc, char **argv)
{
case '?':
}
(4) How Hadoop triggers the bug
When resources are tight, or when a Hadoop container loses its connection, the NodeManager kills that container by sending a signal to its JVM process. Killing a task and then re-executing it is normal behavior for Hadoop. With this kill bug, however, the signal kills every process belonging to the hadoop user.
(5) Workaround
I copied /bin/kill from Ubuntu 14.04 over /bin/kill on Ubuntu 16.04, and the problem no longer occurs. I also think it would be better to ask the procps-3.3.10 maintainers to fix the bug, but I don't know how to contact them.
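For programs that can avoid /bin/kill entirely, another option is to call kill(2) directly, where a negative PID addresses a process group without any option parsing. The following is only a sketch: the helper name `signal_process_group` and its guard against group IDs of 1 or less are assumptions made for illustration, not something Hadoop or procps provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: send `sig` to process group `pgid` via kill(2).
 * Group IDs <= 1 are rejected so the call can never become kill(-1, sig),
 * which would signal every process the caller is allowed to signal. */
int signal_process_group(pid_t pgid, int sig)
{
    if (pgid <= 1) {
        errno = EINVAL;
        return -1;
    }
    return kill(-pgid, sig);   /* negative PID targets the process group */
}
```

A caller would use something like `signal_process_group(12345, SIGTERM)` in place of running “/bin/kill -15 -12345”.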
description: | updated |
summary: |
- /bin/kill in ubuntu16.04 has bug in killing process group
+ hadoop crash: /bin/kill in ubuntu16.04 has bug in killing process group |
affects: | alsa-driver (Ubuntu) → procps (Ubuntu) |
Status changed to 'Confirmed' because the bug affects multiple users.