bad signal mask of ssh sessions

Bug #412972 reported by Michael Helmling
This bug affects 3 people
Affects: openssh (Ubuntu) | Status: Confirmed | Importance: Medium | Assigned to: Unassigned

Bug Description

On one of my systems that runs the karmic alpha already, the "kill" command has no effect unless used with the -9 option. I assume this might be related to https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/407428

My steps to reproduce on this machine: "sleep 100" on one terminal, "killall sleep" on another. System is up to date, running kernel 2.6.31-4-generic. Sorry for having no idea which package this corresponds to!

Damien Miller (djm) wrote :

We already reset SIGCHLD to SIG_DFL in main(); maybe we don't do it
early enough...

Damien Miller (djm) wrote :

Could you verify this with a recent version, preferably including
diagnostic output?

Damien Miller (djm) wrote :

20 months + no reply == closed bug

Darren Tucker (dtucker) wrote :

Change all RESOLVED bugs to CLOSED, with the exception of the ones fixed
post-4.4.

Jonathan Marsden (jmarsden) wrote :

I cannot reproduce this here.

  sleep 100 &
  killall sleep

works fine for me in Ubuntu Karmic Alpha 4 amd64 (with updates as of 2009-08-16):

  Script started on Sun 16 Aug 2009 06:54:43 PM PDT
  jonathan@black:~$ sleep 100 &
  [1] 28262
  jonathan@black:~$ killall sleep
  [1]+ Terminated sleep 100
  jonathan@black:~$ exit
  exit

  Script done on Sun 16 Aug 2009 06:55:00 PM PDT

Doing it with the sleep 100 in a separate terminal also works fine for me.

Jonathan Marsden (jmarsden) wrote :

In case it makes a difference, I was using a virtualbox VM rather than a physical machine to run Karmic Alpha 4 for the above test.

Victor Vargas (kamus) wrote :

Same here: I tried to reproduce this on an up-to-date Karmic Alpha (running under VirtualBox), and it works fine for me too.

affects: ubuntu → procps (Ubuntu)
Michael Helmling (supermihi) wrote :

Strange... could you give me any hints on how I could investigate further why this happens?

Jonathan Marsden (jmarsden) wrote :

Michael wrote: "could you give me any hints how I could examine further why this happens?"

The first thing might be to see if it is replicable -- can you install Karmic Alpha 4 on another machine and check whether you get the same issue? If not, then either it relates to the current install you have, or to the hardware you installed it on. If you can duplicate it, document *exactly* how you are doing the install process, what you are choosing at every step of the way, so that others can follow your documented steps and reproduce this.

Until someone other than you can duplicate the issue, it is unlikely we'll be able to provide an explanation (or even hints) as to why it is happening. So far, it has happened exactly once, on one install on one piece of hardware... we need to determine if that was "just a one off", or if there is a reproducible issue here.

Matti Hiljanen (matti-hiljanen) wrote :

I'm seeing this too, on a clean, up-to-date install of karmic alpha4 amd64 (netinst with mini.iso) on a physical machine.

This happens only when logged in via ssh; doing the test locally (over a serial console) works. Definitely seems to be related to https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/407428

summary: - can only kill processes with -9 in karmic
+ can only kill processes with -9 in karmic from SSH sessions
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: can only kill processes with -9 in karmic from SSH sessions

Pick a process to kill, then run:

   strace kill -TERM pid-of-process

and attach the output from strace.

Changed in procps (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Michael Helmling (supermihi) wrote :

Hi,
here's what I get (23473 is a "sleep 100" process):

helmling@menk:~$ strace kill -TERM 23473
execve("/bin/kill", ["kill", "-TERM", "23473"], [/* 18 vars */]) = 0
brk(0) = 0x762000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6fff412000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6fff410000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=125575, ...}) = 0
mmap(NULL, 125575, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6fff3f1000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libproc-3.2.8.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240A\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=76696, ...}) = 0
mmap(NULL, 2249496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6ffefcf000
mprotect(0x7f6ffefe0000, 2097152, PROT_NONE) = 0
mmap(0x7f6fff1e0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0x7f6fff1e0000
mmap(0x7f6fff1e2000, 74520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6fff1e2000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\353\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1490312, ...}) = 0
mmap(NULL, 3598344, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6ffec60000
mprotect(0x7f6ffedc6000, 2093056, PROT_NONE) = 0
mmap(0x7f6ffefc5000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x165000) = 0x7f6ffefc5000
mmap(0x7f6ffefca000, 18440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6ffefca000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6fff3f0000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6fff3ef000
arch_prctl(ARCH_SET_FS, 0x7f6fff3ef6f0) = 0
mprotect(0x7f6ffefc5000, 16384, PROT_READ) = 0
mprotect(0x7f6fff1e0000, 4096, PROT_READ) = 0
mprotect(0x603000, 4096, PROT_READ) = 0
mprotect(0x7f6fff413000, 4096, PROT_READ) = 0
munmap(0x7f6fff3f1000, 125575) = 0
brk(0) = 0x762000
brk(0x783000) = 0x783000
open("/proc...

Scott James Remnant (Canonical) (canonical-scott) wrote :

The key line here is:

kill(23473, SIGTERM) = -1 ESRCH (No such process)

Could you do the same again, this time running both "strace kill -TERM pid" and then immediately afterwards "strace kill -9 pid", to prove that -9 works?

summary: - can only kill processes with -9 in karmic from SSH sessions
+ can only kill processes with -9 in karmic from SSH sessions, -TERM does
+ not work
Michael Helmling (supermihi) wrote : Re: can only kill processes with -9 in karmic from SSH sessions, -TERM does not work

Here you go...

root@menk:~# ps -A|grep sleep
 3644 pts/0 00:00:00 sleep

root@menk:~# strace kill -TERM 3644
execve("/bin/kill", ["kill", "-TERM", "3644"], [/* 18 vars */]) = 0
brk(0) = 0x8ee000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff12fa8d000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff12fa8b000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=124031, ...}) = 0
mmap(NULL, 124031, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff12fa6c000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libproc-3.2.8.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240A\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=76696, ...}) = 0
mmap(NULL, 2249496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff12f64a000
mprotect(0x7ff12f65b000, 2097152, PROT_NONE) = 0
mmap(0x7ff12f85b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0x7ff12f85b000
mmap(0x7ff12f85d000, 74520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff12f85d000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\353\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1490312, ...}) = 0
mmap(NULL, 3598344, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff12f2db000
mprotect(0x7ff12f441000, 2093056, PROT_NONE) = 0
mmap(0x7ff12f640000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x165000) = 0x7ff12f640000
mmap(0x7ff12f645000, 18440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff12f645000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff12fa6b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff12fa6a000
arch_prctl(ARCH_SET_FS, 0x7ff12fa6a6f0) = 0 ...

Michael Helmling (supermihi) wrote :

Actually the "no such process" line is not present anymore here, but the symptoms are the same -- sleep runs until I kill with -9.

Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 412972] Re: can only kill processes with -9 in karmic from SSH sessions, -TERM does not work

On Wed, 2009-08-26 at 08:22 +0000, Michael Helmling wrote:

> Actually the "no such process" line is not present anymore here, but the
> symptoms are the same -- sleep runs until I kill with -9.
>
Right, I suspect that you just weren't quick enough that time ;-)

Could you confirm a few things for me:

 - what kind of machine is this? PC, Mac, emulated? virtual machine?
   etc.

 - how are you running the "sleep" and "kill" commands? from X? from a
   getty? from ssh?

Could you run a "sleep" as before, and this time "cat /proc/PID/status"
from another terminal and attach that.

Thanks,

Scott
--
Scott James Remnant
<email address hidden>

Michael Helmling (supermihi) wrote : Re: can only kill processes with -9 in karmic from SSH sessions, -TERM does not work

Hi,
the machine is a PC (amd64), not virtual. All I said before was regarding SSH sessions only, but right now I happen to be at the site, so I'm able to do some further testing. The following holds:

sleep on SSH, kill on SSH: not working
sleep on SSH, kill on getty: not working
sleep on getty, kill on SSH: working
sleep on getty, kill on getty: working

So this is somehow SSH-related, and I suspect even more strongly that it is related to bug #407428. However, the mentioned bug is gone after the last update+reboot, but this one is NOT.

Here is the /proc/PID/status you asked for, from a sleep running in an SSH session:

# cat /proc/4781/status
Name: sleep
State: S (sleeping)
Tgid: 4781
Pid: 4781
PPid: 3695
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 9436 kB
VmSize: 9436 kB
VmLck: 0 kB
VmHWM: 804 kB
VmRSS: 804 kB
VmData: 192 kB
VmStk: 84 kB
VmExe: 32 kB
VmLib: 1676 kB
VmPTE: 40 kB
Threads: 1
SigQ: 7/16382
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffadeff
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 3
Cpus_allowed_list: 0-1
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 2

Colin Watson (cjwatson) wrote :

Might any of you have restarted sshd from within a su session? I don't have definite proof that it's related, but I note that the quoted blocked signal mask corresponds exactly to those signals that su blocks. I wonder if it's due to some PAM session module ...

Scott James Remnant (Canonical) (canonical-scott) wrote :

Michael: could you also give us the /proc/PID/status of a sleep after you've sent it the TERM signal?

summary: - can only kill processes with -9 in karmic from SSH sessions, -TERM does
- not work
+ bad signal mask of ssh sessions
Michael Helmling (supermihi) wrote :

Here you go ...

# cat /proc/12218/status
Name: sleep
State: S (sleeping)
Tgid: 12218
Pid: 12218
PPid: 12202
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 9436 kB
VmSize: 9436 kB
VmLck: 0 kB
VmHWM: 804 kB
VmRSS: 804 kB
VmData: 192 kB
VmStk: 84 kB
VmExe: 32 kB
VmLib: 1676 kB
VmPTE: 40 kB
Threads: 1
SigQ: 10/16382
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffadeff
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 3
Cpus_allowed_list: 0-1
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 1
nonvoluntary_ctxt_switches: 1
root@menk:~# ps -A|grep sleep
12218 pts/3 00:00:00 sleep
root@menk:~# kill 12218
root@menk:~# cat /proc/12218/status
Name: sleep
State: S (sleeping)
Tgid: 12218
Pid: 12218
PPid: 12202
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 9436 kB
VmSize: 9436 kB
VmLck: 0 kB
VmHWM: 804 kB
VmRSS: 804 kB
VmData: 192 kB
VmStk: 84 kB
VmExe: 32 kB
VmLib: 1676 kB
VmPTE: 40 kB
Threads: 1
SigQ: 11/16382
SigPnd: 0000000000000000
ShdPnd: 0000000000004000
SigBlk: fffffffe7ffadeff
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 3
Cpus_allowed_list: 0-1
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 1
nonvoluntary_ctxt_switches: 1
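
The before/after comparison above can also be scripted. The following is only a sketch, assuming a Linux system with /proc mounted and a "sleep" binary on the PATH; the helper names (shared_pending, term_stays_pending) are made up for illustration and are not part of any tool mentioned in this report. It starts a sleep with SIGTERM blocked, sends it SIGTERM, checks the ShdPnd field, and then cleans up with SIGKILL, which cannot be blocked.

```python
import os
import signal
import subprocess


def shared_pending(pid):
    """Return the ShdPnd mask of a process as an integer (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("ShdPnd:"):
                return int(line.split()[1], 16)
    raise ValueError("no ShdPnd field found")


def block_sigterm():
    # Runs in the child between fork and exec, so "sleep" starts with
    # SIGTERM already blocked, similar to the sessions in this report.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def term_stays_pending():
    """Send SIGTERM to a sleep that blocks it; report (pending, alive, exit status)."""
    child = subprocess.Popen(["sleep", "100"], preexec_fn=block_sigterm)
    try:
        os.kill(child.pid, signal.SIGTERM)
        # In the /proc masks, bit (n - 1) stands for signal number n.
        term_bit = 1 << (signal.SIGTERM - 1)
        pending = bool(shared_pending(child.pid) & term_bit)
        alive = child.poll() is None
    finally:
        child.kill()  # SIGKILL cannot be blocked, so this always works
        child.wait()
    return pending, alive, child.returncode


if __name__ == "__main__":
    print(term_stays_pending())
```

Under those assumptions the function should report that SIGTERM is pending, that the process is still alive, and that it only exits once SIGKILL is sent (exit status -9).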

Scott James Remnant (Canonical) (canonical-scott) wrote :

Thanks, this confirms what we expected.

The TERM signal is reaching the process; it's just that the process's signal mask is set to block that signal, so it stays pending (ShdPnd changes from 0000000000000000 to 0000000000004000 after the kill).

We're not entirely sure yet what causes this bug; it's certainly not a procps or kernel bug, so I'm reassigning to openssh for the time being since that seems to be the key process.
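
For readers who want to check the masks by hand: in /proc/PID/status on Linux, the SigBlk, SigPnd and ShdPnd values are hexadecimal bitmasks in which bit (n - 1) stands for signal number n. A minimal sketch of a decoder follows; the function names are hypothetical, and the signal names printed depend on the platform's numbering.

```python
import signal


def signals_in_mask(mask_hex):
    """Return the signal numbers whose bits are set in a /proc status mask."""
    mask = int(mask_hex, 16)
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def signal_name(signum):
    """Best-effort name lookup; real-time signals usually have no name."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"signal {signum}"


if __name__ == "__main__":
    # Values copied from the status dumps in this report.
    print("pending:", [signal_name(n) for n in signals_in_mask("0000000000004000")])
    blocked = signals_in_mask("fffffffe7ffadeff")
    print("SIGTERM blocked:", signal.SIGTERM in blocked)
    print("SIGKILL blocked:", signal.SIGKILL in blocked)
```

Applied to the dumps above, the pending mask 0000000000004000 decodes to signal 15 alone (SIGTERM on Linux), and the blocked mask fffffffe7ffadeff includes signal 15 but not signal 9, which matches the observation that only kill -9 works.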

affects: procps (Ubuntu) → openssh (Ubuntu)
Changed in openssh (Ubuntu):
status: Incomplete → Confirmed
Ken Bloom (kbloom) wrote :

This looks related to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549376, as I (the original reporter of that bug) am seeing the same SigBlk mask on my root sshd process. It may also be related to https://bugzilla.mindrot.org/show_bug.cgi?id=271 where sshd was inheriting a bad signal mask from somewhere else.
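
The reason an inherited mask matters is that the set of blocked signals is preserved across fork and exec, so whatever mask a daemon starts with is passed on to everything it launches. A small sketch of that behaviour, assuming a Unix-like system with Python 3; the function name is hypothetical and this only illustrates the mechanism, not what sshd or udev actually do:

```python
import signal
import subprocess
import sys

# Child program: report whether SIGTERM is in the mask it started with.
# pthread_sigmask(SIG_BLOCK, []) changes nothing and returns the current mask.
CHILD_CODE = (
    "import signal;"
    "print(signal.SIGTERM in signal.pthread_sigmask(signal.SIG_BLOCK, []))"
)


def child_inherits_blocked_sigterm():
    """Block SIGTERM here, start a child process, and ask the child about its mask."""
    previous = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True, text=True, check=True,
        )
    finally:
        # Restore the caller's original mask whatever happens.
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("child started with SIGTERM blocked:", child_inherits_blocked_sigterm())
```

A program that wants to be immune to this has to reset its own mask early on, for example with sigprocmask(SIG_SETMASK, ...) and an empty set in C.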

Ken Bloom (kbloom) wrote :

I'm running Debian Unstable booting with file-rc, and there are several system daemons that have screwy SigBlk masks, of which sshd is one. They are listed in the attached "commands" file.

You can get the data for a similar report on your own machine by running the following commands (as root):

grep SigBlk /proc/*/status | grep -v 0000000000000000 > commands
for x in $(grep SigBlk /proc/*/status | grep -v 0000000000000000 | sed -e 's@/proc/\(.*\)/status.*$@\1@g'); do echo -n "$x: "; tr '\0' ' ' < /proc/$x/cmdline; echo; done >> commands

After running this, I cleaned things up a bit in vim to better organize the report.

Perhaps people experiencing this bug on Ubuntu could run a similar report?
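
For anyone who would rather not post-process the output in an editor, roughly the same report can be produced with a short script. This is a sketch assuming Linux with /proc mounted; the function names are hypothetical, and like the shell commands it should be run as root to see every process.

```python
import glob


def parse_sigblk(status_text):
    """Return the SigBlk mask from the text of a /proc/PID/status file, or None."""
    for line in status_text.splitlines():
        if line.startswith("SigBlk:"):
            return int(line.split()[1], 16)
    return None


def processes_with_blocked_signals():
    """List (pid, mask, command line) for every process with a non-zero SigBlk."""
    report = []
    for path in glob.glob("/proc/[0-9]*/status"):
        pid = path.split("/")[2]
        try:
            with open(path) as status:
                mask = parse_sigblk(status.read())
            with open(f"/proc/{pid}/cmdline", "rb") as cmdline_file:
                raw = cmdline_file.read()
        except OSError:
            continue  # the process exited or is not readable
        if mask:
            # Arguments in cmdline are separated by NUL bytes.
            cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
            report.append((int(pid), f"{mask:016x}", cmdline))
    return sorted(report)


if __name__ == "__main__":
    for pid, mask, cmdline in processes_with_blocked_signals():
        print(f"{pid}: SigBlk {mask}  {cmdline}")
```

Note that some programs block a few signals on purpose, so a non-zero mask is not a problem in itself; the suspicious entries are those sharing the same near-full mask, such as fffffffe7ffadeff.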

Ken Bloom (kbloom) wrote :

Bingo. The bug is at https://bugs.launchpad.net/ubuntu/+source/udev/+bug/407428 and it's already been fixed in udev.
