ps segfault when users have large numbers of group memberships (procps 3.2.8)

Bug #1174444 reported by Robert Beaty on 2013-04-29
This bug affects 7 people
Affects: procps (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

[Impact]

 * If a user belongs to a large number of groups, ps segfaults. This can happen when a directory service such as Active Directory, or possibly another such as LDAP, is in use.

 * This upload enlarges the affected buffers to match upstream procps.
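As a rough illustration of why buffer size matters here, the sketch below estimates the length of the Groups: line in /proc/<pid>/status for a given list of GIDs. It assumes the line is "Groups:", a tab, each GID in decimal followed by one space, and a newline; exact separators differ between kernel versions, so the figures are approximate. The function name is hypothetical, and this report does not say exactly which procps buffer overflows.

```python
def groups_line_length(gids):
    """Estimate the byte length of the Groups: line in /proc/<pid>/status.

    Assumption: the line is "Groups:" plus a tab, then each GID in decimal
    followed by one space, then a newline. Real kernels may differ slightly.
    """
    header = len("Groups:\t")
    body = sum(len(str(gid)) + 1 for gid in gids)  # digits plus one separator
    return header + body + 1  # +1 for the trailing newline


if __name__ == "__main__":
    # 934 four-digit GIDs (the GID values themselves are made up)
    print(groups_line_length(range(5000, 5934)))    # 4679, more than 4096
    # 1556 five-digit GIDs (the GID width is an assumption)
    print(groups_line_length(range(10000, 11556)))  # 9345, more than 8192
```

Under these assumptions, roughly 800 four-digit GIDs are enough to fill 4096 bytes on the group line alone, which would be consistent with the later comments where 934 groups needed an 8192-byte buffer.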

[Test Case]

 * Using a directory service, create a user who belongs to a very large number of groups.
 * Run ps. It segfaults.
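For a local reproduction without a directory service, one approach is to put a local test user into many local groups. This sketch only prints /etc/group-style lines (name:password:GID:members); the user name "psbug", the "bigtest" prefix and the starting GID 20000 are hypothetical, so check that the GID range is unused and use a throwaway machine. Editing /etc/group directly bypasses /etc/gshadow, so groupadd plus usermod -aG may be the safer route on a real system.

```python
def make_group_lines(user, count, first_gid=20000, prefix="bigtest"):
    """Return /etc/group-style lines that put `user` into `count` new groups.

    Each line has the form name:password:GID:members, with "x" as the
    password placeholder. Names, prefix and GID range are illustrative only.
    """
    return [f"{prefix}{i}:x:{first_gid + i}:{user}" for i in range(count)]


if __name__ == "__main__":
    # Print 1000 entries for the hypothetical user "psbug"; review them
    # before appending to /etc/group on a disposable test machine.
    for line in make_group_lines("psbug", 1000):
        print(line)
```

After adding the groups, log in again as the test user so the new memberships take effect, then run ps.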


[Regression Potential]

 * Regressions are highly unlikely, as the only change is to buffer sizes, bringing them in line with upstream procps commit 7933435584aa1fd75460f4c7715a3d4855d97c1c.

[Other Info]

 * This fix is not in Quantal or Raring, but should be available in Saucy, assuming the procps version there is 3.3.4 or later.
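To check by version alone whether a package includes the upstream change, its upstream part can be compared against 3.3.4. The sketch below is a simplified parser (it strips the epoch and Debian revision and compares only the leading numeric components), not dpkg's full version-comparison algorithm; for exact Debian ordering, dpkg --compare-versions is the authoritative tool. The function names are hypothetical. A backported fix, such as the one proposed here for 12.04, still reports upstream 3.2.8, so this only tells you about the upstream release.

```python
import re


def upstream_version(debian_version):
    """Extract the upstream version as a tuple of ints.

    Simplified: drops an "epoch:" prefix and a "-revision" suffix, then keeps
    the leading dotted numeric part. Not dpkg's full comparison rules.
    """
    version = debian_version.split(":", 1)[-1]  # drop the epoch, e.g. "1:"
    version = version.rsplit("-", 1)[0]          # drop the Debian revision
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version: {debian_version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_upstream_fix(debian_version, fixed_in=(3, 3, 4)):
    """True if the upstream version is at least `fixed_in` (3.3.4 here)."""
    return upstream_version(debian_version) >= fixed_in


if __name__ == "__main__":
    print(has_upstream_fix("1:3.2.8-11ubuntu6"))  # False (the precise package)
```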

When a user with a large number of group memberships logs in via ssh, running ps causes a segfault (procps version 3.2.8).

Description: Ubuntu 12.04.2 LTS
Release: 12.04

procps:
  Installed: 1:3.2.8-11ubuntu6
  Candidate: 1:3.2.8-11ubuntu6
  Version table:
 *** 1:3.2.8-11ubuntu6 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages
        100 /var/lib/dpkg/status

Expected results: ps completes and returns to the prompt.

  PID TTY TIME CMD
12707 pts/1 00:00:00 sudo
12708 pts/1 00:00:00 bash

Actual results:

  PID TTY TIME CMD
12707 pts/1 00:00:00 sudo
12708 pts/1 00:00:00 bash

Signal 11 (SEGV) caught by ps (procps version 3.2.8).

Here is the end of an strace of ps:

mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4da880e000
mremap(0x7f4da880e000, 135168, 266240, MREMAP_MAYMOVE) = 0x7f4da87cd000
mremap(0x7f4da87cd000, 266240, 528384, MREMAP_MAYMOVE) = 0x7f4da929d000
mremap(0x7f4da929d000, 528384, 1052672, MREMAP_MAYMOVE) = 0x7f4da919c000
mremap(0x7f4da919c000, 1052672, 2101248, MREMAP_MAYMOVE) = 0x7f4da862e000
mremap(0x7f4da862e000, 2101248, 4198400, MREMAP_MAYMOVE) = 0x7f4da822d000
mremap(0x7f4da822d000, 4198400, 8392704, MREMAP_MAYMOVE) = 0x7f4da7a2c000
mremap(0x7f4da7a2c000, 8392704, 16781312, MREMAP_MAYMOVE) = 0x7f4da6a2b000
mremap(0x7f4da6a2b000, 16781312, 33558528, MREMAP_MAYMOVE) = 0x7f4da4a2a000
mremap(0x7f4da4a2a000, 33558528, 67112960, MREMAP_MAYMOVE) = 0x7f4da0a29000
mremap(0x7f4da0a29000, 67112960, 134221824, MREMAP_MAYMOVE) = 0x7f4d98a28000
mremap(0x7f4d98a28000, 134221824, 268439552, MREMAP_MAYMOVE) = 0x7f4d88a27000
mremap(0x7f4d88a27000, 268439552, 536875008, MREMAP_MAYMOVE) = 0x7f4d68a26000
mremap(0x7f4d68a26000, 536875008, 1073745920, MREMAP_MAYMOVE) = 0x7f4d28a25000
mremap(0x7f4d28a25000, 1073745920, 2147487744, MREMAP_MAYMOVE) = 0x7f4ca8a24000
mremap(0x7f4ca8a24000, 2147487744, 4096, MREMAP_MAYMOVE) = 0x7f4ca8a24000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
write(2, "\n\nSignal 11 (SEGV) caught by ps "..., 132

Signal 11 (SEGV) caught by ps (procps version 3.2.8).
Please send bug reports to <email address hidden> or <email address hidden>
) = 132
exit_group(139)
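The mapping sizes in this trace follow a simple pattern: each is a power of two, from 128 KiB up to 2 GiB, plus one 4096-byte page, and the final mremap shrinks the mapping to a single page just before the SIGSEGV. The check below only verifies that arithmetic. Reading it as a buffer being doubled until it reached 2^31 bytes (one past the largest signed 32-bit integer) is an inference from the numbers, not a confirmed diagnosis, and the one-page overhead is an assumption about the allocator.

```python
PAGE = 4096

# New sizes passed to mmap/mremap in the strace above, in order.
SIZES = [
    135168, 266240, 528384, 1052672, 2101248, 4198400, 8392704,
    16781312, 33558528, 67112960, 134221824, 268439552, 536875008,
    1073745920, 2147487744,
]


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def payload_sizes(sizes, overhead=PAGE):
    """Subtract one page of assumed allocator overhead from each size."""
    return [size - overhead for size in sizes]


if __name__ == "__main__":
    payloads = payload_sizes(SIZES)
    print(all(is_power_of_two(p) for p in payloads))                  # True
    print(payloads[0] == 2**17, payloads[-1] == 2**31)                # True True
    print(all(b == 2 * a for a, b in zip(payloads, payloads[1:])))    # True
```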

Here is the Debian bug report for it:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702965

It looks like the 12.10 repos have newer versions of the procps and libprocps0 packages, which address the problem.

My question is: can these newer versions be made available for 12.04, or am I stuck manually installing a newer version from the 12.10 repos (or something along those lines) to fix this issue?

Bryan Quigley (bryanquigley) wrote :

I'm having trouble reproducing this locally. Can you provide step-by-step instructions for reproducing this issue on a new install? (Please try to reproduce it with local users and groups only, with no network dependencies.)

If you cannot, could you provide the output of these commands to help me reproduce it myself:
getent group > groupoutput
getent passwd > passwdoutput
ps-ng aux > psoutput

If you need to replace some of the names for security reasons, that's OK. Thanks!
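Where the group database cannot be published, a summary that keeps only counts and lengths may still help with reproduction. A minimal sketch, assuming getent group output in the usual name:password:GID:members form saved to a file such as the groupoutput file above; the script name is hypothetical. It counts only memberships listed in the group file, not a user's primary group from passwd.

```python
import sys
from collections import Counter


def summarize_group_db(text, long_line=1000):
    """Summarize 'getent group' output without revealing any names.

    Each line is expected as name:password:GID:member1,member2,...
    Returns (number of entries, lengths of lines longer than `long_line`
    in descending order, most groups any single user is listed in).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    memberships = Counter()
    for line in lines:
        fields = line.split(":")
        if len(fields) >= 4:
            # Count each member once per group, ignoring empty names.
            memberships.update({m for m in fields[3].split(",") if m})
    long_lengths = sorted(
        (len(line) for line in lines if len(line) > long_line), reverse=True
    )
    most_groups = max(memberships.values(), default=0)
    return len(lines), long_lengths, most_groups


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            entries, long_lengths, most = summarize_group_db(f.read())
        print(f"{entries} entries, {len(long_lengths)} longer than 1000 characters")
        print(f"most groups listed for one user: {most}")
    else:
        print("usage: summarize_groups.py GETENT_GROUP_FILE")
```

For example: python3 summarize_groups.py groupoutput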

Changed in procps (Ubuntu):
status: New → Incomplete

Hi Bryan,

I hope this helps a little. I am also affected by this bug.

Our company is currently evaluating Ubuntu as a way to give our employees the chance to work on a Linux OS.

The problem started after I joined our company's domain and thus acquired all my AD groups. I cannot publish the output of getent group or getent passwd, but to give you an idea of the scale:

groupoutput contains 343 entries. Of those, around 20 are ~2000 characters long (each listing a large number of usernames).
passwdoutput contains 35 entries, nothing unusual.

It would be very helpful if a solution could be found for 12.04, as management would like to stay on the LTS release.

Thanks for all the good work.

Bryan Quigley (bryanquigley) wrote :

If you could confirm that this is a fix for you, that would also be helpful:
http://people.canonical.com/~chiluk/lp1176215/

Thanks for the quick response. Yes, this is a fix. ps now working as expected.

Robert Beaty (beatyrm) wrote :

Sorry for not being able to reply before now. I can confirm that the fix also works for me.

Dave Chiluk (chiluk) wrote :

This patch increases the buffer size to match upstream commit 7933435584aa1fd75460f4c7715a3d4855d97c1c, released in 3.3.4. This means the fix is not currently in any other series.

Dave Chiluk (chiluk) on 2013-05-09
description: updated
Dave Chiluk (chiluk) on 2013-05-14
Changed in procps (Ubuntu):
status: Incomplete → In Progress

I've been using ISPConfig, very good software for running a small ISP, for quite a while. We inherited a couple of thousand users when another partner closed. Since they had marginal traffic but a lot of legacy sites, we put them on a dedicated web server running an Ubuntu 12.04 LTS LAMP stack with the standard ISPConfig setup.
ISPConfig creates a user for every website and a group for every client, and the Apache user belongs to every "client" group. I think this is the real-life situation you were looking for to reproduce the bug.
ISPConfig has a PHP cron script that handles internal tasks between the main server and its slaves. Among other things, the script checks that no other instance of itself is running: the PHP code invokes "ps ax" and searches the output for itself.
It took us a while to realize what was happening. In the end we disabled the cron job, which kept the system stable but no longer in sync with users' requests. After searching for several days, starting on April 28th, I found this Launchpad bug report.
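The same "is another copy already running?" check can be done without calling ps at all, by reading /proc/<pid>/cmdline directly, so it would not depend on ps surviving large group lists. This is a Python sketch assuming a Linux /proc, not ISPConfig's actual PHP code, and the function names are hypothetical. A lock file held with flock is a common alternative for the same purpose.

```python
import os
import sys


def cmdline_mentions(cmdline, script_name):
    """True if any NUL-separated argument in a raw /proc/<pid>/cmdline value
    has `script_name` as its final path component."""
    for arg in cmdline.split(b"\0"):
        if arg and os.path.basename(arg.decode(errors="replace")) == script_name:
            return True
    return False


def other_instances(script_name, proc_root="/proc"):
    """PIDs, other than our own, whose command line runs `script_name`.

    Reads /proc/<pid>/cmdline directly instead of parsing 'ps ax' output.
    Processes that exit or deny access during the scan are skipped.
    """
    me = os.getpid()
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue
        if cmdline_mentions(raw, script_name):
            found.append(int(entry))
    return found


if __name__ == "__main__":
    # Usage: other_instances.py SCRIPT_NAME
    if len(sys.argv) == 2 and os.path.isdir("/proc"):
        print(other_instances(sys.argv[1]))
```

Like a grep over ps output, this also matches any other process that merely has a file of that name as an argument, such as an editor.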

As you can imagine, the fix does not work for us (otherwise I wouldn't be bothering you). We can agree that one user belongs to a lot of groups (exactly 1556), that our setup is a bit unusual, that there must be system limits somewhere, and so on. But in any case, I think ps should be protected against segfaulting. We will be reviewing our configuration, of course, but wouldn't it be right to avoid the SEGV?

If required, we can provide strace output, example /etc/group and /proc/apache/stat files and, of course, testing as needed.

Thanks in advance

Alex Handle (alexhandle) wrote :

I have the exact same problem as Javier.
I also use ISPConfig and our www-data user is in about 934 groups.

This fix http://people.canonical.com/~chiluk/lp1176215/ did not help in my case.

Jeroen Baten (jbaten) wrote :

I have the same problem. If needed, I can submit strace output and the /proc/<pid>/status file that contains the large list of groups.

I've seen that this is closely related to bug #1150413, where it appears to be solved, but I can't apply the patch attached to that bug.
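To see how large the status file of an affected process actually is and how many groups it lists, something like the following sketch can be run against a PID (Linux only; the script and function names are hypothetical). It reads the whole file without a fixed-size buffer and parses the Groups: line.

```python
import sys


def parse_status_groups(status_text):
    """Return the supplementary GIDs listed on the Groups: line."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(tok) for tok in line[len("Groups:"):].split()]
    return []


def read_status(pid):
    """Read /proc/<pid>/status in full as bytes (no fixed-size buffer)."""
    with open(f"/proc/{pid}/status", "rb") as f:
        return f.read()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        raw = read_status(sys.argv[1])
        gids = parse_status_groups(raw.decode(errors="replace"))
        print(f"status file: {len(raw)} bytes, {len(gids)} supplementary groups")
    else:
        print("usage: status_groups.py PID   (or 'self')")
```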

Dave Chiluk (chiluk) wrote :

Bug #1150413 does contain a very similar patch.

Additionally, 4096 was chosen for the buffer size because that is what upstream uses. Alex Handle, if that is not enough for you, you may have to compile your own version of procps with a larger value, or convince upstream procps to use a larger one.

Alex Handle (alexhandle) wrote :

Thank you Dave for your help!
I compiled the package with your patch with a buffer size of 8192 and now it works.
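If a local build needs a different value, one way to choose it is to measure the largest /proc/<pid>/status file currently on the system and round up to a power of two, leaving a byte for a terminating NUL. This sketch assumes Linux and that the buffer in question has to hold a whole status file, which this report does not confirm; it also only reflects the processes running when it is executed. The function names are hypothetical.

```python
import os


def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def largest_status_file(proc_root="/proc"):
    """Size in bytes of the largest readable <proc_root>/<pid>/status file."""
    largest = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status"), "rb") as f:
                largest = max(largest, len(f.read()))
        except OSError:
            continue  # the process exited or its status is not readable
    return largest


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        size = largest_status_file()
        # +1 leaves room for a terminating NUL byte in a C buffer.
        print(f"largest status file: {size} bytes; "
              f"power-of-two buffer: {next_power_of_two(size + 1)}")
```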

I am new to this. Is it possible to provide detailed instructions for compiling procps with a specific buffer size? Thanks very much!

Dave Chiluk (chiluk) wrote :

cisconetgineer, please see bug #1150413.

A test PPA with a proposed fix is available. Please test it and report back in bug #1150413.
