Wrong cpu-time calculation for long-running multi-threaded processes

Bug #859311 reported by Michael Ott
This bug affects 5 people
Affects          Status     Importance  Assigned to  Milestone
procps (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

There seems to be a problem with the calculation of cpu-time for long-running multi-threaded processes. On a 2-way Xeon X5660 system, ps reports a cpu-time of 1184018577 days for a process that has been running for 5 days with 11 threads:

ps -f -C ustacks
UID    PID   PPID   C STIME TTY                  TIME CMD
cul07b 13246 10866 99 Sep21 pts/0 1184018577-00:27:06 /home/cul07b/stacks/bin/ustacks -t fasta -f BN_pair1_mod.fasta -o ../results -i 2 -d -r -m 5 -M 2 -p 11

The corresponding /proc/13246/stat file looks like this:
13246 (ustacks) R 10866 10866 10861 34816 10866 4202496 2989827 0 0 0 85771819844 4611685996976182811 0 0 20 0 11 0 9806481 12440649728 2894242 18446744073709551615 1 1 0 0 0 0 0 0 0 18446744073709551615 0 0 17 3 0 0 0 0 0
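For reference, the TIME column shown by ps can be reproduced from this stat line. The following is only a rough sketch, not procps code. Fields 14 and 15 of /proc/[pid]/stat are utime and stime in clock ticks, as documented in proc(5). Two things are assumptions: a tick rate of 100 per second, and that ps keeps the day count in a 32-bit unsigned integer. The function names are made up for this illustration.

```python
HZ = 100  # assumed clock ticks per second (USER_HZ); not stated in the report

# First 22 fields of /proc/13246/stat as posted above.
STAT_LINE = (
    "13246 (ustacks) R 10866 10866 10861 34816 10866 4202496 2989827 "
    "0 0 0 85771819844 4611685996976182811 0 0 20 0 11 0 9806481"
)


def cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from a /proc/[pid]/stat line."""
    # The command name may contain spaces, so split after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field 14 is rest[11], field 15 is rest[12].
    return int(rest[11]), int(rest[12])


def format_ps_time(utime, stime, hz=HZ, day_bits=32):
    """Format cumulative CPU time as [DD-]hh:mm:ss.

    Assumption: the day count is truncated to `day_bits` bits, mimicking
    a 32-bit unsigned variable.
    """
    seconds = (utime + stime) // hz
    minutes, ss = divmod(seconds, 60)
    hours, mm = divmod(minutes, 60)
    days, hh = divmod(hours, 24)
    days &= (1 << day_bits) - 1
    return f"{days}-{hh:02d}:{mm:02d}:{ss:02d}"


utime, stime = cpu_ticks(STAT_LINE)
print(format_ps_time(utime, stime))  # 1184018577-00:27:06
print(2**62 - stime)                 # 21451205093
```

Under those assumptions the result matches the ps output exactly. That suggests ps is formatting the counters it reads from /proc faithfully, and that the huge day count is a truncated view of an stime value sitting just below 2^62.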

top reports the same process with 0% cpu load although it is still running full-throttle with 11 threads:
13246 cul07b 20 0 11.6g 11g 1412 R 0 23.4 178668,22 ustacks

I saw similar issues with a process running for a couple of days with 48 threads on a 4-way AMD Opteron 6172 system.

I think the same bug has already been reported on bugs.debian.org: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641905

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: procps 1:3.2.8-1ubuntu4
ProcVersionSignature: Ubuntu 2.6.32-32.62-server 2.6.32.38+drm33.16
Uname: Linux 2.6.32-32-server x86_64
Architecture: amd64
Date: Mon Sep 26 11:32:31 2011
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_AU.UTF-8
SourcePackage: procps

Michael Ott (michaelott) wrote :
description: updated

Matthew L. Dailey (matthew-l-dailey) wrote :

Just a me too on this one. I ran into this on a 10.04.3 system with some Matlab jobs that have been running for a couple of weeks.

# ps -f -C MATLAB
UID  PID   PPID   C STIME TTY                  TIME CMD
1234 21267 21256 99 Jan09 pts/3 1184016086-14:43:06 /opt/matlabR2011b/bin
1234 21578 21567 97 Jan09 pts/5         22-23:06:31 /opt/matlabR2011b/bin/glnxa64
1234 21688 21666 96 Jan09 pts/6         22-17:46:41 /opt/matlabR2011b/bin/glnxa64
1234 21786 21775 89 Jan09 pts/7         21-05:12:21 /opt/matlabR2011b/bin/glnxa64
1234 21884 21873 99 Jan09 pts/9 1184016084-11:35:28 /opt/matlabR2011b/bin

# cat /proc/21267/stat
21267 (MATLAB) S 21256 21267 21256 34819 21267 4202496 48728480602 10279 1 0 85619631648 4611685975611267009 16 1 20 0 49 0 176768180 3195682816 527162 18446744073709551615 4194304 4206714 140735315194304 140735315193832 140009299867740 0 134742022 0 151526638 18446744073709551615 0 0 17 15 0 0 0 0 0

# cat /proc/21884/stat
21884 (MATLAB) S 21873 21884 21873 34825 21884 4202496 47007338124 10280 1 0 85595226373 4611685975617266477 6 1 20 0 49 0 176794854 3336249344 558780 18446744073709551615 4194304 4206714 140734232928064 140734232927592 140313580697692 0 134742022 0 151526638 18446744073709551615 0 0 17 15 0 0 0 0 0
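One way to spot affected processes from their stat lines is a simple upper-bound check: a process cannot have used more CPU time than the wall-clock time since it started multiplied by the number of CPUs. The sketch below is an illustration only. The function name is hypothetical, the tick rate of 100 is assumed, and the uptime and CPU count in the example call are made-up values because the report does not give them.

```python
HZ = 100  # assumed clock ticks per second (USER_HZ); not stated in the report

# First 22 fields of /proc/21267/stat as posted above.
STAT_21267 = (
    "21267 (MATLAB) S 21256 21267 21256 34819 21267 4202496 48728480602 "
    "10279 1 0 85619631648 4611685975611267009 16 1 20 0 49 0 176768180"
)


def cpu_time_is_plausible(stat_line, uptime_seconds, ncpus, hz=HZ):
    """Check utime + stime against elapsed wall-clock ticks times CPU count."""
    # The command name may contain spaces, so split after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # Fields 14, 15 and 22 of proc(5): utime, stime, starttime (all in ticks).
    utime, stime, starttime = int(rest[11]), int(rest[12]), int(rest[19])
    elapsed_ticks = max(int(uptime_seconds * hz) - starttime, 0)
    return utime + stime <= elapsed_ticks * ncpus


# Hypothetical host values: 60 days of uptime and 64 CPUs.
print(cpu_time_is_plausible(STAT_21267, uptime_seconds=60 * 86400, ncpus=64))  # False
```

The bound is generous on purpose, so a process that fails it is reporting counters that cannot be genuine regardless of how busy its threads were.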

For what it's worth, htop seems to work properly and show the CPU usage of these processes.

Here's some system info:

# uname -a
Linux myhost 2.6.32-37-generic #81-Ubuntu SMP Fri Dec 2 20:32:42 UTC 2011 x86_64 GNU/Linux

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"

Let me know if any other info would be helpful while these jobs are still running. :-)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in procps (Ubuntu):
status: New → Confirmed

Chris J Arges (arges) wrote :

I believe this is a duplicate of this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1057593

I would recommend upgrading to a kernel that includes the fix from that bug.
