Ubuntu

Wrong cpu-time calculation for long-running multi-threaded processes

Reported by Michael Ott on 2011-09-26
This bug affects 5 people
Affects: procps (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

There seems to be a problem with the calculation of cpu-time for long-running multi-threaded processes. On a 2-way Xeon X5660 system, ps reports a cpu-time of 1184018577 days for a process that has been running for 5 days with 11 threads:

ps -f -C ustacks
UID PID PPID C STIME TTY TIME CMD
cul07b 13246 10866 99 Sep21 pts/0 1184018577-00:27:06 /home/cul07b/stacks/bin/ustacks -t fasta -f BN_pair1_mod.fasta -o ../results -i 2 -d -r -m 5 -M 2 -p 11

The corresponding /proc/13246/stat file looks like this:
13246 (ustacks) R 10866 10866 10861 34816 10866 4202496 2989827 0 0 0 85771819844 4611685996976182811 0 0 20 0 11 0 9806481 12440649728 2894242 18446744073709551615 1 1 0 0 0 0 0 0 0 18446744073709551615 0 0 17 3 0 0 0 0 0
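
To make the bad value easier to spot, here is a minimal Python sketch that pulls utime, stime, and num_threads (fields 14, 15, and 20 as documented in proc(5)) out of the stat line above and compares the total against a rough upper bound of thread count times wall-clock time. USER_HZ = 100 is an assumption (check with `getconf CLK_TCK`), the 5-day runtime is taken from the description, and `parse_stat` and `cpu_time_plausible` are hypothetical helper names. The bound is only approximate, since the thread count may have changed over the life of the process.

```python
# Stat line copied from the report above.
STAT_LINE = (
    "13246 (ustacks) R 10866 10866 10861 34816 10866 4202496 2989827 0 0 0 "
    "85771819844 4611685996976182811 0 0 20 0 11 0 9806481 12440649728 "
    "2894242 18446744073709551615 1 1 0 0 0 0 0 0 0 18446744073709551615 "
    "0 0 17 3 0 0 0 0 0"
)

USER_HZ = 100  # assumption: clock ticks per second on this system


def parse_stat(line):
    """Extract a few fields from a /proc/<pid>/stat line."""
    # comm (field 2) may contain spaces, so split at the last ')'.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()  # fields[0] is field 3 (state)
    return {
        "pid": int(pid_str),
        "comm": comm,
        "utime": int(fields[14 - 3]),        # user-mode ticks
        "stime": int(fields[15 - 3]),        # kernel-mode ticks
        "num_threads": int(fields[20 - 3]),
        "starttime": int(fields[22 - 3]),    # ticks after boot
    }


def cpu_time_plausible(stat, elapsed_seconds, user_hz=USER_HZ):
    """Rough check: CPU time cannot exceed threads * wall-clock time."""
    cpu_seconds = (stat["utime"] + stat["stime"]) / user_hz
    return cpu_seconds <= stat["num_threads"] * elapsed_seconds


stat = parse_stat(STAT_LINE)
print(stat["utime"], stat["stime"], stat["num_threads"])
print(cpu_time_plausible(stat, elapsed_seconds=5 * 86400))  # False
```

With these assumptions both utime and stime fail the check on their own, and stime is larger by many orders of magnitude.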

top reports the same process with 0% cpu load although it is still running full-throttle with 11 threads:
13246 cul07b 20 0 11.6g 11g 1412 R 0 23.4 178668,22 ustacks

I saw similar issues with a process running for a couple of days with 48 threads on a 4-way AMD Opteron 6172 system.

I think the same bug has already been reported on bugs.debian.org: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641905

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: procps 1:3.2.8-1ubuntu4
ProcVersionSignature: Ubuntu 2.6.32-32.62-server 2.6.32.38+drm33.16
Uname: Linux 2.6.32-32-server x86_64
Architecture: amd64
Date: Mon Sep 26 11:32:31 2011
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_AU.UTF-8
SourcePackage: procps

Michael Ott (michaelott) wrote :
description: updated

Just a me too on this one. I ran into this on a 10.04.3 system with some Matlab jobs that have been running for a couple of weeks.

# ps -f -C MATLAB
UID PID PPID C STIME TTY TIME CMD
1234 21267 21256 99 Jan09 pts/3 1184016086-14:43:06 /opt/matlabR2011b/bin
1234 21578 21567 97 Jan09 pts/5 22-23:06:31 /opt/matlabR2011b/bin/glnxa64
1234 21688 21666 96 Jan09 pts/6 22-17:46:41 /opt/matlabR2011b/bin/glnxa64
1234 21786 21775 89 Jan09 pts/7 21-05:12:21 /opt/matlabR2011b/bin/glnxa64
1234 21884 21873 99 Jan09 pts/9 1184016084-11:35:28 /opt/matlabR2011b/bin

# cat /proc/21267/stat
21267 (MATLAB) S 21256 21267 21256 34819 21267 4202496 48728480602 10279 1 0 85619631648 4611685975611267009 16 1 20 0 49 0 176768180 3195682816 527162 18446744073709551615 4194304 4206714 140735315194304 140735315193832 140009299867740 0 134742022 0 151526638 18446744073709551615 0 0 17 15 0 0 0 0 0

# cat /proc/21884/stat
21884 (MATLAB) S 21873 21884 21873 34825 21884 4202496 47007338124 10280 1 0 85595226373 4611685975617266477 6 1 20 0 49 0 176794854 3336249344 558780 18446744073709551615 4194304 4206714 140734232928064 140734232927592 140313580697692 0 134742022 0 151526638 18446744073709551615 0 0 17 15 0 0 0 0 0
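
The TIME values that ps prints can be reproduced from the utime and stime fields (fields 14 and 15) of the two stat lines above. The sketch below assumes USER_HZ = 100 and assumes that ps keeps the day count in a 32-bit unsigned integer, so it wraps modulo 2**32. That wraparound is an inference from the numbers in this report, not something confirmed here against the procps source, and `ps_style_time` is a hypothetical helper name.

```python
USER_HZ = 100  # assumption: clock ticks per second on this system


def ps_style_time(utime, stime, user_hz=USER_HZ, day_bits=32):
    """Format cumulative CPU time as [DD-]HH:MM:SS, like the ps TIME column."""
    total_seconds = (utime + stime) // user_hz
    minutes, ss = divmod(total_seconds, 60)
    hours, mm = divmod(minutes, 60)
    days, hh = divmod(hours, 24)
    days %= 2 ** day_bits  # assumed wraparound of the day counter
    if days:
        return f"{days}-{hh:02d}:{mm:02d}:{ss:02d}"
    return f"{hh:02d}:{mm:02d}:{ss:02d}"


# utime and stime copied from /proc/21267/stat and /proc/21884/stat above.
print(ps_style_time(85619631648, 4611685975611267009))  # 1184016086-14:43:06
print(ps_style_time(85595226373, 4611685975617266477))  # 1184016084-11:35:28
```

Both results match the ps output for PIDs 21267 and 21884, which suggests ps is formatting the values it reads faithfully and the bogus numbers originate in /proc/<pid>/stat itself.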

For what it's worth, htop seems to work properly and show the CPU usage of these processes.

Here's some system info:

# uname -a
Linux myhost 2.6.32-37-generic #81-Ubuntu SMP Fri Dec 2 20:32:42 UTC 2011 x86_64 GNU/Linux

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"

Let me know if any other info would be helpful while these jobs are still running. :-)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in procps (Ubuntu):
status: New → Confirmed

Chris J Arges (arges) wrote :

I believe this is a duplicate of this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1057593

I recommend upgrading to a kernel that includes the fix for that bug.
