Comment 0 for bug 245458

Revision history for this message
T-Bone (varenet) wrote : load increases abnormally when running boinc client

Description: Ubuntu 8.04.1
Release: 8.04

Linux dogma 2.6.24-19-server #1 SMP Wed Jun 18 14:44:47 UTC 2008 x86_64 GNU/Linux

boinc-client:
  Installed: 5.10.45-1ubuntu1
  Candidate: 5.10.45-1ubuntu1
  Version table:
 *** 5.10.45-1ubuntu1 0
        500 http://se.archive.ubuntu.com hardy/universe Packages
        100 /var/lib/dpkg/status

Not sure which package is at fault (boinc or maybe the kernel?). Here are the symptoms:

I upgraded from dapper to hardy. Under dapper, boinc had been running Rosetta and WCG just fine for years. After the upgrade, I noticed the load was climbing abnormally high (I stopped watching at 5+; this is a 2-way machine) and the machine became unresponsive while boinc was running. I reset all projects and started again with just WCG. At first things went OK, but after a little while (~5-10 minutes) the load average would suddenly go nuts again (i.e. well above the expected 2.00). Here's a capture of top at that time:

top - 11:23:43 up 3 days, 17:42, 2 users, load average: 2.95, 2.40, 1.30
Tasks: 105 total, 4 running, 101 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 12.0%sy, 63.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1028352k total, 1008036k used, 20316k free, 63276k buffers
Swap: 779144k total, 128k used, 779016k free, 552288k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21262 boinc 39 19 277m 86m 1076 R 100 8.6 4:06.32 wcg_dddt_autodo
21265 boinc 39 19 34316 29m 1736 R 100 2.9 4:06.72 wcg_hcc1_img_6.
21502 root 20 0 10652 1100 824 R 0 0.1 0:00.05 top
    1 root 20 0 4000 928 660 S 0 0.1 0:01.02 init
    2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.27 migration/0
    4 root 15 -5 0 0 0 S 0 0.0 0:00.27 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/1
    7 root 15 -5 0 0 0 S 0 0.0 0:00.20 ksoftirqd/1

Note, by the way, the incoherent "25% idle" on both CPUs: both worker processes show 100% CPU, yet top still reports each core as 25% idle.
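As a sanity check on those idle figures, the raw counters in /proc/stat can be sampled directly, bypassing top's arithmetic. This is just a sketch; it assumes a Linux /proc and the usual USER_HZ of 100 (so a fully busy CPU should accumulate ~0 idle ticks per second, while a genuinely 25%-idle one would accumulate ~25):

```shell
# Sample cpu0's idle tick counter twice, one second apart.
# /proc/stat line format: cpuN user nice system idle iowait irq softirq ...
# so the idle counter is field 5.
s1=$(grep '^cpu0 ' /proc/stat)
sleep 1
s2=$(grep '^cpu0 ' /proc/stat)
idle1=$(echo "$s1" | awk '{print $5}')
idle2=$(echo "$s2" | awk '{print $5}')
echo "idle ticks on cpu0 over 1s: $((idle2 - idle1))"
```

If this also reports ~25 idle ticks per second while the science apps claim 100% CPU, the oddity is in the kernel's accounting rather than in top itself.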

Then I stopped and tried Rosetta instead, running only one thread. Again, everything went fine for a little while, until the load exploded (well above the expected 1.00):

top - 12:02:19 up 3 days, 18:20, 2 users, load average: 2.63, 1.56, 0.90
Tasks: 107 total, 5 running, 102 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1028352k total, 1013568k used, 14784k free, 62624k buffers
Swap: 779144k total, 136k used, 779008k free, 326464k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22553 boinc 39 19 312m 209m 12 R 100 20.9 8:08.59 rosetta_beta_5.
    1 root 20 0 4000 928 660 S 0 0.1 0:01.02 init
    2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.27 migration/0
    4 root 15 -5 0 0 0 S 0 0.0 0:00.27 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/1
    7 root 15 -5 0 0 0 S 0 0.0 0:00.21 ksoftirqd/1
    8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
    9 root 15 -5 0 0 0 R 0 0.0 0:00.98 events/0
   10 root 15 -5 0 0 0 S 0 0.0 0:00.61 events/1
   11 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper
   44 root 15 -5 0 0 0 S 0 0.0 0:00.27 kblockd/0

Note again the weird split in the Cpu0 line.

ia32-libs can probably be ruled out as a cause of the problem: rosetta is native 64-bit (WCG is not, afaicr).
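To confirm which science apps are native 64-bit and which go through ia32-libs, the ELF class byte of each binary can be checked directly (byte 5 of an ELF header is 1 for 32-bit, 2 for 64-bit). A sketch, assuming the Debian default project directory for boinc-client; adjust the path if your setup differs:

```shell
# Report whether each BOINC project executable is a 32-bit or 64-bit ELF.
elf_class() {
    # od prints the unsigned value of the single byte at offset 4.
    c=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' ')
    case "$c" in
        1) echo "32-bit" ;;   # would need ia32-libs on x86_64
        2) echo "64-bit" ;;   # native
        *) echo "not ELF?" ;;
    esac
}

for f in /var/lib/boinc-client/projects/*/*; do
    [ -f "$f" ] && [ -x "$f" ] && echo "$f: $(elf_class "$f")"
done
```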

I also tried running the boinc client with "SCHEDULE=0", in case this was a bug in the IDLEPRIO scheduling policy. Same symptoms.

HTH