Comment 0 for bug 245458

Revision history for this message
T-Bone (varenet) wrote : load increases abnormally when running boinc client

Description: Ubuntu 8.04.1
Release: 8.04

Linux dogma 2.6.24-19-server #1 SMP Wed Jun 18 14:44:47 UTC 2008 x86_64 GNU/Linux

boinc-client:
  Installed: 5.10.45-1ubuntu1
  Candidate: 5.10.45-1ubuntu1
  Version table:
 *** 5.10.45-1ubuntu1 0
        500 http://se.archive.ubuntu.com hardy/universe Packages
        100 /var/lib/dpkg/status

Not sure which package is at fault (boinc or maybe the kernel?). Here are the symptoms:

I upgraded from dapper to hardy. Under dapper, boinc had been running Rosetta and WCG just fine for years. After the upgrade, I noticed the load was climbing abnormally high (I stopped watching at 5+; this is a 2-way machine) and the machine became unresponsive while boinc was running. I reset all projects and started again with just WCG. At first things went OK, but after a little while (~5-10 minutes) the load average would suddenly go nuts again (i.e. well above the expected 2.00). Here's a capture of top at that time:

top - 11:23:43 up 3 days, 17:42, 2 users, load average: 2.95, 2.40, 1.30
Tasks: 105 total, 4 running, 101 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 12.0%sy, 63.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1028352k total, 1008036k used, 20316k free, 63276k buffers
Swap: 779144k total, 128k used, 779016k free, 552288k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21262 boinc 39 19 277m 86m 1076 R 100 8.6 4:06.32 wcg_dddt_autodo
21265 boinc 39 19 34316 29m 1736 R 100 2.9 4:06.72 wcg_hcc1_img_6.
21502 root 20 0 10652 1100 824 R 0 0.1 0:00.05 top
    1 root 20 0 4000 928 660 S 0 0.1 0:01.02 init
    2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.27 migration/0
    4 root 15 -5 0 0 0 S 0 0.0 0:00.27 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/1
    7 root 15 -5 0 0 0 S 0 0.0 0:00.20 ksoftirqd/1

Note, by the way, the incoherent "25% idle" on both CPUs: both worker processes show 100% CPU, yet top still reports each core as 25% idle.
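As a sanity check on those idle figures, the raw counters in /proc/stat can be sampled directly, bypassing top's arithmetic. This is just a sketch; it assumes a Linux /proc and the usual USER_HZ of 100 (so a fully busy CPU should accumulate ~0 idle ticks per second, while a genuinely 25%-idle one would accumulate ~25):

```shell
# Sample cpu0's idle tick counter twice, one second apart.
# /proc/stat line format: cpuN user nice system idle iowait irq softirq ...
# so the idle counter is field 5.
s1=$(grep '^cpu0 ' /proc/stat)
sleep 1
s2=$(grep '^cpu0 ' /proc/stat)
idle1=$(echo "$s1" | awk '{print $5}')
idle2=$(echo "$s2" | awk '{print $5}')
echo "idle ticks on cpu0 over 1s: $((idle2 - idle1))"
```

If this also reports ~25 idle ticks per second while the science apps claim 100% CPU, the oddity is in the kernel's accounting rather than in top itself.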

Then I stopped and tried Rosetta instead, running only one thread. Again, everything went fine for a little while, until the load exploded (well above the expected 1.00):

top - 12:02:19 up 3 days, 18:20, 2 users, load average: 2.63, 1.56, 0.90
Tasks: 107 total, 5 running, 102 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1028352k total, 1013568k used, 14784k free, 62624k buffers
Swap: 779144k total, 136k used, 779008k free, 326464k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22553 boinc 39 19 312m 209m 12 R 100 20.9 8:08.59 rosetta_beta_5.
    1 root 20 0 4000 928 660 S 0 0.1 0:01.02 init
    2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.27 migration/0
    4 root 15 -5 0 0 0 S 0 0.0 0:00.27 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/1
    7 root 15 -5 0 0 0 S 0 0.0 0:00.21 ksoftirqd/1
    8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
    9 root 15 -5 0 0 0 R 0 0.0 0:00.98 events/0
   10 root 15 -5 0 0 0 S 0 0.0 0:00.61 events/1
   11 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper
   44 root 15 -5 0 0 0 S 0 0.0 0:00.27 kblockd/0

Note again the weird split in the Cpu0 line.

ia32-libs can probably be ruled out as a cause of the problem: rosetta is native 64-bit (WCG is not, afaicr).
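To confirm which science apps are native 64-bit and which go through ia32-libs, the ELF class byte of each binary can be checked directly (byte 5 of an ELF header is 1 for 32-bit, 2 for 64-bit). A sketch, assuming the Debian default project directory for boinc-client; adjust the path if your setup differs:

```shell
# Report whether each BOINC project executable is a 32-bit or 64-bit ELF.
elf_class() {
    # od prints the unsigned value of the single byte at offset 4.
    c=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' ')
    case "$c" in
        1) echo "32-bit" ;;   # would need ia32-libs on x86_64
        2) echo "64-bit" ;;   # native
        *) echo "not ELF?" ;;
    esac
}

for f in /var/lib/boinc-client/projects/*/*; do
    [ -f "$f" ] && [ -x "$f" ] && echo "$f: $(elf_class "$f")"
done
```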

I also tried running the boinc client with "SCHEDULE=0", in case this was a bug in the IDLEPRIO scheduling policy. Same symptoms.

HTH