Possible local DoS: load increases abnormally when running BOINC client
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Won't Fix | Undecided | Unassigned |
Bug Description
Description: Ubuntu 8.04.1
Release: 8.04
Linux dogma 2.6.24-19-server #1 SMP Wed Jun 18 14:44:47 UTC 2008 x86_64 GNU/Linux
boinc-client:
Installed: 5.10.45-1ubuntu1
Candidate: 5.10.45-1ubuntu1
Version table:
*** 5.10.45-1ubuntu1 0
500 http://
100 /var/lib/
linux-image-
Installed: 2.6.24-19.41
Candidate: 2.6.24-19.41
Version table:
*** 2.6.24-19.41 0
500 http://
100 /var/lib/
2.6.24-19.36 0
500 http://
Not sure which package is at fault (boinc, or maybe the kernel?). Here are the symptoms:
I upgraded from dapper to hardy. In dapper, boinc had been running Rosetta and WCG just fine for years. After the upgrade, I noticed the load going crazily high (I stopped at 5+; it's a 2-way machine) and the machine becoming unresponsive while boinc was running. I reset all projects and started again with just WCG. At first things went OK, but after a little while (~5-10 minutes) the load average would suddenly go nuts again (i.e. way over the expected 2.00). Here's a capture of top at that point:
top - 11:23:43 up 3 days, 17:42, 2 users, load average: 2.95, 2.40, 1.30
Tasks: 105 total, 4 running, 101 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 12.0%sy, 63.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1028352k total, 1008036k used, 20316k free, 63276k buffers
Swap: 779144k total, 128k used, 779016k free, 552288k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21262 boinc 39 19 277m 86m 1076 R 100 8.6 4:06.32 wcg_dddt_autodo
21265 boinc 39 19 34316 29m 1736 R 100 2.9 4:06.72 wcg_hcc1_img_6.
21502 root 20 0 10652 1100 824 R 0 0.1 0:00.05 top
1 root 20 0 4000 928 660 S 0 0.1 0:01.02 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.27 migration/0
4 root 15 -5 0 0 0 S 0 0.0 0:00.27 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
6 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/1
7 root 15 -5 0 0 0 S 0 0.0 0:00.20 ksoftirqd/1
Note the incoherent "25% idle" on both CPUs by the way.
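One way to sanity-check that idle figure independently of top is to compute an overall idle percentage from two /proc/stat samples. This is a rough sketch (the function name is mine, and it deliberately ignores the iowait/irq/softirq/steal fields of /proc/stat for simplicity):

```shell
# cpu_idle_pct SECONDS: overall idle % across all CPUs, computed from
# two samples of the aggregate "cpu" line in /proc/stat.
# Simplification: only the user/nice/system/idle fields are summed.
cpu_idle_pct() {
    interval=${1:-2}
    read -r _ u1 n1 s1 i1 _ < /proc/stat
    sleep "$interval"
    read -r _ u2 n2 s2 i2 _ < /proc/stat
    total=$(( (u2 + n2 + s2 + i2) - (u1 + n1 + s1 + i1) ))
    idle=$(( i2 - i1 ))
    echo $(( 100 * idle / total ))
}

cpu_idle_pct 2
```

Since top derives its figures from the same /proc/stat counters, a mismatch here would point at top itself, while agreement with the bogus numbers would point at the kernel's accounting.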
Then I stopped, and tried with Rosetta, running only one thread. There again, everything went fine for a little while, until the load exploded again (above the expected 1.00):
top - 12:02:19 up 3 days, 18:20, 2 users, load average: 2.63, 1.56, 0.90
Tasks: 107 total, 5 running, 102 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 75.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1028352k total, 1013568k used, 14784k free, 62624k buffers
Swap: 779144k total, 136k used, 779008k free, 326464k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22553 boinc 39 19 312m 209m 12 R 100 20.9 8:08.59 rosetta_beta_5.
1 root 20 0 4000 928 660 S 0 0.1 0:01.02 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.27 migration/0
4 root 15 -5 0 0 0 S 0 0.0 0:00.27 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
6 root RT -5 0 0 0 S 0 0.0 0:00.28 migration/1
7 root 15 -5 0 0 0 S 0 0.0 0:00.21 ksoftirqd/1
8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
9 root 15 -5 0 0 0 R 0 0.0 0:00.98 events/0
10 root 15 -5 0 0 0 S 0 0.0 0:00.61 events/1
11 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper
44 root 15 -5 0 0 0 S 0 0.0 0:00.27 kblockd/0
Note again the weird split in the Cpu0 line.
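Since the spike only shows up after 5-25 minutes, something like the following could log the 1-minute load average over time so the jump can be matched against when the tasks started (sample_load is just an illustrative name, not from the original report):

```shell
# sample_load COUNT INTERVAL: print COUNT timestamped samples of the
# 1-minute load average, INTERVAL seconds apart.
sample_load() {
    n=$1
    interval=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s load: %s\n' "$(date +%T)" "$(cut -d' ' -f1 /proc/loadavg)"
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then sleep "$interval"; fi
    done
}

# e.g. three samples 5 s apart; in practice something like
# "sample_load 30 60" would cover the window where the spike appears.
sample_load 3 5
```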
ia32-libs can probably be ruled out as the cause of the problem: the Rosetta binary is native 64-bit (the WCG one is not, as far as I can recall).
I also tried running the boinc client with "SCHEDULE=0", in case this was a bug in the IDLEPRIO scheduling policy. Same symptoms.
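For what it's worth, whether SCHEDULE=0 actually took effect can be verified from outside: ps can report the nice level and scheduling class of the running science applications (check_sched is an illustrative name; the cls column prints TS for SCHED_OTHER, B for SCHED_BATCH, IDL for SCHED_IDLE):

```shell
# check_sched PID: show pid, nice value, scheduling class and command
# for one process. cls: TS = SCHED_OTHER, B = SCHED_BATCH,
# IDL = SCHED_IDLE (see ps(1), STANDARD FORMAT SPECIFIERS).
check_sched() {
    ps -o pid=,ni=,cls=,comm= -p "$1"
}

# Inspect every process owned by the boinc user (if any):
for pid in $(pgrep -u boinc 2>/dev/null); do
    check_sched "$pid"
done
```

With SCHEDULE=0 the science applications should show up as plain TS at nice 19 rather than IDL.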
HTH
I could reproduce this with a vanilla 2.6.24 kernel on x86_64. It arguably took a little longer (about 25 minutes) for the issue to show up, but it eventually did, with the same symptoms (including the bogus 25% idle report in top for the working CPU(s)).
I would thus suspect a kernel bug: I'm not aware of any such severe breakage in the vanilla boinc client, which I'm reliably using elsewhere, and the bogus report from "top" also points toward something broken on the kernel side.
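One more data point that could help pin this down: on Linux, the load average counts uninterruptible-sleep (D state) tasks as well as runnable ones, so load can climb even while the CPUs report idle time. Listing those tasks during a spike (illustrative snippet, not from the report) would show what is actually inflating the number:

```shell
# busy_tasks: list tasks currently in state R (runnable) or D
# (uninterruptible sleep); both states contribute to the Linux
# load average.
busy_tasks() {
    ps -eo state=,pid=,comm= | awk '$1 == "D" || $1 == "R"'
}

busy_tasks
```

A spike dominated by D-state tasks would suggest something stuck in the kernel (e.g. I/O or a lock) rather than real CPU contention from the BOINC workers.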
HTH