boinc stops on error after a few days (md5_file: Too many open files) in stderrdae.txt
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
boinc (Ubuntu) | Fix Released | Medium | Unassigned | |
Precise | Fix Released | Medium | Daniel Hahler | |
Bug Description
== SRU Justification ==
Impact: When the bug was opened (Oneiric), it was as described: a long-running boinc system eventually fails to compute any more work units, because every computed work unit leaks one file descriptor in the main boinc daemon (regardless of the projects subscribed to). The faster work units are processed and the lower the system's file descriptor limit, the sooner the bug occurs (usually within one week of uninterrupted uptime, but possibly months on slower systems).
This bug affects all users from Oneiric (boinc 6.12.33+
(well, 7.0.23, 7.0.24 & 7.0.25 have another issue: computation error, no more leak; 7.0.26 not tested)
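As a rough illustration of the relationship described above (one leaked descriptor per work unit, against a fixed descriptor limit), the time to failure can be estimated with simple arithmetic. This is only a sketch: the function name `days_until_exhaustion` and the example numbers (a limit of 1024, 24 descriptors already in use, 150 units per day) are assumptions, not measurements from this report.

```shell
# days_until_exhaustion LIMIT IN_USE UNITS_PER_DAY
# Prints the whole number of days before a process that leaks one descriptor
# per work unit reaches LIMIT. All three inputs are hypothetical examples.
days_until_exhaustion() {
    limit=$1 in_use=$2 units_per_day=$3
    if [ "$units_per_day" -le 0 ]; then
        echo "units per day must be positive" >&2
        return 1
    fi
    echo $(( (limit - in_use) / units_per_day ))
}

# Example: soft limit of 1024, 24 descriptors in use, 150 units per day.
days_until_exhaustion 1024 24 150   # prints 6
```

The soft limit that applies to a shell can be read with `ulimit -n`; the limit of the running daemon may differ.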
Test case: easy, but very slow. Run boinc for at least one complete work unit (depending on the project, a unit can take from 5 minutes to many hours), then run "lsof" on the boinc daemon and check the end of the listing. After more units have been processed, the list reported by "lsof" should not be longer than before. Computation of each unit must succeed.
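The comparison in the test case can be made less manual by counting descriptors instead of reading the listing by eye. The sketch below reads `/proc/PID/fd` rather than calling "lsof", which assumes a Linux system and permission to read that directory (root or the boinc user); the function names `count_fds` and `fd_growth` are made up for this example.

```shell
# count_fds PID: print how many file descriptors the process has open,
# read from /proc/PID/fd (Linux only; needs permission to read that directory).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# fd_growth PID SECONDS: sample the count twice, SECONDS apart,
# and print the difference between the two samples.
fd_growth() {
    before=$(count_fds "$1")
    sleep "$2"
    after=$(count_fds "$1")
    echo $(( after - before ))
}
```

For example, `fd_growth "$(pidof boinc)" 3600` run from a root shell would report how many descriptors were gained over an hour. The interval should span at least one completed work unit; a result that keeps growing across units points to the leak.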
Regression Potential: I do not know the change, so I cannot discuss its impact. But boinc must be able to run unattended for months without such a problem and without a reboot, especially on an LTS.
== Original Description ==
There seems to be a file descriptor leak in the boinc process (client side).
After a few days of loading the system without problems, it suddenly stops working.
Relaunching it usually works (but actively managing a system running boinc is not a decent solution).
The following command gives a clue:
$ sudo lsof -p `pidof boinc`
The number of open file descriptors keeps increasing as boinc tasks are completed. (This is more visible when the projects have tasks that run fast on the hardware, such as sudoku or milkyway/nvidia.)
A lot of entries are like:
boinc 15348 boinc 623r DIR 8,1 4096 29492116 /var/lib/
boinc 15348 boinc 624r DIR 8,1 4096 29492173 /var/lib/
boinc 15348 boinc 625r DIR 8,1 4096 29492116 /var/lib/
boinc 15348 boinc 626r DIR 8,1 4096 29492084 /var/lib/
boinc 15348 boinc 627r DIR 8,1 4096 29492085 /var/lib/
boinc 15348 boinc 628r DIR 8,1 4096 29492116 /var/lib/
boinc 15348 boinc 629r DIR 8,1 4096 29492173 /var/lib/
boinc 15348 boinc 630r DIR 8,1 4096 29492116 /var/lib/
boinc 15348 boinc 632r DIR 8,1 4096 29492018 /var/lib/
boinc 15348 boinc 633r DIR 8,1 4096 29492040 /var/lib/
boinc 15348 boinc 634r DIR 8,1 4096 29492018 /var/lib/
boinc 15348 boinc 635r DIR 8,1 4096 29492062 /var/lib/
boinc 15348 boinc 636r DIR 8,1 4096 29492116 /var/lib/
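In the listing above the same inode numbers (for example 29492116) show up under several descriptor numbers, which suggests the same directories being opened repeatedly without being closed. A small filter can make this visible by counting directory descriptors per inode. This is a sketch that assumes the default "lsof" column layout shown above (TYPE in column 5, NODE in column 8); `summarize_dir_fds` is a name invented for this example.

```shell
# summarize_dir_fds: read `lsof -p PID` output on stdin and print, for each
# directory inode, how many descriptors point at it (highest count first).
# Assumes the default lsof columns: TYPE is field 5, NODE is field 8.
summarize_dir_fds() {
    awk '$5 == "DIR" { count[$8]++ }
         END { for (node in count) print count[node], node }' | sort -rn
}
```

It would be used as `sudo lsof -p "$(pidof boinc)" | summarize_dir_fds`. Inodes with a steadily rising count between runs are the likely leak candidates.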
ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: boinc 6.12.33+
ProcVersionSign
Uname: Linux 3.0.0-17-generic x86_64
NonfreeKernelMo
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
Date: Thu Mar 29 08:59:09 2012
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
PackageArchitec
SourcePackage: boinc
UpgradeStatus: No upgrade log present (probably fresh install)
Changed in boinc (Ubuntu Precise): | |
status: | New → Triaged |
importance: | Undecided → Medium |
description: | updated |
tags: |
added: verification-done removed: verification-needed |
Changed in boinc (Ubuntu): | |
status: | Triaged → Fix Released |
Thank you for this report.
Can you please try this with version 7.0.15 of BOINC, which is available in Ubuntu Precise (development branch) or via the pkg-boinc PPA at https://launchpad.net/~pkg-boinc/+archive/testing ?
I cannot confirm this on a box where the BOINC process has been running since March 15th: there are only MEM regions matching "slot" in the lsof list.
Maybe this is caused by a specific project in your list? I am neither running sudoku nor milkyway on my host(s).