cfq triggers smbd timeouts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
I have Samba installed on an Ubuntu 10.04 amd64 system with a single share on an ext4 volume. This share is hit by multiple Windows systems nightly as they store ntbackup dumps to the box. I've been chasing random backup job failures on this machine for a while now, thinking there was a bug in Samba, but I couldn't find an error in the logs and never saw a crash dump. If I run the dumps manually during the day, there is no problem. The failures only occurred at night, at random times. When multiple failures occurred, they all happened at the same time. Windows would only report a write error.
Digging around, I found some discussions online noting that cfq can cause I/O starvation under some workloads, making processes appear to hang for up to and over 2 minutes. My Samba logs show pauses of more than a minute in activity right before Windows logs a failure. Switching the I/O scheduler to noop has so far cleared up the failures.
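For anyone wanting to check the same thing, here is a minimal sketch of how the active scheduler can be read from sysfs. The device name `sda` is only a placeholder assumption, and `active_scheduler` is a helper name made up for this example; on a cciss controller the sysfs entry is typically named with a `!` in place of the `/` (for example `cciss!c0d0`).

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs-style scheduler file.
# The kernel marks the active entry with square brackets,
# e.g. "noop deadline [cfq]" means cfq is in use.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Placeholder device name (an assumption); override with DEV=... as needed.
DEV="${DEV:-sda}"
SCHED_FILE="/sys/block/$DEV/queue/scheduler"

# Only report if the file exists, so the script is harmless elsewhere.
if [ -r "$SCHED_FILE" ]; then
    echo "$DEV: $(active_scheduler "$SCHED_FILE")"
fi

# To switch at runtime (as root; the change is lost on reboot):
#   echo noop > "$SCHED_FILE"
```

The runtime switch is left as a comment on purpose, since it needs root and affects all I/O on that device. On kernels of this era the default can also be set at boot with the `elevator=noop` kernel parameter.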
I'm currently running 2.6.35-999-generic #201008021608 (mainline kernel) due to bug 474089, and have upgraded to the Maverick samba packages and related libs as part of trying to track this issue down. I'm writing to a 2TB SATA drive behind a cciss controller, no RAID. The problem is definitely load-related, so I can only really get one viable test in per night, and for obvious reasons I can't stick with a broken config for too many nights in a row. That said, I'm more than willing to try to gather whatever test data is needed to improve things.
Hi Joshua,
Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.
apport-collect -p linux 627380
Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.
Thanks in advance.
[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]