open()ing files takes a long time with low throughput

Bug #492841 reported by Tormod Volden
This bug affects 4 people
Affects              Status   Importance  Assigned to  Milestone
linux (Ubuntu)       Invalid  Medium      Unassigned
ureadahead (Ubuntu)  Triaged  Medium      Unassigned

Bug Description

Binary package hint: ureadahead

As can be seen in the bootchart, throughput is excellent for the first couple of seconds ureadahead is running, then it drops to the floor, and for 5 seconds there is close to nothing. Then it goes back to a medium throughput with only a few good peaks.

I can attach a ureadahead dump if needed, but I would need to filter out work-related filenames that might not be public. Is there any option other than sed'ing out $HOME paths?

ProblemType: Bug
Architecture: i386
Date: Sat Dec 5 14:00:28 2009
DistroRelease: Ubuntu 10.04
PackDump: Error: command ['ureadahead', '--dump'] failed with exit code 4: ureadahead:/var/lib/ureadahead/pack: Permission denied
Package: ureadahead 0.100.0-2
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-6.8-generic
SourcePackage: ureadahead
Tags: lucid
Uname: Linux 2.6.32-6-generic i686

Revision history for this message
Tormod Volden (tormodvolden) wrote :

This is an HDD of course: [ 1.410710] ata1.00: ATA-8: SAMSUNG HM160HC, LQ100-10, max UDMA/100

Revision history for this message
Tormod Volden (tormodvolden) wrote :

I just created a test user and reprofiled with that one.

Revision history for this message
Tormod Volden (tormodvolden) wrote :

So I guess inode preloading is the high-throughput phase, "open files" is the ~zero-throughput phase, and "readahead" is the moderate one.

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ sudo /sbin/ureadahead --debug
/var/lib/ureadahead/pack: created Sun, 06 Dec 2009 16:26:31 +0000 for hdd 8:5
30 inode groups, 2161 files, 4900 blocks (130092 kB)
Read pack: 0.106s
Preload ext2fs inodes: 1.499s
Open files: 3.246s
Readahead: 11.102s

Revision history for this message
Tormod Volden (tormodvolden) wrote :

This is a CPU and disk graph taken while I ran the above test (400 pixels wide, 0.1 s per update).

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

You're quite correct, the phase in which the throughput goes very low is open()ing all the files - there's little I've been able to do to speed that up.

The current proposal is to drop the requirement to do that by adding a kernel syscall that lets us populate the page cache by inode number, rather than by file handle.
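
For reference, the per-file work in this phase looks roughly like the sketch below (a minimal C illustration, not ureadahead's actual source; the helper name is made up). The open() itself can block on synchronous reads of directory blocks and the file's inode, which is why this phase shows almost no throughput even though readahead(2) is cheap once the descriptor exists.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of the "open files" + "readahead" phases; preload() is a
 * hypothetical name.  The open() may stall reading directory blocks
 * and the inode off disk, and that is the slow part. */
static void preload (const char *path, off64_t offset, size_t length)
{
    int fd = open (path, O_RDONLY | O_NOATIME);
    if (fd < 0) {
        perror (path);
        return;
    }

    /* readahead(2) queues the range into the page cache without
     * copying anything to userspace. */
    if (readahead (fd, offset, length) < 0)
        perror ("readahead");

    close (fd);
}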

Changed in ureadahead (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
summary: - throughput goes very low after a first peak
+ open()ing files takes a long time with low throughput
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

For the third part, the throughput is about as high as your disk is capable of given the spread of files. The reason it's not maxed out is that we have to seek over bits of the disk we don't want/need in the page cache. We use the I/O elevator as efficiently as we can to merge reads, but at the end of the day, we have to seek.

Even more interesting is that the obvious "dips" correspond to the on-disk positions of your inode tables and suchlike ;) they take quite a jump to get over.
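
As an aside on how the merging works, the idea can be pictured with this minimal C sketch (struct and field names are assumptions for illustration, not ureadahead's source): sort the queued reads by physical block before issuing them, so the elevator can merge neighbours and every remaining gap becomes a single forward seek.

#include <stdlib.h>

/* Illustrative only; names are assumptions. */
struct pending_read {
    unsigned long long block;   /* physical block on disk */
    size_t             length;
};

static int by_block (const void *a, const void *b)
{
    const struct pending_read *ra = a, *rb = b;

    if (ra->block < rb->block)
        return -1;
    if (ra->block > rb->block)
        return 1;
    return 0;
}

/* Usage: qsort (reads, nr, sizeof *reads, by_block); then issue each
 * readahead() in sorted order. */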

Revision history for this message
Tormod Volden (tormodvolden) wrote :

Yes, for instance when resuming from hibernation it slurps in at ~40 MB/s. I guess rearranging things on disk like Mac OS X (and Windows?) does is a bit too complicated.

Why is everything else blocked while ureadahead is reading? I would think some tasks could find their needed pieces in the cache in the meantime and start doing something?

Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

I have the same issue. As you can see in my bootchart, it's exactly the same.

By the way (I don't think it's a bug in ureadahead), may I ask why ureadahead takes only 50% of 'disk utilisation'? Would it be any faster if it used 100%?

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Added a kernel bug task - kernel folks, this is the "bug" for that "load into the page cache without opening" patch we keep talking about.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Tormod: the problem there is that the "needed pieces" are rarely only at the start of the disk - it's probably better to spend the effort on a defragmenter so that they are!

Rafał: ureadahead tries to utilise the disk as much as it can - the fact that it can't reach 100% is because time is lost seeking.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Tormod,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Tormod Volden (tormodvolden) wrote :

Damn robots :)

Changed in linux (Ubuntu):
status: Incomplete → New
tags: removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Phillip Susi (psusi) wrote :

I have been experimenting with patches to correct this problem. I have split the readahead into two passes: the first pass calls readahead() to load directory blocks so that the open() calls do not block waiting for them, then the second pass reads the normal files as it does now. I have also resurrected the old defrag package and used it to pack the relevant files at the start of the disk, with good results.
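
A minimal C sketch of that first pass (an illustration, not the actual patch; where the patch described above uses readahead() on directory blocks, this stand-in walks each directory with readdir(), which equally forces its blocks into the page cache, and warm_directory() is a hypothetical name):

#include <dirent.h>

/* Pass 1 sketch: warm a directory's blocks so that pass 2's open()
 * calls on the files inside it no longer block on disk. */
static void warm_directory (const char *path)
{
    DIR *dir = opendir (path);
    if (!dir)
        return;

    /* Reading every entry forces the directory blocks off disk. */
    while (readdir (dir) != NULL)
        ;

    closedir (dir);
}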

Phillip Susi (psusi)
Changed in linux (Ubuntu):
status: Triaged → Invalid
Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

Any progress on that bug?

Revision history for this message
Phillip Susi (psusi) wrote :

Yes, I have a bzr branch and a PPA build that only a few people tested during Natty, but the feedback was positive. I will try to get it uploaded to Oneiric soon.

Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

Will it work for Natty too? I'd love to get that fixed as soon as possible, but I am not brave enough to use Oneiric just yet :)

Revision history for this message
Phillip Susi (psusi) wrote :

You can install it from my PPA for now.
