Comment 579 for bug 620074

In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to comment #564)
> Thanks to all people involved in solving this bug.

Does anyone have a link to a discussion list post or a technical article detailing the theory behind the solution to this bug? Since this "bug" encompasses so many scenarios, I have doubts about whether all of them have indeed been resolved. I'm glad one person's problem went away, but until a kernel hacker can stand up and explain exactly what was wrong and how they fixed it, I'm going to assume there are still lurking problems in Linux's I/O subsystem.

One problem we've seen and discussed in this thread is that a large backlog of dirty blocks waiting to be flushed to disk can cause eviction of "hot" pages of code that are needed by interactive user processes. The system is then brought to a state of thrashing: processes continually trigger page faults because their actively executing code keeps being forced out of RAM by the large buffered write to disk. Even if this problem has been solved (presumably by fixing a bug in the code that is supposed to force a process to flush its own dirty pages to disk once dirty_ratio has been reached), there would still be the problem of the kernel evicting hot pages from RAM so aggressively under low-memory conditions that the system's interactivity is compromised to the point where it's impossible for the user to resolve the memory shortage.
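To make it easier to watch this condition developing, here's a minimal Python sketch (Linux-only; it just reads the standard procfs paths) that prints the writeback knobs discussed above along with the current amount of dirty page cache:

    # A minimal sketch (Linux-only; these are the standard procfs paths) that
    # reports the writeback knobs discussed above and the current volume of
    # dirty page cache, so you can watch the backlog build up.

    def read_int(path):
        with open(path) as f:
            return int(f.read().strip())

    def dirty_kb():
        # /proc/meminfo lines look like "Dirty:  1234 kB"
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("Dirty:"):
                    return int(line.split()[1])
        raise RuntimeError("no Dirty: line in /proc/meminfo")

    if __name__ == "__main__":
        print("vm.dirty_ratio            =", read_int("/proc/sys/vm/dirty_ratio"), "%")
        print("vm.dirty_background_ratio =", read_int("/proc/sys/vm/dirty_background_ratio"), "%")
        print("dirty page cache          =", dirty_kb(), "kB")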

It's pretty easy to reproduce the thrashing scenario: just mount a tmpfs whose maximum size is close to the amount of physical memory in the system and start writing data to it. Eventually you may find that you can no longer do anything, not even give input focus to your terminal emulator so you can interrupt the writing process (or, in some setups, even move the mouse cursor on the screen), because your entire desktop environment and even the X server have been evicted from RAM and are continually paging back in from disk, only to be immediately evicted again. I've encountered this scenario while compiling Chromium in a tmpfs. I'd expect the OOM killer to activate, but instead all of my running applications respond at a snail's pace because they have to keep paging bits of their program code back in from disk. I should mention that I run without swap.
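For anyone who wants to try it, here is a rough Python sketch of that reproduction. Fair warning: it can make the machine unresponsive exactly as described, so run it in a VM if possible. It assumes a tmpfs is already mounted at /mnt/repro (a path I picked for illustration) with a size close to physical RAM:

    # A rough sketch of the reproduction described above.  WARNING: this can
    # make the machine unresponsive, exactly as described, so run it in a VM
    # if possible.  It assumes a tmpfs is already mounted at the given path,
    # sized close to physical RAM, e.g. (as root):
    #   mount -t tmpfs -o size=95% tmpfs /mnt/repro
    import os
    import sys

    def fill(path, chunk_mib=64):
        buf = b"\0" * (chunk_mib * 1024 * 1024)
        written = 0
        # The file stays in the tmpfs after we stop, pinning the memory.
        with open(os.path.join(path, "filler"), "wb") as f:
            while True:
                try:
                    f.write(buf)
                    f.flush()
                except OSError:  # ENOSPC: the tmpfs is full
                    break
                written += chunk_mib
                print(f"wrote {written} MiB", flush=True)

    if __name__ == "__main__":
        fill(sys.argv[1] if len(sys.argv) > 1 else "/mnt/repro")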

I would think one way to solve the thrashing problem would be to introduce a kernel knob setting how much time must elapse between a page being fetched from disk into RAM by a page fault and that page becoming eligible for eviction. If it were set to, say, 30 seconds, the user's interactive processes could retain a usable degree of responsiveness even under extremely low memory conditions. This would, of course, mean that the OOM killer would activate sooner than it does now, since pages the kernel would presently evict to free up RAM would be ineligible under the new time limit. Setting the knob to zero would yield the behavior we have now, in which the kernel is free to evict any unlocked page.
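To make the proposal concrete, here is a toy user-space model of the policy. None of this is kernel code, and min_residency_secs is a name I made up for illustration; the point is only that reclaim would skip recently faulted pages and fall back to the OOM killer instead of thrashing:

    # A toy user-space model of the proposed knob, purely to illustrate the
    # policy.  Not real kernel code; min_residency_secs is a made-up name.
    # Reclaim skips any page faulted in less than min_residency_secs ago; if
    # nothing is evictable, give up and let the OOM killer run instead of
    # thrashing.
    import time
    from dataclasses import dataclass, field

    min_residency_secs = 30.0  # the proposed knob; 0 restores today's behavior

    @dataclass
    class Page:
        last_fault: float = field(default_factory=time.monotonic)

    def pick_victims(pages, needed):
        now = time.monotonic()
        evictable = [p for p in pages
                     if now - p.last_fault >= min_residency_secs]
        if len(evictable) < needed:
            raise MemoryError("nothing evictable inside the residency window; "
                              "invoke the OOM killer rather than thrash")
        # Evict the coldest pages first (longest time since last fault).
        evictable.sort(key=lambda p: now - p.last_fault, reverse=True)
        return evictable[:needed]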

To reiterate what was covered earlier in this thread: this was formerly not such a problem on 32-bit x86 systems, because most library code there contained text relocations that caused the in-memory pages of library code to differ from their on-disk copies, so those pages could not be evicted (assuming no swap). Now that we use position-independent code on x86_64, most executable pages in RAM are identical to the copies on disk, so they are eligible for eviction: the kernel can simply page them back in from disk when they're needed. That convenience turns against us when pages that are needed very frequently (like the ones that handle moving the mouse cursor or blinking the text cursor) are evicted aggressively.
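The claim is easy to check on a live process. The following sketch (Linux-only; it reads /proc/<pid>/smaps, which needs the usual permissions) sums clean versus dirty kilobytes in executable file-backed mappings. On an x86_64 system with position-independent libraries, the dirty share should be close to zero, i.e. nearly all library text is clean and therefore evictable:

    # Scan /proc/<pid>/smaps and sum clean vs. dirty kB in executable,
    # file-backed mappings (library and binary text).
    import sys

    def exec_text_stats(pid="self"):
        clean = dirty = 0
        in_exec_file = False
        with open(f"/proc/{pid}/smaps") as f:
            for line in f:
                fields = line.split()
                if not fields:
                    continue
                if not fields[0].endswith(":"):
                    # Mapping header: "addr perms offset dev inode [path]".
                    # Track only executable mappings backed by a real file.
                    perms = fields[1]
                    in_exec_file = ("x" in perms and len(fields) >= 6
                                    and fields[5].startswith("/"))
                elif in_exec_file and fields[0] in ("Shared_Clean:", "Private_Clean:"):
                    clean += int(fields[1])  # values are in kB
                elif in_exec_file and fields[0] in ("Shared_Dirty:", "Private_Dirty:"):
                    dirty += int(fields[1])
        return clean, dirty

    if __name__ == "__main__":
        pid = sys.argv[1] if len(sys.argv) > 1 else "self"
        c, d = exec_text_stats(pid)
        print(f"executable file-backed pages for {pid}: {c} kB clean, {d} kB dirty")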