Comment 542 for bug 620074

In , kernel (kernel-linux-kernel-bugs) wrote:

@Yaroslav: Your misconception is that having swap disabled means that memory pages are never backed by disk blocks. That is simply not true. All it means is that *anonymous* pages cannot be backed by disk.
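One way to see this split on a running Linux system is /proc/meminfo, which reports anonymous memory (AnonPages) separately from the page cache (Cached) and from file pages currently mapped into processes (Mapped). On a machine where SwapTotal is 0, Cached and Mapped are still typically nonzero, because those pages are backed by files rather than by swap. A minimal Python sketch; the helper names are illustrative, not part of any library:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:  # skip blank or malformed lines
            info[name.strip()] = int(fields[0])
    return info


def summarize(info):
    """One-line summary contrasting anonymous and file-backed memory."""
    swap = "disabled" if info.get("SwapTotal", 0) == 0 else "enabled"
    return (
        f"swap {swap}; anonymous: {info.get('AnonPages', 0)} kB; "
        f"page cache: {info.get('Cached', 0)} kB; "
        f"file pages mapped into processes: {info.get('Mapped', 0)} kB"
    )


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(summarize(parse_meminfo(f.read())))
```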

Linux starts programs (via execve(2)) by memory-mapping the executable image on disk and then jumping to the entry point address in the mapped image. If the page containing the entry point is not resident, the CPU's attempt to fetch an instruction from it triggers a page fault, which the kernel handles by loading the needed page (and usually several neighboring pages) from disk.
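These mappings are visible in /proc/<pid>/maps: the program's executable and its shared libraries (and, for a dynamically linked program, the ELF interpreter ld.so, which the kernel maps alongside the executable) appear as file-backed mappings with the x permission. A small Python sketch that filters those lines, assuming the usual six-column maps format; the function name is hypothetical:

```python
import mmap


def executable_file_mappings(maps_text):
    """Return (start, end, path) for executable, file-backed lines of a /proc/<pid>/maps dump."""
    result = []
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue  # no pathname column: an anonymous mapping
        addr, perms, _offset, _dev, inode, path = parts
        # inode 0 or names like "[vdso]" mean there is no file on disk behind the mapping
        if "x" in perms and inode != "0" and path.startswith("/"):
            start, end = (int(x, 16) for x in addr.split("-"))
            result.append((start, end, path))
    return result


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for start, end, path in executable_file_mappings(f.read()):
            print(f"{path}: {(end - start) // mmap.PAGESIZE} pages, faulted in on demand")
```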

When physical memory becomes scarce, the kernel has several tricks it can use to try to free some up. One of the first is dropping cached disk blocks from the page cache and cached directory entries (dentries) from the file system layer, which means those blocks and dentries will have to be read from disk the next time they are accessed. One of the last is the OOM killer, which selects the "most offending" process and kills it (with SIGKILL) to reclaim the memory it was using.
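The OOM killer's ranking can be inspected from user space: on Linux, /proc/<pid>/oom_score shows the badness score the kernel computes for each process, and the highest-scoring eligible process is, roughly speaking, the one it picks. A Python sketch that lists the top candidates; the function name and the proc_root parameter (there only so the code can be pointed at a fake directory for testing) are illustrative assumptions:

```python
import os


def oom_candidates(proc_root="/proc", top=5):
    """Return up to `top` (oom_score, pid, comm) tuples, highest score first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                score = int(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited, or its files are not readable
        rows.append((score, int(entry), comm))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for score, pid, comm in oom_candidates():
        print(f"{score:6d}  {pid:7d}  {comm}")
```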

Somewhere between those two tricks, the kernel has another way to free up physical memory: it can force memory pages out to disk. If the system has swap enabled, the kernel may push anonymous pages (e.g., process heaps and stacks) out to swap. In all cases, however, the kernel may also choose to force memory-mapped pages out to disk. If those memory-mapped pages are read-only (as is the case with executable images), then "forcing them out to disk" really just means dropping them from physical memory, since they can always be fetched back in from the file later.
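The "just drop it" case can be demonstrated from user space. The Python sketch below (Linux, Python 3.8+ for mmap.madvise) maps a file read-only, tells the kernel with MADV_DONTNEED that the process no longer needs those pages, and reads the mapping again: the contents come back intact because clean file-backed pages are simply re-fetched on the next access. In a small demo like this the data usually still sits in the page cache, so the re-fetch is served from RAM; under real memory pressure the page-cache copy may be gone too, and the fault then has to read from disk. drop_and_refault is an illustrative name:

```python
import mmap
import os
import tempfile


def drop_and_refault(path):
    """Map `path` read-only, drop its pages with MADV_DONTNEED, then read them again.

    Returns True if the bytes read after the drop match the bytes read before,
    i.e. the clean, file-backed pages were transparently fetched back in.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        before = m[:]                  # touching every page faults it in
        m.madvise(mmap.MADV_DONTNEED)  # discard this process's mappings of those pages
        after = m[:]                   # faults again; the data comes back from the file
        return before == after


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * mmap.PAGESIZE))
    try:
        print(drop_and_refault(tmp.name))
    finally:
        os.unlink(tmp.name)
```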

So, what does this mean in the context of this bug? The process that's hitting the disk hard (usually by dirtying blocks, though this may also happen when it is only reading them) fills RAM with cached disk blocks. The kernel starts applying its tricks to free up physical memory, and one of them is dropping memory-mapped pages from RAM, since they can always be fetched back from disk later. Then you, the user, switch applications, click a button in the GUI, or try to log into an SSH session, and what happens? Page fault! The code for repainting the X11 window, handling the button click, or spawning a login session is no longer resident in memory because the kernel evicted it. That code must now be re-fetched from disk to satisfy the page fault, but, uh oh, the disk is VERY busy and has very long request queues, so it will be a while before the needed pages arrive. Meanwhile, the kernel keeps evicting other memory-mapped pages from RAM, so the responsiveness problem persists until the pressure on RAM subsides.
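The stall described here shows up as major page faults, i.e. faults that had to wait for I/O. A Python sketch (Linux, standard-library resource module) that touches one byte per page of a freshly mapped file and reports the minor and major faults the process incurred meanwhile. Because the demo file was just written, its pages are normally still in the page cache, so expect mostly minor faults; the kernel may also map several neighboring pages per fault, and the counts cover the whole process, interpreter noise included. On a system under the memory pressure described above, the same access pattern against evicted code pages produces major faults, which then wait behind whatever else is queued for the disk. The function name is hypothetical:

```python
import mmap
import os
import resource
import tempfile


def faults_while_touching(path):
    """Map `path` read-only, read one byte per page, return (minor, major) faults incurred."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        checksum = 0
        for offset in range(0, len(m), mmap.PAGESIZE):
            checksum += m[offset]  # first touch of a page not yet mapped here traps into the kernel
    after = resource.getrusage(resource.RUSAGE_SELF)
    # ru_minflt: resolved without I/O (e.g. page-cache hit); ru_majflt: had to wait for I/O
    return after.ru_minflt - before.ru_minflt, after.ru_majflt - before.ru_majflt


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(256 * mmap.PAGESIZE))
    try:
        minor, major = faults_while_touching(tmp.name)
        print(f"minor faults: {minor}, major faults: {major}")
    finally:
        os.unlink(tmp.name)
```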

Ideally, the kernel should not allow so many blocks to be dirtied that it has to resort to dropping memory-mapped pages from RAM. The vm.dirty_ratio knob (/proc/sys/vm/dirty_ratio) is supposed to limit what percentage of memory may be filled with dirty pages before a process that is writing is forced to write them out to disk itself (synchronously), but that does not appear to be working properly.
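For reference, the relevant knobs live under /proc/sys/vm: dirty_background_ratio (when background writeback starts) and dirty_ratio (when writers are throttled into doing writeback), both percentages; if dirty_bytes or dirty_background_bytes is set instead, the corresponding ratio reads 0. The Python sketch below turns the ratios into approximate kB thresholds and compares them with the current Dirty and Writeback figures from /proc/meminfo. It assumes "dirtyable" memory is roughly MemFree + Active(file) + Inactive(file), which only approximates what the kernel actually computes (the kernel also subtracts reserves, among other adjustments); the function names are hypothetical:

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its first numeric column (kB for most fields)."""
    return {
        name.strip(): int(rest.split()[0])
        for name, _, rest in (line.partition(":") for line in text.splitlines())
        if rest.split()
    }


def dirty_thresholds_kb(info, dirty_ratio, background_ratio):
    """Approximate (background, throttle) dirty thresholds in kB.

    Assumption: dirtyable memory ~= MemFree + Active(file) + Inactive(file);
    the kernel's real calculation makes further adjustments.
    """
    dirtyable = info["MemFree"] + info["Active(file)"] + info["Inactive(file)"]
    return dirtyable * background_ratio // 100, dirtyable * dirty_ratio // 100


def read_int(path):
    with open(path) as f:
        return int(f.read())


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    bg, hard = dirty_thresholds_kb(
        info,
        read_int("/proc/sys/vm/dirty_ratio"),
        read_int("/proc/sys/vm/dirty_background_ratio"),
    )
    print(f"background writeback from ~{bg} kB, throttling from ~{hard} kB")
    print(f"currently Dirty: {info['Dirty']} kB, Writeback: {info['Writeback']} kB")
```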