Comment 126 for bug 1518457

Revision history for this message
Dan Streetman (ddstreet) wrote :

The problem is a bit complex. The Xen hypervisor uses memory ballooning, to control how many memory pages the guest can use. The kernel enumerates its e820 memory at boot, and since it's only 1G in this case, it all gets placed into the DMA32 zone. Then later during boot when the Xen balloon driver is initialized, it dynamically adds the balloon memory. The kernel always places hot-added memory into the Normal zone however, so the system winds up with the balloon memory, and only the balloon memory, in the Normal zone. Since the balloon driver starts at, or very close to, its memory target, only a very small number of pages are made available, which results in a Normal memory zone that's tiny - only 9 managed pages on the instance I tested. You can read the /proc/zoneinfo file to find the number of managed pages in the Normal zone.

Then when the system encounters memory pressure (i.e. very little free memory left), it wakes up the kswapd daemon to start freeing memory. The kswapd daemon then tries to "balance" memory by "balancing" each zone - DMA, DMA32, and Normal zones. However, it's essentially impossible for it to free pages from the Normal zone, because there are so few pages that whenever one is freed, the next page allocation takes it (because pages are usually allocated from the Normal zone first), and kswapd winds up in a continuous cycle of trying to free pages from the Normal zone forever.

This is also why disabling the udev memory hotadd (see comment 69) works around the problem - it prevents the Xen balloon driver from adding/enabling any of the pages in the Normal zone, so kswapd never has to bother trying to balance it, and thus there's no problem.

This appears to be fixed by Mel Gorman's 34-commit patch series that changes kswapd memory balancing to "per node" instead of "per zone":
https://marc.info/?l=linux-mm&m=146797052519026

That's a rather large patchset to backport to the xenial kernel, but I'll give it a try.