Natty: vmscan: fix a livelock in kswapd

Bug #813797 reported by Tim Gardner on 2011-07-20
This bug affects 2 people
Assigned to: Tim Gardner

Bug Description

Upstream 4746efded84d7c5a9c8d64d4c6e814ff0cf9fb42

I'm running a workload that triggers a lot of swapping on a machine with 4 NUMA nodes. After I kill the workload, I hit a kswapd livelock: kswapd3 or kswapd2 sometimes keeps running and I can't access the filesystem, even though most memory is free.

This looks like a regression introduced by commit 08951e545918c159 ("mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely").

Nodes 2 and 3 have only ZONE_NORMAL, but balance_pgdat() returns 0 for classzone_idx. The reason is that end_zone in balance_pgdat() is 0 by default, and if all zones have their watermarks OK, end_zone stays 0.
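The end_zone selection can be illustrated with a small user-space model. This is a sketch, not kernel code: the `Zone` class, the `watermark_ok` flag and `pick_end_zone()` are hypothetical stand-ins for the per-zone checks in balance_pgdat(), and it assumes the usual zone order ZONE_DMA (0), ZONE_DMA32 (1), ZONE_NORMAL (2). The real loop also does other work (for example background aging and skipping unreclaimable zones) that is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    present_pages: int   # 0 means the zone is not populated on this node
    watermark_ok: bool   # hypothetical: result of the high-watermark check


def pick_end_zone(zones):
    """Simplified model of how balance_pgdat() chooses end_zone.

    Scans from the highest zone down to index 0 and stops at the first
    populated zone that fails its watermark check. If every populated zone
    is fine, end_zone keeps its default of 0, even if zone 0 has no pages.
    """
    end_zone = 0  # default value, as described in the bug report
    for i in range(len(zones) - 1, -1, -1):
        zone = zones[i]
        if zone.present_pages == 0:
            continue  # skip unpopulated zones
        if not zone.watermark_ok:
            end_zone = i
            break
    return end_zone


# A node like node 2 or 3: only ZONE_NORMAL has memory, and after the
# workload is killed all watermarks are OK.
node = [
    Zone("DMA", 0, True),
    Zone("DMA32", 0, True),
    Zone("Normal", 262144, True),
]
classzone_idx = pick_end_zone(node)
print(classzone_idx)  # 0, i.e. the empty ZONE_DMA
```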

As a result, sleeping_prematurely() always returns true: this is an order-3 wakeup, and with classzone_idx 0, both balanced_pages and present_pages in pgdat_balanced() are 0. We add a special case here: if a zone has no pages, we consider it balanced. This fixes the livelock.
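The arithmetic behind the livelock can be checked with a similar sketch. It assumes the 25% rule pgdat_balanced() applied at the time (the balanced zones up to classzone_idx must hold at least `present_pages >> 2` pages) and that, for an order > 0 wakeup, sleeping_prematurely() returns the negation of that test. `pgdat_balanced_old` uses a strict comparison; `pgdat_balanced_fixed` is one way to express the special case (a non-strict comparison, so 0 pages counts as balanced). Both function names are hypothetical and this is an illustration, not the upstream patch.

```python
def pgdat_balanced_old(present_pages, balanced_pages, classzone_idx):
    """Strict 25% test: a node whose classzone has no pages is never balanced."""
    total = sum(present_pages[: classzone_idx + 1])
    return balanced_pages > (total >> 2)


def pgdat_balanced_fixed(present_pages, balanced_pages, classzone_idx):
    """Non-strict test: 0 >= 0 holds, so an empty classzone counts as balanced."""
    total = sum(present_pages[: classzone_idx + 1])
    return balanced_pages >= (total >> 2)


def sleeping_prematurely(pgdat_balanced, order, present_pages,
                         balanced_pages, classzone_idx):
    """Simplified: only models the order > 0 path described in the report."""
    if order == 0:
        raise ValueError("order-0 wakeups use a different check, not modelled here")
    return not pgdat_balanced(present_pages, balanced_pages, classzone_idx)


# Node 2 or 3: pages only in ZONE_NORMAL (index 2), but classzone_idx is 0.
present = [0, 0, 262144]
order, classzone_idx = 3, 0
balanced_pages = 0  # zones 0..classzone_idx are empty, so nothing is counted

print(sleeping_prematurely(pgdat_balanced_old, order, present,
                           balanced_pages, classzone_idx))    # True
print(sleeping_prematurely(pgdat_balanced_fixed, order, present,
                           balanced_pages, classzone_idx))    # False
```

With the strict comparison the empty classzone makes the node look permanently unbalanced, so kswapd is always told it would sleep prematurely and keeps running; with the special case it is allowed to sleep.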

Tim Gardner (timg-tpi) wrote :

Will be released in 3.0 final.

Herton R. Krzesinski (herton) wrote :

Please add SRU justification to this bug report.

Also this bug is awaiting verification that the kernel for Natty in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: kswapd can encounter a livelock

Patch Description: return an accurate balanced_pages from pgdat_balanced()

Tim Gardner (timg-tpi) wrote :

The attached patch is upstream commit 4746efded84d7c5a9c8d64d4c6e814ff0cf9fb42

Tim Gardner (timg-tpi) wrote :

Verified on a 4-way server that the patch does not cause a regression, though I couldn't reproduce the livelock with prior versions.

tags: added: verification-done-natty
removed: verification-needed-natty