Natty: vmscan: fix a livelock in kswapd

Bug #813797 reported by Tim Gardner on 2011-07-20
This bug affects 2 people
Affects    Importance   Assigned to
Ubuntu     Undecided    Tim Gardner
Natty      Undecided    Tim Gardner
Oneiric    Undecided    Tim Gardner

Bug Description

Upstream commit 4746efded84d7c5a9c8d64d4c6e814ff0cf9fb42

I'm running a workload that triggers heavy swapping on a machine with 4 NUMA nodes. After I kill the workload, kswapd livelocks: kswapd3 or kswapd2 sometimes keeps running and I can't access the filesystem, even though most memory is free.

This looks like a regression introduced by commit 08951e545918c159 ("mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely").

Nodes 2 and 3 have only ZONE_NORMAL, but balance_pgdat() returns 0 for classzone_idx. The reason is that end_zone in balance_pgdat() defaults to 0; if all zones meet their watermarks, end_zone stays 0.
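A simplified model of how end_zone can end up as 0 is sketched below. This is a hypothetical illustration, not kernel code: it assumes x86-64-style zone indices (DMA = 0, DMA32 = 1, NORMAL = 2) and that balance_pgdat() scans populated zones from the highest index down, setting end_zone to the first zone that fails its high watermark. The names zone_model, pick_end_zone and MODEL_ZONE_* are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone indices, assuming x86-64-style ordering. */
enum {
	MODEL_ZONE_DMA,
	MODEL_ZONE_DMA32,
	MODEL_ZONE_NORMAL,
	MODEL_NR_ZONES
};

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	unsigned long present_pages;
	bool watermark_ok; /* true if the zone meets its high watermark */
};

/*
 * Sketch of how balance_pgdat() picks end_zone (assumed behavior):
 * scan populated zones from the highest index down and stop at the
 * first one that fails its watermark. If every zone is fine,
 * end_zone keeps its default of 0.
 */
static int pick_end_zone(const struct zone_model *zones, int nr_zones)
{
	int end_zone = 0; /* default, even if zone 0 has no pages */

	assert(nr_zones > 0 && nr_zones <= MODEL_NR_ZONES);
	for (int i = nr_zones - 1; i >= 0; i--) {
		if (zones[i].present_pages == 0)
			continue; /* unpopulated zone: nothing to reclaim */
		if (!zones[i].watermark_ok) {
			end_zone = i;
			break;
		}
	}
	return end_zone;
}
```

For a node like node 2 or 3, where only ZONE_NORMAL is populated and it already meets its watermark, the loop finds nothing to reclaim and pick_end_zone() returns 0, the index of a zone that holds no pages on that node.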

As a result, sleeping_prematurely() always returns true. This is an order-3 wakeup, and with classzone_idx at 0, both balanced_pages and present_pages in pgdat_balanced() are 0, so the node is never reported as balanced. The fix adds a special case: a zone with no pages is considered balanced. This fixes the livelock.
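The balance check can be modeled as below. This is a simplified, hypothetical sketch rather than the kernel source: it assumes pgdat_balanced() sums present_pages over zones 0..classzone_idx and compares balanced_pages against a quarter of that total, and that the special case is implemented by relaxing the comparison from > to >=. The helper present_up_to() and the _old/_fixed function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_ZONES 3 /* hypothetical: DMA, DMA32, NORMAL */

/* Sum present pages over zones 0..classzone_idx, as pgdat_balanced() is assumed to do. */
static unsigned long present_up_to(const unsigned long *present, int classzone_idx)
{
	unsigned long total = 0;

	assert(classzone_idx >= 0 && classzone_idx < MODEL_NR_ZONES);
	for (int i = 0; i <= classzone_idx; i++)
		total += present[i];
	return total;
}

/* Pre-fix check (assumed): more than 25% of the pages must be in balanced zones. */
static bool pgdat_balanced_old(const unsigned long *present,
			       unsigned long balanced_pages, int classzone_idx)
{
	return balanced_pages > (present_up_to(present, classzone_idx) >> 2);
}

/* Fixed check (assumed): with ">=", 0 balanced pages out of 0 present counts as balanced. */
static bool pgdat_balanced_fixed(const unsigned long *present,
				 unsigned long balanced_pages, int classzone_idx)
{
	return balanced_pages >= (present_up_to(present, classzone_idx) >> 2);
}
```

In this model, with classzone_idx 0 on a node whose only populated zone is ZONE_NORMAL, the old check evaluates 0 > 0 and reports the node as unbalanced on every pass, which keeps sleeping_prematurely() returning true for the order-3 wakeup; the relaxed comparison evaluates 0 >= 0 and lets kswapd go back to sleep.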

Tim Gardner (timg-tpi) wrote :

Will be released in 3.0 final.

Herton R. Krzesinski (herton) wrote :

Please add an SRU justification to this bug report.

Also this bug is awaiting verification that the kernel for Natty in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: kswapd can livelock and keep running even though most memory is free, leaving the filesystem inaccessible.

Patch Description: make pgdat_balanced() treat a zone with no pages as balanced, so kswapd can sleep on nodes with only ZONE_NORMAL.

Tim Gardner (timg-tpi) wrote :

The attached patch is upstream commit 4746efded84d7c5a9c8d64d4c6e814ff0cf9fb42

Tim Gardner (timg-tpi) wrote :

Verified on a 4-way server that the patch does not cause a regression, though I could not reproduce the livelock with prior kernel versions.

tags: added: verification-done-natty
removed: verification-needed-natty