[regression] since linux 3.4.0, kswapd0 uses 100% cpu when low memory is available

Bug #1055534 reported by Leuke on 2012-09-24
This bug affects 5 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

When the system runs low on available memory, kswapd0 starts consuming 100% of one CPU and stays in this state until I close some processes to free up a large amount of memory.
I can easily reproduce this problem: it happens every time memory usage reaches about 80%, although it is not limited to that case.
The regression was introduced in version 3.4.0 (downloaded directly from kernel.ubuntu.com/~kernel-ppa/mainline) and is still present in Linux 3.6-rc7. It never happens in any prior release, including 3.3.8.
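
To check the "about 80%" figure while reproducing, memory usage can be computed from the contents of /proc/meminfo. The following is only a minimal sketch: the function name is hypothetical, and it assumes that "memory in use" means MemTotal minus MemFree, Buffers and Cached, which may differ from what a given system monitor reports.

```python
def memory_used_percent(meminfo_text):
    """Return the percentage of RAM in use, not counting buffers and page cache.

    meminfo_text is the content of /proc/meminfo, e.g. open("/proc/meminfo").read().
    Assumption: "in use" = MemTotal - (MemFree + Buffers + Cached).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # sizes are reported in kB
    total = fields["MemTotal"]
    reclaimable = fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)
    return 100.0 * (total - reclaimable) / total
```

Calling this periodically while opening memory-hungry applications would show how close the system is to the point where kswapd0 starts spinning.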

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-15-generic 3.5.0-15.22
ProcVersionSignature: Ubuntu 3.5.0-15.22-generic 3.5.4
Uname: Linux 3.5.0-15-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.5.2-0ubuntu4
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: claudio 1939 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDmesg:
 [ 47.206985] NVRM: Your system is not currently configured to drive a VGA console
 [ 47.206994] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
 [ 47.207001] NVRM: requires the use of a text-mode VGA console. Use of other console
 [ 47.207007] NVRM: drivers including, but not limited to, vesafb, may result in
 [ 47.207013] NVRM: corruption and stability problems, and is not supported.
Date: Mon Sep 24 16:29:42 2012
HibernationDevice: RESUME=UUID=56591795-2723-4b47-85bb-9c7453be8a09
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha i386 (20120910)
MachineType: SAMSUNG ELECTRONICS CO., LTD. N510
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-15-generic root=UUID=8b95377a-f68b-4797-8f50-b1cd825aafb8 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-15-generic N/A
 linux-backports-modules-3.5.0-15-generic N/A
 linux-firmware 1.93
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/24/2009
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: 02MU.M004.20090824.LDG
dmi.board.name: N510
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvr02MU.M004.20090824.LDG:bd08/24/2009:svnSAMSUNGELECTRONICSCO.,LTD.:pnN510:pvrNotApplicable:rvnSAMSUNGELECTRONICSCO.,LTD.:rnN510:rvrNotApplicable:cvnSAMSUNGELECTRONICSCO.,LTD.:ct10:cvrN/A:
dmi.product.name: N510
dmi.product.version: Not Applicable
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

Leuke (leuke) wrote :
Brad Figg (brad-figg) on 2012-09-24
Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. It would be very helpful to know the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that has this bug:

v3.4-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc1-precise/
v3.4-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc2-precise/
v3.4-rc3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc3-precise/
...

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!
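
The request above asks for kernels to be tested in order until the first bad one is found. When the list is long, the same search can be done as a binary search (a bisection). This is a minimal sketch, not the procedure used in this report: the names first_bad_kernel and has_bug are hypothetical, and it assumes that once the bug appears it is reliably reproducible in every later kernel.

```python
def first_bad_kernel(versions, has_bug):
    """Return the earliest version for which has_bug(version) is True.

    versions must be ordered oldest to newest. Assumption: once the bug
    appears, every later version also has it. Returns None if no version
    has the bug.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if has_bug(versions[mid]):
            hi = mid        # the first bad version is at mid or earlier
        else:
            lo = mid + 1    # the first bad version is after mid
    return versions[lo] if lo < len(versions) else None
```

Here has_bug stands for a manual step: boot the given kernel, put the system under memory pressure, and report whether kswapd0 gets stuck. Each call is expensive, so halving the range at every step keeps the number of boots small.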

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: performing-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Leuke (leuke) wrote :

I ran some tests and found that the bug first appeared in exactly this release: v3.4-quantal.
In every previous version, including v3.4-precise, kswapd0 works as expected.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Leuke. Now that we know the bug exists in v3.4, we need to find out which release candidate introduced it. Can you test the following kernels as well:

v3.4-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc1-precise/
v3.4-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc4-precise/

Leuke (leuke) wrote :

You're welcome! I already tested rc1, rc6 and some kernels between them, but I'm not sure about rc4. In my previous comment I meant every kernel before v3.4-quantal, including v3.4-precise and all the release candidates; I assume they are all fine because rc6, the latest, is not affected at all.
That said, to be honest I didn't spend much time testing those releases (only a few hours for all of them), because where the bug is present it shows up very quickly once the system is low on memory.
Did you mention rc1 and rc4 specifically because you suspect them?
If needed, I can run a kernel on a daily basis to be 100% sure it is not affected.

Joseph Salisbury (jsalisbury) wrote :

I see now. So the latest 3.4 kernel does not have the bug. If that is the case, we will want to test some of the v3.5 release candidates to find the first one that has the bug. Can you test the following kernels and report back the first with the bug:

v3.5-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc1-precise/
v3.5-rc3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc3-precise/
v3.5-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc4-precise/
etc.

Leuke (leuke) wrote :

I'm sorry, this bug seems to be much more random than I thought. With the stock kernel, which I have used for weeks, kswapd0 gets stuck in a loop pretty often, but I have been testing v3.5 (http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-quantal/) for an hour and it has not happened so far.
What I know for sure is that the stock kernel has been quite unusable since I started using Ubuntu 12.10, so I tried different configurations, and the latest one I'm sure is not affected is 3.3.8. I can confirm this because I used it for several days.
On 3.4.0, on the other hand, I saw the bug even though I used it for much less time.
I thought it would be easy to reproduce in every configuration, but that is not the case, so finding out exactly when this started is much harder than I expected.

Alejandro Martínez (zenitram) wrote :

This happens for me on kernel 3.5.0-17-generic, most often after resuming my laptop from suspend.

Leuke (leuke) wrote :

You're right, also in my case this happens most often after resuming from suspend.

Alejandro Martínez (zenitram) wrote :

I think this patch might be the solution. I'm going to try to apply it:

https://lkml.org/lkml/2012/10/12/206

Are you also using LVM?

Leuke (leuke) wrote :

No, not using LVM.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
cro (cro) wrote :

I've started seeing this problem with the most recent kernel (3.8.0-19), even when swap is turned off.

For example, when resuming from suspend, kswapd0 will start using 99-100% of CPU according to top, even though the reported swap usage is 0.

Running swapoff -a has no effect, and the only way to stop kswapd0 from consuming all CPU is to close any process that is currently using a lot of RAM (for example Firefox).

It is currently so bad that at times the entire computer is unusable because kswapd0 is thrashing the disk and blocking all other I/O (shell, TTY, UI). I am currently investigating ways to prevent the process from ever starting.
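
To log the 99-100% CPU figure without watching top, kswapd0's CPU time can be sampled from /proc/<pid>/stat. This is a minimal sketch under stated assumptions: the function names are hypothetical, utime and stime are taken to be fields 14 and 15 as described in proc(5), and the tick rate defaults to 100 per second (the real value is available from os.sysconf("SC_CLK_TCK")).

```python
def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from the content of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces,
    # so split only the part after the last ')'.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field n is at index n - 3.
    utime = int(after_comm[11])  # field 14
    stime = int(after_comm[12])  # field 15
    return utime + stime


def cpu_percent(ticks_before, ticks_after, interval_seconds, ticks_per_second=100):
    """Return CPU usage over the interval as a percentage of one CPU."""
    used_seconds = (ticks_after - ticks_before) / float(ticks_per_second)
    return 100.0 * used_seconds / interval_seconds
```

Reading the stat file twice a few seconds apart and passing both tick counts to cpu_percent gives a number comparable to the per-process CPU column; a sustained value near 100 after resuming from suspend would match the behaviour described here.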

Changed in linux (Ubuntu):
status: Expired → Confirmed
status: Confirmed → Incomplete
cro (cro) wrote :

(also, apologies for changing the status, I wasn't aware I could do that)

Joseph Salisbury (jsalisbury) wrote :

@ZenitraM

Did the patch you mentioned in comment #10 resolve the issue for you?

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired