kswapd pulls server down every 2 weeks.

Bug #1462919 reported by Søren Holm
This bug affects 3 people
Affects        | Status  | Importance | Assigned to | Milestone
linux (Ubuntu) | Invalid | High       | Unassigned  |

Bug Description

In 15.04, even with plenty of RAM and swap, kswapd uses lots of CPU and might bring the machine down to a crawl, making it impossible to access over the network.

I say "might", because it's only a very clear feeling that I have. I do not know it for sure.

Anyway, the fact is that I have two servers that end up totally inaccessible from the network once every 2 weeks. They can be pinged, though. Logging in locally takes a long time. Rebooting them locally takes a long time. They have absolutely no noticeable disk IO when in this stalled state.
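
One way to test the "might" above is to check whether kswapd is actually scanning pages while the machine is sluggish. The following is only a sketch, not something taken from this report: it assumes the kernel exposes counters in /proc/vmstat whose names start with pgscan_kswapd and pgsteal_kswapd (the exact names vary between kernel versions, hence the prefix match), and the helper names are made up for illustration.

```python
import time


def kswapd_counters(vmstat_text):
    """Sum the pgscan_kswapd* and pgsteal_kswapd* counters found in /proc/vmstat text.

    Assumption: counter names differ between kernels (per-zone suffixes on
    older ones), so this matches on prefixes instead of exact names.
    """
    totals = {"pgscan_kswapd": 0, "pgsteal_kswapd": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not "name value"
        name, value = parts
        for prefix in totals:
            if name.startswith(prefix):
                totals[prefix] += int(value)
    return totals


def kswapd_delta(before_text, after_text):
    """Return how much each summed counter grew between two snapshots."""
    before = kswapd_counters(before_text)
    after = kswapd_counters(after_text)
    return {key: after[key] - before[key] for key in before}


def sample_vmstat(interval_seconds=10, path="/proc/vmstat"):
    """Take two snapshots of /proc/vmstat and return the kswapd deltas (Linux only)."""
    with open(path) as handle:
        first = handle.read()
    time.sleep(interval_seconds)
    with open(path) as handle:
        second = handle.read()
    return kswapd_delta(first, second)
```

Calling sample_vmstat() during a stall and again on a freshly rebooted machine would give two sets of numbers to compare. A large scan delta with a small steal delta would suggest kswapd is scanning without reclaiming much, but that reading is an interpretation, not a confirmed diagnosis.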

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-3.19.0-16-generic 3.19.0-16.16
ProcVersionSignature: Ubuntu 3.19.0-18.18-generic 3.19.6
Uname: Linux 3.19.0-18-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 jun 8 09:57 seq
 crw-rw---- 1 root audio 116, 33 jun 8 09:57 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.17.2-0ubuntu1.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Jun 8 10:02:44 2015
HibernationDevice: RESUME=UUID=17183f57-6587-47f2-84dc-553f7e5cf3a8
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
MachineType: Intel Corporation S5520UR
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-18-generic root=/dev/mapper/vg0-root ro nomdmonddf nomdmonisw crashkernel=384M-:128M nomdmonddf nomdmonisw crashkernel=384M-:128M nomdmonddf nomdmonisw crashkernel=384M-:128M nomdmonddf nomdmonisw crashkernel=384M-:128M nomdmonddf nomdmonisw crashkernel=384M-:128M
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-18-generic N/A
 linux-backports-modules-3.19.0-18-generic N/A
 linux-firmware 1.143.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to vivid on 2015-05-16 (22 days ago)
dmi.bios.date: 12/17/2009
dmi.bios.vendor: Intel Corp.
dmi.bios.version: S5500.86B.01.00.0046.121720091524
dmi.board.asset.tag: ....................
dmi.board.name: S5520UR
dmi.board.vendor: Intel Corporation
dmi.board.version: E22554-606
dmi.chassis.asset.tag: ....................
dmi.chassis.type: 23
dmi.chassis.vendor: ...............................
dmi.chassis.version: ..................
dmi.modalias: dmi:bvnIntelCorp.:bvrS5500.86B.01.00.0046.121720091524:bd12/17/2009:svnIntelCorporation:pnS5520UR:pvr....................:rvnIntelCorporation:rnS5520UR:rvrE22554-606:cvn...............................:ct23:cvr..................:
dmi.product.name: S5520UR
dmi.product.version: ....................
dmi.sys.vendor: Intel Corporation

Søren Holm (sgh) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Critical
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.19.0-15.15)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.19.0-15.15
penalvch (penalvch) wrote :

Søren Holm, thank you for reporting this and helping make Ubuntu better. As per https://downloadcenter.intel.com/product/36456/Intel-Server-Board-S5520UR an update to your computer's buggy and outdated BIOS is available (64;64;26;1.12). If you update to this following https://help.ubuntu.com/community/BIOSUpdate does it change anything?

If it doesn't, could you please both specify what happened, and provide the output of the following terminal command:
sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date

For more on BIOS updates and linux, please see https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette .

Please note your current BIOS is already in the Bug Description, so posting this on the old BIOS would not be helpful. As well, you don't have to create a new bug report.

Once the BIOS is updated, and the information above is provided, then please mark this report Status Confirmed.

Thank you for your understanding.

Changed in linux (Ubuntu):
importance: Critical → High
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.

Søren Holm (sgh) wrote : Re: [Bug 1462919] Re: kswapd pulls server down every 2 weeks.

It happened after upgrading to 15.04.

For some reason the system was still using kernel 3.13 after the upgrade, and it failed on that kernel. Upgrading to 3.19 did not change anything.

I have difficulty believing that any userspace stuff could cause it, but maybe the systemd stuff could affect it somehow.

penalvch (penalvch)
tags: added: bios-outdated-64
Søren Holm (sgh) wrote :

On one of the machines suffering from this I run Jenkins. Shutting down Jenkins immediately gets kswapd back to normal. Starting Jenkins again makes kswapd spin again. But a reboot mitigates the problem for several days.

I'll keep investigating this.
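
The Jenkins observation above could be made measurable by reading the CPU time of the kswapd kernel threads before and after stopping the service. This is a rough sketch under stated assumptions: it relies on the /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15, and the function names are hypothetical.

```python
import os


def parse_stat(stat_line):
    """Return (comm, utime, stime) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so the line is split on the last ')'.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is field 3 (state), so fields 14 and 15 sit at indexes 11 and 12.
    return comm, int(fields[11]), int(fields[12])


def kswapd_cpu_ticks(proc_root="/proc"):
    """Map each kswapd thread name to its total CPU time in clock ticks."""
    ticks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                comm, utime, stime = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if comm.startswith("kswapd"):
            ticks[comm] = utime + stime
    return ticks
```

Taking one reading with Jenkins running and another some minutes after stopping it, then dividing the difference by os.sysconf("SC_CLK_TCK"), gives the CPU seconds kswapd consumed in each period.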

Søren Holm (sgh) wrote :

I'm very sure that it happens on every 15.04 system running continuously 24/7. BUT it seems to be very much related to the Java stuff involved in running Jenkins or a TeamCity agent (as on the actual affected system).

I have not upgraded the BIOS yet - and I do not think that it could affect it. But I will upgrade it.

Søren Holm (sgh) wrote :

Look at this graph:

http://sgh.dk/~sgh/cpustat_month.png

This is a machine that is not doing anything CPU-intensive.

Søren Holm (sgh) wrote :
Viktor Pal (deere) wrote :

I experience the same on a desktop (15.04), maybe once a week.
I presume that this happens when some leaking application starts to eat up all the memory quickly.
When this happens the whole machine becomes unresponsive immediately and only restarting helps.
What happened only becomes apparent from the atop logs after the machine has been restarted. Atop is configured to take a sample every 30 seconds.
The atop logs show kswapd0 eating up all the CPU resources.
The strange thing is that this happens even though no swap is enabled.

I suppose what should happen is that the kernel OOM killer kills some application and everything goes back to normal.
What happens instead is that the machine dies. Of course all unsaved work, open windows and applications are lost, which is very unfortunate.
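
A periodic logger could record how little memory was left just before a hang like the one described above, and confirm that no swap was configured. The sketch below is only an illustration with made-up helper names. It assumes the usual "Name: value kB" layout of /proc/meminfo and the MemAvailable field, which may be missing on older kernels. kswapd also reclaims page cache, so it can be busy on a machine without swap.

```python
def parse_meminfo(meminfo_text):
    """Parse /proc/meminfo text into a dict of integer values (mostly kB)."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure_summary(meminfo_text):
    """Report whether swap is configured and what fraction of RAM is available."""
    info = parse_meminfo(meminfo_text)
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable")  # assumption: absent on older kernels
    fraction = None
    if available is not None and total:
        fraction = available / total
    return {
        "swap_enabled": info.get("SwapTotal", 0) > 0,
        "available_fraction": fraction,
    }
```

Appending the summary of open("/proc/meminfo").read() to a file every few seconds would leave a trail that survives the reboot, much like the atop logs do.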

penalvch (penalvch) wrote :

Viktor Pal, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Søren Holm (sgh) wrote :

It's not a problem on 15.10

penalvch (penalvch) wrote :

Søren Holm, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1462919/comments/13 regarding this being fixed with an update. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Incomplete → Invalid