Frequent swapping causes system to hang

Bug #595047 reported by Pete Goodall on 2010-06-16
This bug affects 6 people
Affects: linux (Ubuntu) | Importance: Undecided | Assigned to: Unassigned

Bug Description

Periodically I notice my system slows to a near standstill, and the hard drive light is constantly on. There seems to be a massive amount of disk I/O, and it lasts for a long time (let's say 30 minutes, to put a number on it). I installed and ran iotop (`iotop -a`) and it seems to point to jbd2. From what I can see, jbd2 is related to ext4 journaling, but I cannot figure out how to kill this operation. It might even be a red herring, because I have also stopped the disk activity by killing either chromium or firefox. I need to understand what else I can do to troubleshoot this.
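As a starting point (a sketch, assuming a typical ext4 setup): jbd2 kernel threads cannot be killed, but their names reveal which device's journal they service, which narrows down the busy filesystem.

```shell
# Sketch: list jbd2 kernel threads; a name like "jbd2/sda1-8" means the
# thread services the ext4 journal of /dev/sda1.
ps -eo pid,comm | grep 'jbd2/' || echo "no jbd2 threads found"
```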

$ lsb_release -rd
Description: Ubuntu maverick (development branch)
Release: 10.10

Up-to-date as of 16th June 2010.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272X Analog [ALC272X Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC272X Analog [ALC272X Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: pgoodall 1372 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0x56440000 irq 44'
   Mixer name : 'Realtek ALC272X'
   Components : 'HDA:10ec0272,1025022c,00100001'
   Controls : 14
   Simple ctrls : 8
DistroRelease: Ubuntu 10.10
Frequency: Once a day.
HibernationDevice: RESUME=UUID=145f27a9-859a-4987-8132-ac878c832747
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Alpha i386 (20100602.2)
MachineType: Acer AO531h
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-6-generic root=UUID=11f96f8b-5e04-4e20-a201-0fa5d0fc07fa ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_GB.utf8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Regression: Yes
RelatedPackageVersions: linux-firmware 1.37
Reproducible: No
Tags: maverick ubuntu-une kconfig regression-potential needs-upstream-testing
Uname: Linux 2.6.35-6-generic i686
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 12/22/2009
dmi.bios.vendor: Acer
dmi.bios.version: v0.3304
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.vendor: Acer
dmi.board.version: Base Board Version
dmi.chassis.type: 1
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAcer:bvrv0.3304:bd12/22/2009:svnAcer:pnAO531h:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
dmi.product.name: AO531h
dmi.product.version: 1
dmi.sys.vendor: Acer

This is an attempt at bringing sanity to bug #7372. Please only comment here if you are experiencing high I/O wait times and poor interactivity on reasonable workloads.

Latest working kernel version: 2.6.18?

Problem Description:
I/O operations on large files tend to produce extremely high iowait times and poor system I/O performance (degraded interactivity). This behavior can be seen to varying degrees in tasks such as:
 - Backing up /home (40GB with numerous large files) with diffbackup to external USB hard drive
 - Moving messages between large maildirs
 - updatedb
 - Upgrading large numbers of packages with rpm

Steps to reproduce:
The best synthetic reproduction case I have found is,
$ dd if=/dev/zero of=/tmp/test bs=1M count=1M
During this copy, I/O wait times are very high (70-80%) with extremely degraded interactivity, although throughput averages about 29MB/s (about the disk's capacity, I think). Even starting a new shell takes minutes, especially after letting the machine copy for a while without being actively used. Could this mean it's a caching issue?
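A rough way to quantify the stall (a sketch; the write is scaled down to 1 GiB, and the "wa" column position can vary between vmstat versions):

```shell
# Sketch: run a buffered write in the background and sample the "wa"
# (iowait) column of vmstat once a second while it proceeds.
dd if=/dev/zero of=/tmp/test bs=1M count=1024 2>/dev/null &
DD_PID=$!
# With the default 17-column vmstat layout, field 16 is "wa".
vmstat 1 5 | awk 'NR > 2 { print "iowait%: " $16 }'
wait "$DD_PID"
rm -f /tmp/test
```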

For the record, this is even reproducible with Linus's master.

I'm also having this problem.

Latest working kernel version: 2.6.18.8 with config:
http://svn.pardus.org.tr/pardus/2007/kernel/kernel/files/pardus-kernel-config.patch

Currently working on 2.6.25.20 with config:
http://svn.pardus.org.tr/pardus/2008/kernel/kernel/files/pardus-kernel-config.patch

Tested also with 2.6.28 and felt no significant performance improvement.

--

Heavy disk I/O, such as running 'svn up', hogs the system and prevents starting a new shell, browsing the internet, doing some text editing in vim, etc.

For example, after managing to open a text buffer in vim, 4-5 second delays occur between consecutive search attempts.


Hello Ben,

I don't know where to post it exactly. Why Linux Memory Management? And why -mm and not mainline? Can you do it for me, please?

I have added a second test case, which uses threads with pthread_mutex and pthread_cond instead of processes with pipes for communication, to confirm it is a CPU scheduler issue.

I have repeated the tests with some vanilla kernels again, as there is a remark in the bug report about tainted or distro kernels. As I got a segmentation fault with the 2.6.28 kernel, I added the result of the Ubuntu 9.04 kernel (see attachment). The results are not comparable to the results posted before, as I have changed the time handling (doubles instead of int32_t, as some echo messages take more than one second).
The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k, 200k and 1M messages over a pipe. The last three results are 2*100, 2*50 and 2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and pthread_cond. I have added a 10 second pause at the beginning of every thread/process to ensure the 2*100 processes or threads are all created and start to exchange messages at nearly the same time. This was not the case in the old test case with 2*100 processes, as the first thread was already destroyed before the last was created.

With the second test case using threads, I hit the problems (threads:2*100/msg:1M) immediately with kernel 2.6.22.19. Kernel 2.6.20.21 was fine with both test cases.

The meaning of the results:
- min message time
- average message time (80% of the messages)
- message time at median
- maximal message time
- test duration

Here are the results.
Linux balrog704 2.6.20.21 #1 SMP Wed Jan 14 10:11:34 CET 2009 x86_64 GNU/Linux
min:0.000ms|avg:0.241-0.249ms|mid:0.244ms|max:18.367ms|duration:25.304s
min:0.002ms|avg:0.088-0.094ms|mid:0.093ms|max:17.845ms|duration:19.694s
min:0.002ms|avg:0.030-0.038ms|mid:0.038ms|max:564.062ms|duration:38.370s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:1212.746ms|duration:33.137s
min:0.002ms|avg:0.004-0.005ms|mid:0.004ms|max:1092.045ms|duration:31.686s
min:0.002ms|avg:0.004-0.007ms|mid:0.004ms|max:4532.159ms|duration:59.773s

Linux balrog704 2.6.22.19 #1 SMP Wed Jan 14 10:16:43 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.394-0.413ms|mid:0.403ms|max:19.673ms|duration:42.422s
min:0.003ms|avg:0.083-0.188ms|mid:0.182ms|max:13.405ms|duration:37.038s
min:0.003ms|avg:0.056-0.075ms|mid:0.070ms|max:656.112ms|duration:72.943s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:1756.113ms|duration:49.163s
min:0.003ms|avg:0.005-0.010ms|mid:0.007ms|max:11560.976ms|duration:52.836s
min:0.003ms|avg:0.008-0.010ms|mid:0.010ms|max:5316.424ms|duration:111.323s

Linux balrog704 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
min:0.003ms|avg:0.223-0.450ms|mid:0.428ms|max:8.494ms|duration:46.123s
min:0.003ms|avg:0.140-0.209ms|mid:0.200ms|max:12.514ms|duration:39.100s
min:0.003ms|avg:0.068-0.084ms|mid:0.076ms|max:38.778ms|duration:78.157s
min:0.003ms|avg:0.454-0.784ms|mid:0.625ms|max:11.063ms|duration:65.619s
min:0.004ms|avg:0.244-0.399ms|mid:0.319ms|max:21.018ms|duration:64.741s
min:0.003ms|avg:0.061-0.138ms|mid:0.111ms|max:23.861ms|durati...


Created attachment 19795
test case with processes and pipes

Created attachment 19796
test case with threads and mutexes

Pete Goodall (pgoodall) wrote :

I think this is related to swap, actually. I have decreased swappiness to 10 and will see if I encounter this again. Not sure what the next steps are if that works.

Pete Goodall (pgoodall) wrote :

Apparently reducing the swappiness didn't help. My system is currently hopelessly locked (and I'm writing this on another device on my desk). I'm ssh'ed into my netbook and running `iotop -a`. Here are the top two items:

    1 be/4 root 10.34 M 0.00 B ?unavailable? init
   33 be/4 root 16.00 K 340.00 K ?unavailable? [kswapd0]

The init read numbers keep going up (currently 10.86 MB) and the kswapd write numbers are going up as well.

Pete Goodall (pgoodall) wrote :

To add to the last comment, kswapd has just leaped to over 7 MB written and is rising fast! This is within a span of a couple of minutes.

affects: ubuntu → linux (Ubuntu)
Pete Goodall (pgoodall) wrote :

Ok, from what I can tell the system is swapping way too easily. If I'm running Chromium + XChat I'm fine. As soon as I open OpenOffice or Evolution or Rhythmbox that seems to push the memory over the edge and the system starts swapping. If I can manage to get to a terminal in time and kill the last application I started my system will return to normal. If I don't, I might as well just hard power off. I have set the swappiness to 0 in /etc/sysctl.conf.

Afaict there is no one program that is causing the system to swap. It just seems that Ubuntu is oversensitive. There was no problem with this workload on Lucid, so I don't think I'm overstressing the system.
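For reference, the swappiness change described above can be sketched like this (a sketch; the value 10 mirrors the earlier attempt, and /etc/sysctl.conf is the usual Ubuntu location):

```shell
# Sketch: change swappiness at runtime and persist it across reboots.
sudo sysctl vm.swappiness=10                              # immediate effect
echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf  # persists
cat /proc/sys/vm/swappiness                               # verify
```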

Jeremy Foshee (jeremyfoshee) wrote :

Hi Pete,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 595047

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Pete Goodall (pgoodall) wrote :

Per some advice I received on #ubuntu-kernel I'm attaching some debugging output. Here is a list of the files:

vmstat-log-7-July-2010.txt: Output of `vmstat 1 60` once the system started swapping
free.txt: Output of `free -m` a minute later
free-2.txt: Output of `free -m` about 10 seconds after I ran free the first time
top-ouput.txt: Output of `top -b -n 1`

In case it isn't reflected in the top output, I'm running Chromium with four (?) tabs open (no Flash site or anything like that), Xchat connected to two servers and with about seven channels open, Evince viewing a simple pdf document, a Nautilus window, one gnome-terminal window and Banshee playing an mp3. Again, this is a normal load for me that didn't have a problem with previous versions of Ubuntu on the same device.

Pete Goodall (pgoodall) wrote :

I made the last comment before I read Jeremy's comments (should have refreshed the bug report). Anyway, I'll attach the output he is looking for and be sure to test the upstream kernel as well.

apport information

tags: added: apport-collected
description: updated

I have reproduced the bug using the mainline kernel. I installed linux-image-2.6.35-999-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/, and rebooted running that kernel. I was running Chromium, Xchat and Banshee w/ no problem, so I decided to try to stress the system a bit more. I started Tomboy and opened one of my notes with no problem. I opened Nautilus and the system started swapping. I tried to get a vmstat by ssh'ing into the system, but by the time I was able to login the swapping had subsided. It started again and I ran vmstat to collect some stats. Don't know if this is useful, but will attach it.

tags: removed: needs-upstream-testing
Pete Goodall (pgoodall) wrote :

I noticed there is a 'needs kernel logs' tag. Is there something else you need attached? If so, do you need it attached running both the current maverick kernel and the mainline kernel?

Pete Goodall (pgoodall) wrote :

Marking as 'New' since I think I have supplied all the required information.

Changed in linux (Ubuntu):
status: Incomplete → New
Pete Goodall (pgoodall) wrote :

In desperation I tried re-installing to see if I just had something installed with a serious memory leak. Unfortunately, this has not improved the situation. My device is next to useless until this is resolved.

Pete Goodall (pgoodall) on 2010-08-09
summary: - massive i/o renders the system unusable
+ Frequent swapping causes system to hang
Changed in linux (Ubuntu):
assignee: nobody → Jeremy Foshee (jeremyfoshee)
tags: added: kernel-core kernel-needs-review
removed: needs-kernel-logs
Pete Goodall (pgoodall) wrote :

Apparently there are some patches for what appears to be this issue. Linked from the upstream kernel bug report: http://www.phoronix.com/scan.php?page=news_item&px=ODQ3Mw

Changed in linux (Ubuntu):
assignee: Jeremy Foshee (jeremyfoshee) → nobody
Brad Figg (brad-figg) on 2010-12-03
tags: added: acpi-namespace-lookup
tags: added: acpi-parse-exec-fail
Changed in linux:
status: Unknown → Confirmed
Changed in linux:
importance: Unknown → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux:
status: Confirmed → Fix Released

Pete Goodall, thank you for reporting this and helping make Ubuntu better. Maverick reached EOL on April 10, 2012.
Please see this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We were wondering if this is still an issue in a supported release? If so, could you please test for this with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested and remove the tag:
needs-upstream-testing

This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the text:
needs-upstream-testing

If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the following tags:
kernel-unable-to-test-upstream
kernel-unable-to-test-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

Please let us know your results. Thank you for your understanding.

Helpful Bug Reporting Tips:
https://help.ubuntu.com/community/ReportingBugs

tags: added: maverick
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete

I'm still seeing this.

Setup: Debian 7 Wheezy, amd64 backports kernel (3.11-0.bpo.2-amd64), ~45MB/s write of a low number of large files by rsync (fed through a GBit ethernet link) on an ext3 FS (rw,noatime,data=ordered) in a LVM2 partition on a hardware RAID5.

Observation: The machine (32-core Xeon E5-4650, 192 GB RAM), primarily servicing multiple interactive users via SSH, x2go and SunRay sessions, gets completely unusable during and quite some time after the rsync transfer. TCP connections to SunRay clients time out, IRC connections are dropped, even simple tools like "htop" don't do anything but clear the screen after being started. "iotop" shows a [jbd2/dm-1-8] process on top, reportedly doing "99.99%" I/O (but not reading or writing a single byte, maybe because it's a kernel thread?).

Once I switch from the default CFQ I/O scheduler to "deadline" (echo deadline > /sys/block/sdb/queue/scheduler), the symptoms disappear completely.
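The scheduler switch described above can be sketched as follows (sdb is an example device; adjust to the disk in question):

```shell
# Sketch: the active scheduler is the bracketed entry in this file.
cat /sys/block/sdb/queue/scheduler
# Extract just the active scheduler name:
sed -n 's/.*\[\(.*\)\].*/\1/p' /sys/block/sdb/queue/scheduler
# Switch to deadline (root required; lost on reboot unless set via the
# elevator= kernel parameter or a udev rule):
echo deadline | sudo tee /sys/block/sdb/queue/scheduler
```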

Still facing this bug. Kernel 3.16.

Is it possible to reserve 5% of I/O for the needs of user/other processes? Any fast download or copy eats 99.99% of I/O and the system is hard to use.

I'm curious: this bug was ostensibly fixed years ago, yet I dare everyone who owns an Android smartphone to run a simple test. Invoke any terminal emulator and execute this command:

$ cat < /dev/zero > /sdcard/EMPTY

What's terribly unpleasant is that _all_ CPU cores become busy (more than 75% load), and the CPU jumps into the highest performance state, i.e. frequency, i.e. power consumption. Obviously this is wrong, bad and shouldn't happen. This test is kinda artificial as no Android app can create such a high IO load, but then there are multiple phones out there with either 5GHz MIMO 802.11n or 802.11ac chips which allow up to 80MB/sec throughput which can easily saturate most if not all internal MMC cards and have the same effect as the above command.

Perhaps vanilla kernel bugzilla is not a place to discuss bugs in Android, but latest Android releases usually feature kernels 3.10.x and 4.1.x without that many patches, so this bug is still there. Both these kernels are currently maintained and supported. Android by default never uses SWAP (one of the reasons for this bug).

Go figure.

P.S. Sample apps from Google Play:

* CPU Stats by takke
* Terminal Emulator by Jack Palevich

I've just experienced this issue with 3.19.0-32-generic on Ubuntu.
My KTorrent downloaded files to an NTFS filesystem on a SATA3 drive (fuse; download speed was about 100Mbit/s); simultaneously I copied files from that filesystem to a USB 3.0 flash drive with an NTFS filesystem. That resulted in poor interactive performance: mouse and window lags. The workaround was to suspend the torrent download until the files were copied.

Hardware: One AMD FX(tm)-8320 Eight-Core Processor, 8 GB RAM.

This bug is definitely not fixed. A simple cp from one drive to another makes a huge impact on my desktop. Trying to do an rsync is even worse. It seems to mainly be a problem with large files. My system is old (Athlon II 250), but even an old P3 running Win98 doesn't lag this badly from just copying files.

I'm trying to copy 50GB from one tower to another via USB 3.0 and it is really no fun. If I copy all files at once, the speed decreases constantly; after 30 minutes it copies at 1.0MB/s. If I copy a bunch of directories it is a little bit better, but the speed also decreases. For 2GB my Linux system needs more than an hour. This bug is definitely not fixed. On Windows this USB stick works without that speed loss.

OS: Fedora 24
Kernel: 4.8.15

I've noticed that this doesn't happen every time, even when I use exactly the same USB stick. For my 2GB of files (Eclipse with a workspace and a project) I needed one hour to copy; it started at 60MB/s and decreased to 500KB/s. Now I'm copying 16GB (Android Studio and some other projects) and it only needs about 15 minutes; the copy speed started at 70MB/s and at the end it was 22MB/s. So it also decreases, but not as fast as in my 2GB copy.

It seems kernel developers don't read this topic here; it is much better to write to the mailing list.

Does Jens' buffered writeback throttling patchset solve your issue?

(In reply to bes1002t from comment #642)
> I'm trying to copy 50gb from one tower to another via USB 3.0 and it is
> really no fun. If I would copy all files at once the speed is decreasing
> constantly. After 30 minutes it copies with 1.0MB/s. If I copy a bunch of
> directories it is a littlebit better but also decreases in speed. For 2GB my
> Linux system needs more than an hour. This bug is definitely not fixed. On
> Windows this USB Stick is working without that speed loss.
>
> OS: Fedora 24
> Kernel: 4.8.15

This bug report has nothing to do with the speed of copying data to a USB flash drive. It's about substantially degraded interactivity, which manifests as slowness, and it's hard to believe you can perceive that via an SSH session.

I'm inclined to believe your bug is related to other subsystems like USB.

> It seems Kernel developers not look this topic here, much better to write to
> the mailing list.

Kernel bugzilla has always been neglected. Thousands of bug reports which have zero comments from prospective developers. LKML is a hit and miss too. Your developer skipped your e-mail because he/she was busy? Bad luck.

@bes1002t: I think throughput is a different issue than this, although it might well be related.

But most important would be for someone to create an I/O concurrency/latency benchmark. Maybe the Phoronix Test Suite is an adequate tool for that? It can also be used for automatic bisecting.

I clearly remember pre-2.6.18 times when I had a much inferior machine, and while Gentoo's emerge was compiling stuff in the background with multiple threads, I could browse the web, switch between programs and play an HD stream without any hiccups or stalling.

@bes1002t: Copying to a USB device always starts at the speed of the hard drive, as everything is cached until the write cache is full, and ends at the speed of the USB drive. The write process has to wait until all data is written.

@Artem S. Tashkinov: The stall problems on an SSH session exist, or existed. I migrated an old server with CentOS 6 and copied some VM images; the SSH responsiveness was very bad. I had to wait up to 20 seconds for tab completion.

In many cases it was a swap problem, as the buffers are full and the caches need a long time to be written to a slow USB device. The server starts to swap out process data, even though it's only a very small amount of data. I could increase the overall desktop performance with a RAM upgrade.

Try Kernel 4.10.

>Improved writeback management
>
>Since the dawn of time, the way Linux synchronizes to disk the data written to
>memory by processes (aka. background writeback) has sucked. When Linux writes
>all that data in the background, it should have little impact on foreground
>activity. That's the definition of background activity...But for a long as it
>can be remembered, heavy buffered writers have not behaved like that. For
>instance, if you do something like $ dd if=/dev/zero of=foo bs=1M count=10k,
>or try to copy files to USB storage, and then try and start a browser or any
>other large app, it basically won't start before the buffered writeback is
>done, and your desktop, or command shell, feels unresponsive. These problems
>happen because heavy writes -the kind of write activity caused by the
>background writeback- fill up the block layer, and other IO requests have to
>wait a lot to be attended (for more details, see the LWN article).
>
>This release adds a mechanism that throttles back buffered writeback, which
>makes more difficult for heavy writers to monopolize the IO requests queue,
>and thus provides a smoother experience in Linux desktops and shells than what
>people was used to. The algorithm for when to throttle can monitor the
>latencies of requests, and shrinks or grows the request queue depth
>accordingly, which means that it's auto-tunable, and generally, a user would
>not have to touch the settings. This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)


> Try Kernel 4.10.
It doesn't help with my workload :(
The mouse pointer and keyboard input still freeze.

Make sure your kernel has that option enabled.

>This feature needs to be enabled explicitly in
>the configuration (and, as it should be expected, there can be regressions)

Created attachment 255491
$ cat /boot/config-`uname -r`

(In reply to Mikhail from comment #651)

First, I'd recommend trying to disable SWAP completely - it might help:

$ sudo swapoff -a

If you compile your own kernel or your distro hasn't enabled them for you, here's the list of the options you need to enable:

BLK_WBT, enable support for block device writeback throttling
BLK_WBT_MQ, multiqueue writeback throttling
BLK_WBT_SQ, single queue writeback throttling

They are all under "Enable the block layer".

If disabling swap and enabling these options have no effect, please ***create a new bug report*** and provide the following information:

CPU
Motherboard and BIOS version
RAM type and volume
Storage and its type
Kernel version and its .config

And also the complete output of these utilities:

dmesg
lspci -vvv
lshw
free
vmstat (when the bug is exposed)

cat /proc/interrupts
cat /proc/iomem
cat /proc/meminfo
cat /proc/mtrr
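The collection steps above can be sketched as a single script (filenames and the archive path are illustrative):

```shell
# Sketch: gather the requested diagnostics into one archive.
out=$(mktemp -d)
dmesg                > "$out/dmesg.txt"      2>&1
lspci -vvv           > "$out/lspci.txt"      2>&1
free                 > "$out/free.txt"       2>&1
cat /proc/interrupts > "$out/interrupts.txt" 2>&1
cat /proc/meminfo    > "$out/meminfo.txt"    2>&1
tar czf /tmp/bugreport.tar.gz -C "$out" .
echo "wrote /tmp/bugreport.tar.gz"
```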

>CONFIG_BLK_WBT=y
># CONFIG_BLK_WBT_SQ is not set
>CONFIG_BLK_WBT_MQ=y

So writeback throttling is enabled only for multi queue devices in your case. I suppose you need to use blk-mq for your sd* devices to activate writeback throttling (scsi_mod.use_blk_mq=1 boot flag) or to recompile kernel with CONFIG_BLK_WBT_SQ enabled.
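Both conditions can be checked like this (a sketch; sda and the /boot config path are common defaults, not guaranteed on every distro):

```shell
# Sketch: is writeback throttling compiled in, and is it active for sda?
grep 'CONFIG_BLK_WBT' "/boot/config-$(uname -r)"
# On kernels with WBT, a non-zero value here means throttling is active:
cat /sys/block/sda/queue/wbt_lat_usec 2>/dev/null || echo "WBT not active here"
```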

Created attachment 255501
all required files in one archive

After setting the boot flag "scsi_mod.use_blk_mq=1", the freezes became much shorter. I'm no longer sure they are at the kernel level; it looks more like the window manager (GNOME mutter) is written in such a way that it freezes the mouse while loading the list of applications. To finally defeat the freezes, it seems the window manager needs to be kept from being paged out to the swap file.

I also captured vmstat output when a freeze occurred:
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 2 6 15947052 205136 112592 4087608 32 41 93 119 7 23 43 19 37 1 0

Twice I asked you to try disabling SWAP altogether and you still haven't.

I'm unsubscribing from this bug report.

Created attachment 274511
Per device dirty ratio configuration support

Per device dirty bytes configuration

Per device dirty bytes configuration. The patch is not ideal; I made it to smooth flash drive writing by setting a smaller dirty_bytes value per removable device.

>> Path
# ls /sys/block/sdc/bdi/
dirty_background_bytes dirty_background_ratio dirty_bytes dirty_ratio max_ratio min_pages_to_flush min_ratio power read_ahead_kb stable_pages_required subsystem uevent

>> udev rule for removable devices
# cat /etc/udev/rules.d/90-dirty-flash.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{removable}=="1", ATTR{bdi/dirty_bytes}="4194304"
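Assuming the patched kernel above (the bdi/dirty_bytes attribute is not in mainline), the rule can be applied and checked without replugging the device (sdc is an example):

```shell
# Sketch: reload udev rules, re-trigger the device, and verify that
# dirty_bytes now holds 4194304 bytes (= 4 MiB).
sudo udevadm control --reload-rules
sudo udevadm trigger --action=change --sysname-match=sdc
cat /sys/block/sdc/bdi/dirty_bytes
```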

Hello Ben,

I was trying to figure out the issue, but I'm not really sure what exactly the error is here. Is the bug already fixed? Or do we have some sources that might help us? Thank you

Carlo B.

Was this bug actually fixed? The status shows CLOSED CODE_FIX with a last-modified date of Dec 5 2018, but I don't see any updates as to what was corrected or which version the fix will go into.

Created attachment 282477
attachment-6179-0.html

This was never fixed, and since the bug status was changed with no commit info ever provided, even when asked directly, it never will be. Nobody cares, and I guess nobody ever figured out who broke the kernel, with which changeset, and when. Just buy another couple of Xeons for your super-duper web-surfing desktop and pray it's enough for the long waits when you format your diskette. Another approach is to buy enough RAM to hold your whole set of block devices, so write-outs are quick enough and you won't see microsecond lags. This is the complete workaround list they have provided since the bug was opened.


As far as I understand, this is a kind of meta-bug: there are multiple causes and multiple fixes.

"I do bulk IO and it gets slow" sounds rather general, and the problem can resurface at any time due to some new underlying issue. So the problem cannot really be "closed for good", no matter how much technical progress is made.

For me, 12309 basically stopped happening unless I deliberately tune the "/proc/sys/vm/dirty_*" values to atypical ranges and forget to revert them. I see the system controllably slowing down processes doing bulk I/O so that the system in general stays responsive. This behaviour is one of the outcomes of this bug.
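For reference, the knobs in question can be inspected and set like this (a sketch; the byte values are illustrative, not recommendations):

```shell
# Sketch: inspect the current ratio-based limits, then set byte-based ones.
sysctl vm.dirty_background_ratio vm.dirty_ratio
sudo sysctl vm.dirty_background_bytes=$((16 * 1024 * 1024))  # 16 MiB
sudo sysctl vm.dirty_bytes=$((64 * 1024 * 1024))             # 64 MiB
# Note: setting a *_bytes knob zeroes its *_ratio counterpart.
```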

I don't expect meaningful technical discussion to happen in this thread. It should just serve as a hub for linking to specific new issues.

Sure, it's a meta-bug, but for me 12309 is still present, and I don't use any tuning for the I/O subsystem at all.

Not as bad as years ago when it happened for the first time, but I still have to throttle rtorrent to download at 2.5MB/s maximum instead of the usual 10MB/s if I want to watch films in mplayer at the same time without jitter/freezes/lag. And that's on a powerful and modern enough system with kernel 4.19.27, CPU i7-2600K @ 4.5GHz, RAM 24GB, and HDD 3TB Western Digital Caviar Green WD30EZRX-00D. This is annoying, and I remember the time before 12309 when rtorrent without any throttling wouldn't make mplayer freeze on less powerful hardware.

Created attachment 282483
attachment-22369-0.html

Well, I've tried to report a new bug to investigate my own case of "my CPU does nothing because waiting is too hard for it". It was of no interest to any kernel dev. So, just as Linus once said "f**k you, Nvidia", the very same goes back to Linux itself. It's a pity some devs think that making their software Linux-bound (via udev-only bindings or ALSA-only sound output) is a good idea (GNOME, and even parts of KDE). They forget that 15 years ago they picketed Adobe for shipping Flash for Windows only. Now one has to use 12309-bound crap for want of a way to run this software on another platform.

Tue, 23 Apr 2019, 21:29, <email address hidden>:

> https://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> --- Comment #665 from _Vi (<email address hidden>) ---
> […]

Can 'someone' please open a bounty for the creation of a VM test case, e.g. with `vagrant` or the `phoronix test suite`?
Basically, a way to reproduce and quantify the perceived/actual performance difference between
> Linux 2.6.17 Released 17 June, 2006
and
> Linux 5.0 Released Sun, 3 Mar 2019
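As a starting point for such a test case, here is a minimal sketch of the usual reproduction shape: one process doing a bulk synchronous write while another times a small read. The sizes here are deliberately tiny so the script finishes quickly; to actually provoke the 12309 symptoms you would scale `BULK_MB` up toward RAM size on a spinning disk. All names and sizes are illustrative assumptions, not part of any existing test suite.

```shell
#!/bin/sh
# Latency-under-load probe (sketch; sizes are illustrative, not a benchmark).
set -eu
DIR=$(mktemp -d)
BULK_MB=32   # scale this up (toward RAM size) to actually provoke stalls

# Background bulk writer: stands in for rtorrent/cp doing heavy I/O.
dd if=/dev/zero of="$DIR/bulk" bs=1M count="$BULK_MB" conv=fdatasync 2>/dev/null &
WRITER=$!

# Foreground small-I/O probe: stands in for mplayer/the desktop.
printf 'probe' > "$DIR/small"
START=$(date +%s%N)
cat "$DIR/small" > /dev/null
END=$(date +%s%N)

wait "$WRITER"
echo "small-read latency: $(( (END - START) / 1000000 )) ms"
rm -rf "$DIR"
```

Running the same probe on two kernels (e.g. in two VMs) and comparing the reported latency while the writer is active would give the before/after numbers the bounty asks for.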

(In reply to Alex Efros from comment #666)
> Not as bad as years ago […]
> And that's on powerful and modern enough system with
> kernel 4.19.27, CPU i7-2600K @ 4.5GHz, RAM 24GB, and HDD 3TB […]
> This is annoying, and I remember time before
> 12309 when rtorrent without any throttling won't make mplayer to freeze on
> less powerful hardware.

Oh yeah, this... I can clearly remember back then, on a then mid-range machine doing a lot of compiling (Gentoo => 100% CPU 🤣) and filesystem work, VLC used to play an HD video stream even under heavy load without any hiccups or micro-stuttering. It was impressive at the time... and then it broke 🤨

tags: added: latest-bios-3304

Based on my attempts to fix this bug, I totally disagree with you.

This bug is caused by the poor design of the current block device layer. Methods that are good for developing code are absolutely improper for developing ideas. That is probably the key problem of the Linux community. Currently, there is a merged workaround for block devices with a good queue, such as the Samsung Pro NVMe.

WBR,

Vitaly

(In reply to _Vi from comment #665)
> […]

Had this again 20 minutes ago.
I was copying 8.7 GiB of data from one directory to another on the same filesystem (ext4, rw,relatime,data=ordered) on the same disk (a Western Digital WDC WD30EZRX-00D8PB0 spinning disk).

The KDE UI became unresponsive (everything other than /home and user data is on an SSD), and I could not launch any new applications. Opening a new tab in Firefox to go to YouTube didn't load the page; it kept saying "waiting for youtube.com" in the status bar (does the network get halted?).

dmesg shows these entries; are they important?

[25013.905943] INFO: task DOMCacheThread:17496 blocked for more than 120 seconds.
[25013.905945] Tainted: P OE 4.15.0-54-generic #58-Ubuntu
[25013.905947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25013.905949] DOMCacheThread D 0 17496 2243 0x00000000
[25013.905951] Call Trace:
[25013.905954] __schedule+0x291/0x8a0
[25013.905957] schedule+0x2c/0x80
[25013.905959] jbd2_log_wait_commit+0xb0/0x120
[25013.905962] ? wait_woken+0x80/0x80
[25013.905965] __jbd2_journal_force_commit+0x61/0xb0
[25013.905967] jbd2_journal_force_commit+0x21/0x30
[25013.905970] ext4_force_commit+0x29/0x2d
[25013.905972] ext4_sync_file+0x14a/0x3b0
[25013.905975] vfs_fsync_range+0x51/0xb0
[25013.905977] do_fsync+0x3d/0x70
[25013.905980] SyS_fsync+0x10/0x20
[25013.905982] do_syscall_64+0x73/0x130
[25013.905985] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[25013.905987] RIP: 0033:0x7fc9cb839b07
[25013.905988] RSP: 002b:00007fc9a7aeb200 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[25013.905990] RAX: ffffffffffffffda RBX: 00000000000000a0 RCX: 00007fc9cb839b07
[25013.905992] RDX: 0000000000000000 RSI: 00007fc9a7aeaff0 RDI: 00000000000000a0
[25013.905993] RBP: 0000000000000000 R08: 0000000000000000 R09: 72732f656d6f682f
[25013.905994] R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000001f6
[25013.905995] R13: 00007fc97fc5d038 R14: 00007fc9a7aeb340 R15: 00007fc987523380
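The trace above shows the task stuck in state D (uninterruptible sleep) waiting on a jbd2 journal commit. During a stall, a quick way to see which tasks are in that state is standard `ps` plus a filter (nothing here is specific to this bug; it just surfaces the same condition the hung-task watchdog flagged):

```shell
# List all tasks currently in uninterruptible sleep (state D) -- the same
# state the hung-task watchdog reported for DOMCacheThread above.
# The state column may carry suffixes (e.g. "Ds"), so match the leading D.
ps -eo state,pid,comm | awk '$1 ~ /^D/'
```

On a healthy system this prints little or nothing; during a 12309-style stall you would expect to see the copying process, jbd2, and anything calling fsync() pile up here.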

KDE has the problem too; the same copy via the CLI or via Ultracopier (GUI) has no problem.
I also notice the KDE UI gets slower, with Plasma using more CPU whenever I use the HDD...

What's the value of `vm.dirty_writeback_centisecs`? i.e.
$ sysctl vm.dirty_writeback_centisecs

Try setting it to 0 to disable the periodic writeback wakeup, i.e.
`$ sudo sysctl -w vm.dirty_writeback_centisecs=0`

I found that this keeps my network transfer from stalling or stopping (it would stall for a few seconds at a time when the value was 1000, for example) while some kind of non-async `sync`-like flushing goes on periodically, when transferring GiB of data files over sftp to an SSD (via Midnight Commander, on a link limited to 10 MiB per second).

vm.dirty_writeback_centisecs controls how often (in hundredths of a second) the pdflush/flush/kdmflush threads wake up and check whether work needs to be done.

Coupled with the above, I've been using another value:
`vm.dirty_expire_centisecs=1000`
in both cases (stalling and not stalling), so this one remained fixed at =1000.

vm.dirty_expire_centisecs is how long something can sit in the cache before it needs to be written. In this case it's 10 seconds. When the pdflush/flush/kdmflush threads kick in, they check how old a dirty page is, and if it's older than this value they write it asynchronously to disk. Since holding a dirty page in memory for too long is unsafe, this also serves as a safeguard against data loss.

Well, with the above, at least I'm not experiencing network stalls when copying GiB of data via Midnight Commander's sftp to my SSD while some kernel-triggered sync-ing completes in the background.

I don't know if this will work for others, but if you're curious about any of my other sysctl settings, they're available for perusal [here](https://github.com/howaboutsynergy/q1q/tree/0a2cd4ba658067140d3f0ae89a0897af54da52a4/OSes/archlinux/etc/sysctl.d)
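For anyone wanting to persist these two values across reboots, the standard mechanism is a drop-in file under /etc/sysctl.d. The file name below is arbitrary, and the values are the ones from this comment, not a general recommendation:

```shell
# Write a sysctl drop-in with the values discussed above, then reload.
sudo tee /etc/sysctl.d/90-writeback.conf >/dev/null <<'EOF'
vm.dirty_writeback_centisecs = 0
vm.dirty_expire_centisecs = 1000
EOF
sudo sysctl --system    # re-applies every sysctl.d drop-in, highest number wins
```

Files in /etc/sysctl.d are applied in lexical order, so a `90-` prefix keeps this override from being shadowed by lower-numbered distro defaults.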

correction:

> In this case it's 1 seconds.

*In this case it's 10 seconds.

Also, a heads up:
I found that `tlp` (configured via `/etc/default/tlp`, on Arch Linux) will overwrite the values set in `/etc/sysctl.d/*.conf` files if its options are set to non-`0` values, i.e.
MAX_LOST_WORK_SECS_ON_AC=10
MAX_LOST_WORK_SECS_ON_BAT=10
will set:
vm.dirty_expire_centisecs=1000
vm.dirty_writeback_centisecs=1000

regardless of what values you set in the `/etc/sysctl.d/*.conf` files.

/etc/default/tlp is owned by tlp 1.2.2-1

Not setting them (e.g. commenting them out) makes tlp apply its default of 15 seconds (i.e. =1500). So the workaround is to set them to =0, which makes tlp not touch them at all, so the values from the `/etc/sysctl.d/*.conf` files are allowed to remain as set.
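The unit conversion tlp performs here is just seconds times 100, since the kernel tunables are in centiseconds. A hypothetical one-liner to check what a given MAX_LOST_WORK_SECS value turns into:

```shell
#!/bin/sh
# tlp's MAX_LOST_WORK_SECS_* options are in seconds; the kernel's
# vm.dirty_*_centisecs tunables are in hundredths of a second,
# so tlp multiplies by 100: 10 s -> 1000, 15 s (tlp's default) -> 1500.
secs=10
echo "vm.dirty_expire_centisecs=$(( secs * 100 ))"
```

With secs=10 this prints `vm.dirty_expire_centisecs=1000`, matching the values the comment above observed tlp writing.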

Last response from the original reporter was in 2010; fixed by an upstream commit since 2012. -> Invalid

no longer affects: linux (Ubuntu)
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: High → Undecided
status: Fix Released → New
status: New → Invalid
Brad Figg (brad-figg) on 2019-07-24
tags: added: ubuntu-certified