System behaved as if OOM when it had plenty to spare

Bug #386554 reported by Scott James Remnant (Canonical) on 2009-06-12
This bug affects 7 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

Had the weirdest behaviour today with the current karmic kernel. I haven't got a swap partition or swap file configured, because I was playing with swapd and stuff earlier and hadn't turned them back on again.

The system suddenly became extremely slow and unresponsive, with massive amounts of disk activity.

The thing is, here was the output of free at the time:

             total       used       free     shared    buffers     cached
Mem:       1534944    1490160      44784          0      18396    1078164
-/+ buffers/cache:     393600    1141344
Swap:            0          0          0

In other words, while it actually had 44MB free (which is still quite a lot, even though I was doing things in Firefox), there was 1GB of cached pages sitting there, and I wasn't doing much, so most of them can't have been dirty!
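The "-/+ buffers/cache" row is plain arithmetic on the "Mem:" row: used minus buffers and cached gives 393600 kB, and free plus buffers and cached gives 1141344 kB, which is what free(1) treats as available once cache is dropped. A minimal Python sketch of that arithmetic, assuming the /proc/meminfo field names MemTotal, MemFree, Buffers and Cached and the formula used by older procps versions of free; `SAMPLE_MEMINFO` is a hypothetical excerpt rebuilt from the free output in this report, not a capture from the affected machine, and the helper names are illustrative.

```python
# Hypothetical excerpt rebuilt from the free output in this report;
# a real /proc/meminfo has many more lines.
SAMPLE_MEMINFO = """\
MemTotal:        1534944 kB
MemFree:           44784 kB
Buffers:           18396 kB
Cached:          1078164 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def free_summary(meminfo):
    """Reproduce the Mem: used and -/+ buffers/cache columns of old free(1)."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    buffers = meminfo["Buffers"]
    cached = meminfo["Cached"]
    used = total - free
    return {
        "used": used,
        "used_minus_cache": used - buffers - cached,
        "free_plus_cache": free + buffers + cached,
        "free_pct": 100.0 * free / total,  # share of RAM that is truly free
    }


summary = free_summary(parse_meminfo(SAMPLE_MEMINFO))
for key, value in summary.items():
    print(f"{key:>18}: {value}")
```

With these numbers only about 2.9% of RAM is truly free, even though over 1.1GB counts as available on the "-/+ buffers/cache" line.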

So why was the machine even touching the disk? It should have been able to simply purge non-dirty pages from its cache and carry on.

Or am I grossly missing something?

The plot thickens a bit... it looks like even the OOM killer got involved.

Is there any way to debug why the kernel was so keen on keeping that 1GB of page cache around that it killed other things instead?
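One partial starting point for that question: the kernel exposes each process's current OOM-killer "badness" in `/proc/<pid>/oom_score`, and the process with the highest score is the one the OOM killer would pick first. The Python sketch below (`read_oom_scores` is a hypothetical helper, not an existing tool) lists processes by that score. It assumes `/proc/<pid>/comm` is present, which is only true on Linux 2.6.33 and later, so it would not run unchanged on the karmic kernel in this report. It shows whom the killer would choose, not why reclaim failed to shrink the page cache.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (oom_score, pid, command) tuples, highest score first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(pid_dir, "oom_score")) as f:
                score = int(f.read().strip())
            # /proc/<pid>/comm needs Linux 2.6.33 or later.
            with open(os.path.join(pid_dir, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited mid-scan or file is unreadable
        results.append((score, int(entry), comm))
    # Highest score first: that process is the OOM killer's likeliest victim.
    results.sort(reverse=True)
    return results


if __name__ == "__main__" and os.path.isdir("/proc"):
    for score, pid, comm in read_oom_scores()[:10]:
        print(f"{score:6d} {pid:7d} {comm}")
```

Running it periodically while the machine is still responsive would show whether something like Firefox dominates the ranking before the OOM killer fires.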

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Jim Lieb (lieb) wrote :

It would be interesting to see what top and other stats report. 44M is only 3% of memory in this case, and looking at Firefox on my x86_64, it has a virtual size of 950M and a resident set of 434M. It could be hitting the point where it is invalidating text pages only to have them paged back in: basic VM thrashing. Having no swap means that anon pages have nowhere to go, so they stay resident in RAM, whereas they would "normally" get LRU'd out to swap to relieve pressure. Without swap, the machine is a bit "unbalanced". 3% is getting close to the reclaim triggers. This is just a guess. We would have to see what the disk traffic and a few other VM numbers were to get a better understanding. One clue would be a predominance of reads. If the traffic could be broken out between /, /usr, /var, and /home, we could also get some understanding of *what* the reads were about.
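Both clues (a predominance of reads, and which filesystem the traffic hits) can be checked from `/proc/diskstats`, which keeps cumulative per-device and per-partition counters. A minimal Python sketch, assuming the full-field `/proc/diskstats` layout of 2.6.25 and later kernels, where the sixth field is sectors read and the tenth is sectors written, always in 512-byte units; `parse_diskstats`, `io_delta` and `sample` are hypothetical helper names. Mapping partitions such as sda1 to /, /usr, /var or /home is left to /proc/mounts.

```python
import os
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # blank line or old short partition format
        # 0 major, 1 minor, 2 name, 3 reads, 4 reads merged, 5 sectors read,
        # 6 ms reading, 7 writes, 8 writes merged, 9 sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def io_delta(before, after):
    """Bytes (read, written) per device between two snapshots."""
    delta = {}
    for name, (rd, wr) in after.items():
        if name in before:  # ignore devices that appeared between samples
            rd0, wr0 = before[name]
            delta[name] = ((rd - rd0) * SECTOR_BYTES, (wr - wr0) * SECTOR_BYTES)
    return delta


def sample(interval=5.0, path="/proc/diskstats"):
    """Read path twice, interval seconds apart, and return io_delta."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return io_delta(before, after)


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    for name, (rd, wr) in sorted(sample().items()):
        if rd or wr:
            print(f"{name:8s} read {rd / 2**20:8.1f} MiB  written {wr / 2**20:8.1f} MiB")
```

Read traffic that dwarfs writes and is concentrated on the partition holding /usr would fit the theory of executable text pages being evicted and read back in.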

Jim Lieb (lieb) wrote :

@Scott, check https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131094/comments/235. That bug and others show similar symptoms going back to ~2.6.18. After responding here last week, I found a number of bugs related to what you have seen.

Very much unbalanced with i_x86 with the beta: slow swap and memory.

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Triaged a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

On Thu, 2010-04-22 at 05:57 +0000, Jeremy Foshee wrote:

> This bug report was marked as Triaged a while ago but has not had any
> updated comments for quite some time. Please let us know if this issue
> remains in the current Ubuntu release,
> http://www.ubuntu.com/getubuntu/download . If the issue remains, click
> on the current status under the Status column and change the status back
> to "New". Thanks.
>
> [This is an automated message. Apologies if it has reached you
> inappropriately; please just reply to this message indicating so.]
>

Y'know - we should disable these when the reporter is one of our own
developers ;-)

Yes - this problem still exists ;-)

Scott
--
Scott James Remnant
<email address hidden>

Jeremy Foshee (jeremyfoshee) wrote :

Well then why haven't you fixed it yet in your copious amounts of spare time? :-P
Just Kidding. Thanks for following up. :-)

~JFo

kroq-gar78 (kroq-gar78) wrote :

Since this is for an old bug, should I mark this as invalid?

h1bymask (h1bymask) wrote :

Affects me too.
Linux edge13 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS

Mem: 3791084k total
Swap: 0k total

The system responds very slowly when many applications are open. The USB keyboard's NumLock LED either responds after 5-10 seconds or blinks rapidly (~5 Hz). Usually the only option is to use SysRq to terminate all processes and then start the terminated lightdm again. Sometimes the system locks up completely. Sometimes the syslog contains messages about the OOM killer. The last time it happened, the syslog contained nothing about it. The system lagged from 14:31 until 14:35. The first two messages below are typical syslog spam that appears every ~5 minutes.

...
Oct 23 14:30:05 edge13 suhosin[4275]: ALERT - script tried to disable memory_limit by setting it to a negative value -1 bytes which is not allowed (attacker 'REMOTE_ADDR not set', file 'unknown')
Oct 23 14:35:15 edge13 CRON[4436]: (www-data) CMD ( if [ -f /var/lib/typo3-dummy/typo3/cli_dispatch.phpsh ]; then /usr/bin/php5 /var/lib/typo3-dummy/typo3/cli_dispatch.phpsh scheduler ; fi)
Oct 23 14:35:42 edge13 kernel: [143394.455337] SysRq : Keyboard mode set to system default
Oct 23 14:36:03 edge13 kernel: imklog 5.8.6, log source = /proc/kmsg started.
Oct 23 14:36:03 edge13 rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="4574" x-info="http://www.rsyslog.com"] start
Oct 23 14:36:05 edge13 rsyslogd: rsyslogd's groupid changed to 103
Oct 23 14:36:05 edge13 rsyslogd: rsyslogd's userid changed to 101
Oct 23 14:36:03 edge13 rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Oct 23 14:36:05 edge13 kernel: [143400.221736] init: portmap main process (797) terminated with status 2
Oct 23 14:36:05 edge13 kernel: [143400.221829] init: portmap main process ended, respawning
...

Scott James Remnant, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired