System freeze on high memory usage when DMA is disabled

Bug #159356 reported by Ohad Lutzky
This bug affects 96 people
Affects             Status      Importance  Assigned to  Milestone
Gentoo Linux        New         Undecided   Unassigned
linux (Arch Linux)  New         Undecided   Unassigned
linux (Ubuntu)      Incomplete  Low         Unassigned

Bug Description

I run a batch matlab job server here at my lab, running Dapper 6.06 (for the LTS). One of the users has submitted a very memory-consuming job, which successfully crashes the server. Upon closer inspection, the crash happens like this:
1. I run matlab with the given file (as an ordinary, unprivileged user)
2. RAM usage quickly fills up
3. Once the RAM meter hits 100%, the system freezes: All SSH connections freeze up, and while switching VTs directly on the machine works, no new processes run - so one can't log in, or do anything if already logged in. (Sometimes typing doesn't work at all.)

Note that the swap - while 7 gigs of it are available - is never used. (The machine has 7 gigs of RAM as well)

I've tried the same on my Gutsy 32-bit box, and there was no system freezeup - matlab simply notified that the system was out of memory. However, it did this once memory was 100% in use - and still, swap didn't get used at all! (Though it is mounted correctly and shows up in "top" and "free").
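To check the "swap is never used" observation without eyeballing "top" or "free", here is a minimal Python sketch that computes swap usage from /proc/meminfo-style text. The function name swap_usage_kib is hypothetical; SwapTotal and SwapFree are the standard field names in /proc/meminfo.

```python
import re


def swap_usage_kib(meminfo_text):
    """Return (total, used) swap in KiB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:  7340028 kB"; skip anything else.
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    total = fields["SwapTotal"]
    used = total - fields["SwapFree"]
    return total, used
```

On a live system one would pass it the contents of /proc/meminfo, for example swap_usage_kib(open("/proc/meminfo").read()). A used value that stays at 0 while RAM fills up would match the behavior described above.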

So first things first - I'd like to eliminate the crash issue. I suppose I could switch the server to 32-bit, but I think that would be a performance loss, considering that it does a lot of heavy computation. There is no reason, however, that this should happen on a 64-bit machine anyway. Why does it?

WORKAROUND: Enabling DMA in the BIOS

lcampagn (luke-campagnola) wrote :

I can confirm this in general for every linux distribution I've ever used. Any time I have a process that is both using 100% CPU and eats up memory, the system becomes unusable as soon as it starts using swap. At this point the hard drive starts thrashing and X slows to a crawl (the pointer updates maybe every 30 seconds). My only options at this point are to 1) hope the program finishes and gives some memory back, 2) wait for swap to fill completely so the kernel will kill the program, or 3) reboot the computer. The latter option is usually 5-10 minutes faster. I think this is a very meaningful bug report, and one that I'd love to see some attention given to, although I have no real idea what the solution might be. The only workaround I've found is just to disable swap completely (I'll bet your swap just wasn't enabled on your 32-bit box?).

Of course it's expected that things will perform badly when the system is out of memory, but it's pretty ridiculous that as soon as RAM is full there aren't even enough resources for me to get to a console, log in, and kill the program myself. It seems to me that if one program is spending all of its time writing swap pages, there should at least be plenty of CPU left over for me to operate the mouse, so it seems like there's something else going on that causes the system to crawl.

So the question is: can we come up with a reasonable fix for this problem, or do we just accept that any runaway process can crash the machine? For the time being, I'm happy running swapless.

Changed in linux-meta:
status: New → Confirmed
Ohad Lutzky (lutzky) wrote :

I was able to resolve my problem by enabling DMA in the BIOS - the machine runs very well under high load now.

Ulrich Lukas (ulrich-lukas) wrote :

Sorry, I know this is an older bug report, but I frequently have the problem (even with 8.10 Intrepid). Ohad, which DMA setting do you mean? The one for the hard disk?

mad_m (retomusci) wrote :

I guess my problem is related to this one. I also run a very memory-consuming batch matlab job (through a perl script). The job seems to run without problems while I am still working on my computer. But as soon as a screensaver starts, the whole system freezes. I have a frozen screensaver on my desktop and there is no reaction to the mouse or the keyboard. Even the power button on the front of the computer does not react, and the only way to restart the system is to switch off the power at the back of the computer.
I can prevent the system from going into screensaver mode, but the problem is still annoying.

Tom Fields (udzelem) wrote :

I'm pretty sure this is the same bug as reported in bug #283420.

There is a very small, very simple C++ utility posted on that thread that can be used to reproduce the freeze.

The steps to reproduce are:

1. compile the binary: "g++ memory_overcommit.cc -o memory_overcommit.bin"

2. Start "top" (or any other utility that displays memory/swap/cache usage) and switch to memory centered view (key "M", i.e. uppercase M).

3. run the compiled binary: ./memory_overcommit.bin

4. Eat up the memory using the tool. You can start with chunks of (a few) hundred MiB, but - this is important - when approaching the value of free memory (i.e. when the free memory is reduced to smaller amounts) reduce the chunk size and /slowly/ approach the limit and wait a few seconds between each chunk of a few tens of MiB. If the memory is allocated in too big chunks, the process gets killed correctly by the kernel OOM-killer (usually after a system stall of several seconds)!

This bug can be observed especially when slowly approaching the hard limit of system memory, and with slow SWAP media (notably encrypted swap).
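The allocation schedule described in the steps above (large chunks first, then small ones near the limit) can be sketched in Python as follows. This is not the linked C++ utility; the function names and the thresholds (256 MiB, 16 MiB, a 512 MiB "slow zone") are hypothetical values chosen only for illustration, and the code only plans chunk sizes without allocating anything.

```python
def next_chunk_mib(free_mib, big=256, small=16, slow_zone=512):
    """Pick the next allocation size in MiB for the given free memory."""
    if free_mib <= 0:
        return 0
    if free_mib > slow_zone:
        # Far from the limit: large chunks, but never jump into the slow zone.
        return min(big, free_mib - slow_zone)
    # Close to the limit: approach slowly in small chunks.
    return min(small, free_mib)


def allocation_plan(free_mib):
    """List the chunk sizes that would be requested until memory is exhausted."""
    plan = []
    while free_mib > 0:
        chunk = next_chunk_mib(free_mib)
        plan.append(chunk)
        free_mib -= chunk
    return plan
```

A real reproduction would also pause for a few seconds between the small chunks, as the steps above ask.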

IMO, this is even a security-issue, because it allows a total DOS in a multiuser environment.
Has anyone followed this matter on the Linux kernel mailing list (LKML)?

Link to C++ utility (see other bug report):
http://launchpadlibrarian.net/23372552/memory_overcommit.cc

Andy Whitcroft (apw) wrote :

This is not a bug in the linux-meta package, moving to the linux package.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
Timothy Pearson (kb9vqf) wrote :

This has also happened to me on many occasions. I have a Kubuntu 9.04 server at the office that has 24GB of main memory, so there is no swap space defined. When an application goes haywire and takes all the main memory (e.g. VMWare), the offending process is never terminated, the mouse pointer locks up, and all attempts at a remote SSH session fail--the server has to be forcibly restarted.

This is a critical bug!

Ulrich Lukas (ulrich-lukas) wrote :

Timothy, do you use a stock Ubuntu kernel or a self-compiled kernel?

If your kernel doesn't use any Ubuntu specific patches,
perhaps you could help speed things up a little by posting your software setup and the problem on the Linux kernel mailing list.

Rocko (rockorequin) wrote :

I found during testing yesterday that I *don't* get this freezing problem when swap is hit with the i386 kernel, only the 64 bit kernel, which is a real pity since 64 bit is supposed to give better performance.

The 2.6.30.4 kernel I'm using is a big improvement over the Jaunty 2.6.28 kernel, which froze completely, but under the right conditions, 2.6.30 can still freeze X so badly that it takes a good 5 minutes just to login using ssh (and then killing the offending process takes another 5 minutes).

I tried turning swap off, but the system really doesn't like it when it runs out of memory.

Also see http://bugzilla.kernel.org/show_bug.cgi?id=13205.

So is this bug about swap or not? The reporter indicated swap wasn't being used, but some subsequent comments have been related to swap.

pureblood (freeseek) wrote :

I have had this problem for many years now. I am glad to see that other people are complaining about it. Let me add that at first I tried buying more RAM and disabling swap, as it seemed that hard drive usage was what killed the machine. Nevertheless, as soon as I do something wrong in matlab, the system freezes really badly. Given a few seconds/minutes/hours, matlab eventually gets killed automatically, but the loss in productivity is tremendous. I have never understood what is going on, as I have no swap. Also, if large chunks of memory get requested, matlab issues an out of memory statement, but what seems to freeze the machine is a series of small successive requests for small amounts of memory. Something in the kernel is plain wrong to allow a process to eat up all of the resources. In my personal opinion, you will always eventually regret having chosen to save money on RAM by relying on swap, because of misuses like this. But I hate that something is still defeating my choice of not having swap. I would pay to know exactly what is going on. I use Ubuntu Lucid with the 2.6.32 kernel.

Ulrich Lukas (ulrich-lukas) wrote : Re: [Bug 159356] Re: System freeze on high memory usage

After these years, I would suggest a bug report on the Linux kernel
mailing list or bugzilla. I don't think we can expect any help from the
Ubuntu kernel team.

Noto Yota (info-notoyota) wrote : Re: System freeze on high memory usage

I had this bug using Ubuntu Maverick (64 bit) on an AMD64 x2.
I had it installed using the wubi installer, so the swap file on an NTFS partition is probably too slow.
It crashed on me all the time when compiling any larger project (compiling Android 2.2 was the last thing I was doing).

However, it seems that the problem is solved by adding mem=nopentium to the boot command line.
Maybe this helps.

Regards,
Patrick

Noto Yota (info-notoyota) wrote :

It seemed to be better with the mem=nopentium option, however it still crashes :(
I can't even get the compilation job to finish.

Nuno Sucena Almeida (slug-debian) wrote :

I have this same problem with lucid when using an encrypted swap partition on a 64 bit system.

If a process allocates most of the RAM, the system becomes unresponsive and if within X I have no other option but to force a power cycle.

ctrl+alt+sysrq+f (to invoke the oom killer) doesn't seem to do its job, although the system is not completely frozen, since sysrq still displays the help messages (ctrl+alt+sysrq+h) and I see some disk activity.

Trying with a regular swap partition (not encrypted) works fine, so I assume there's some problem with the kcryptd from the linux kernel.
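The OOM-killer SysRq mentioned above can also be requested from a root shell (for example an SSH session that is still responsive) by writing the key to /proc/sysrq-trigger. A minimal Python sketch follows; the function name trigger_sysrq is hypothetical, and it assumes the process runs as root on a kernel built with SysRq support. On a machine that is already frozen, starting an interpreter may not be possible at all.

```python
def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write a single SysRq command key (e.g. "f" for the OOM killer)."""
    if len(key) != 1:
        raise ValueError("SysRq commands are single characters")
    # The kernel acts on the character as soon as it is written.
    with open(path, "w") as trigger:
        trigger.write(key)
```

Calling trigger_sysrq("f") corresponds to the sysrq+f key combination; the path parameter exists only so the function can be tried against an ordinary file first.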

LexRiver (lexriver) wrote :

Same bug. The system freezes while swapping/reading/writing something to disk when memory usage is high. Even the cursor does not move. I've tried different partitions for swap, increasing swap partitions, etc. Maybe it only happens after the computer has gone through sleep/wakeup. I'm using Ubuntu 10.10 AMD64.

Uhurusurfa (chrisbroderick) wrote :

Same problem. Disk I/O will take off and stop all other processing. Can sometimes run for 2 or 3 minutes then clear itself but sometimes never comes back and requires hard restart. Seems particularly bad on 10.10.

Excalibur (dev-arthurk) wrote :

Yes, this is a really obnoxious bug. It's been around longer than Ubuntu has (I've been using Linux since 1999). It seems to me to be a completely unacceptable behavior to have the entire operating system grind to a halt and freeze due to too much RAM being in use.

I have 8GB of RAM and swap TURNED OFF, yet this still happens where I'll have a lot of work tasks and windows open, then all of a sudden the system freezes with the HDD stuck in a fury of swapping, with no other recourse than ALT-SysRQ-SUB emergency shutdown (or hitting the power button).

An issue of this severity being left unsolved for years is one of the things that irks me as a fanatical Linux evangelist. My mother runs Linux and the fear of her having issues like this keeps me up at night...

adam jvok (ajvok1) wrote :

Same problem in 11.04 with AMDx64.

I have encrypted swap turned on.
On high mem use, the system freezes (completely: local keyboard/mouse/screen & remote ssh) and the only solution is a hard reboot.
Ugly.

I have 6Gb RAM and similar size swap partition.

If I turn encrypted swap off and use a regular swap partition instead, the test program below works fine (it seg faults when all memory is used as expected). But, with encrypted swap on, the whole system freezes up before the program ends.

I hope that helps locate the problem.

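#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main(int c,char** argv) {
  long l=1073741824; // 1 GiB per round
  for (int i=0;i<10;i++) {
    printf("round %d\n",i);
    // Each block is never freed, so usage grows by 1 GiB per round.
    char* p1=(char*)malloc(l);
    // Execution continues after a failed malloc, so the memset below would
    // dereference NULL; presumably that is the seg fault mentioned above.
    if (p1==NULL) printf("Malloc failed\n");
    printf("try memset...\n");
    // Writing every byte forces the kernel to actually back the pages.
    memset(p1,0,l);
    printf("memset ok\n");
  }
  return 0;
}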

Michael Baudino (gornack) wrote :

My computer is freezing like this several times a day (usually for 1 to 5 minutes), and I'd love to know what's going on with this bug... Any news, anyone ?

Romain Grandchamp (romain-grandchamp) wrote :

Same bug on Ubuntu 11.10 64-bit with an Intel i7 and 8 GB of RAM.

Guillaume (guium) wrote :

Same bug with Ubuntu 10.04 LTS Lucid Lynx 64-bit. This is very, very annoying! (Workstation with 8 processors and 32 GB of RAM)

rbhkamal (rbhkamal) wrote :

wow, when I started searching the net I didn't expect to find an open bug from 2007.

Same here, I have 16 GB of RAM and 32 GB in a swap file (not a partition). I have six 1 TB SCSI drives in RAID 5 and I get about 250 MB/s write throughput and double that in read. So I expect my swap file to be fast enough to use at least a few gigs of it. However, 1.5 GB into the swap, X stops working. After using 2 GB of swap, ssh stops working... and the system never recovers.

Anyway, it seems like no matter how fast your swap is, the system will crash or freeze... and just for the record, Windows will happily use all of my swap (pagefile) without freezing or killing any processes. So it's not a hardware issue.

Nuno Sucena Almeida (slug-debian) wrote :

I'm running 11.10 now with the stock kernel (at the moment 3.0.0-16) and never had any more (weird) problems, so you might want to give it a try by upgrading your system or try with a newer kernel.

Marian (nemo-ikkoku) wrote :

Just reporting that I'm experiencing this behavior too.

As soon as a program tries to allocate much memory on the heap and the system starts swapping, everything freezes
and I can't do anything but power-down.

Specs: 8GB of RAM, Ubuntu 12.04 amd64, linux 3.2.0-20-generic, encrypted swap and root fs.

penalvch (penalvch) wrote :

Marian Tietz, please execute the following via the Terminal and feel free to subscribe me to it:
ubuntu-bug linux

Thanks!

penalvch (penalvch) wrote :

Ohad Lutzky, thank you for reporting this and helping make Ubuntu better. Dapper server reached EOL on June 1, 2011.
Please see this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We were wondering if this is still an issue in a supported release? If so, could you please capture the oops following https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Capturing_OOPs ? As well, can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command in a supported release from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux <replace-with-bug-number>

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.

Please let us know your results. Thanks in advance.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: dapper needs-upstream-testing
description: updated
Jakub Fiľo (plantroon) wrote :

Same problem. This was terrible. I waited 4 hours for my PC to return to normal. I was at risk of losing important data, so I could not reboot, and running a Windows 7 virtual machine caused the freeze (I didn't realise I didn't have enough RAM, but swap was empty). When the PC returned to a normal state, Windows 7 BSODed but Ubuntu was still alive. It was still extremely laggy. I opened up the system monitor to see what was going on: swap was 50% full and RAM also 50% full. There was a lot of disk activity because swap was unloading (I think it was transferring data to RAM).

I hope this will get fixed soon. What's weird is that I did not have these problems on the 32-bit version of Ubuntu. I also had RAM and swap full there, but everything went just OK.

Sorry for my English if anything is wrong. I also have a question: is there any reason why swap should be encrypted? Isn't it like encrypting RAM?

Helio Tadao Goto (tadaog) wrote :

For me it occurs with an 8 GB plain swap file on my 4 GB notebook, and has done so from Ubuntu 10.04 up to 12.04 LTS 64-bit with the 3.2.0-27-generic kernel. It's incredible!

penalvch (penalvch) wrote :

Jakub Fiľo / Helio Tadao Goto, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette . If you do file a new report, please feel free to subscribe me to it. Thank you for your understanding.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

NickNackGus (nicknackgus) wrote :

Same issue, Ubuntu 10.04.4 LTS 64-bit. 6GB RAM, 40GB Swap. DDR2 RAM to be specific. That doesn't affect the bug, it just means buying new RAM costs as much or more than a new motherboard!

Seriously, though? A bug that's been around since at least 1999, and still hasn't been fixed? 64-bit has been around a while, I wouldn't be surprised if 128-bit comes out in the next 5-10 years, so why don't we have an old 64-bit bug fixed?

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
NickNackGus (nicknackgus) wrote :

Almost forgot, here's a screenshot of my RAM and swap usage. Cached should be the first thing to go to swap, followed by buffered if more room is needed. What do you know? Over 50% of my RAM is used by what SHOULD be in swap. And with the sort of stuff I do on my computer, I'd say maybe even 75% of it should be in swap. I've got multiple instances of Firefox, Banshee, Audacity, GIMP, and LibreOffice open under typical work loads. This is just background tasks and my backup utility running.

penalvch (penalvch) wrote :

NickNackGus, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette . If you do file a new report, please feel free to subscribe me to it. Thank you for your understanding.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Helio Tadao Goto (tadaog) wrote :

After trying changing all elevator options, and many suggestions out there, I've checked irqbalance. In /etc/default/irqbalance, I've changed in my Ubuntu 12.04 LTS 64-bit notebook, ENABLED from 0 to 1, as follows:

root@XXX4:/etc/default# cat irqbalance
#Configuration for the irqbalance daemon

#Should irqbalance be enabled?
ENABLED="1"
#Balance the IRQs only once?
ONESHOT="0"

All freezes stopped (of course, when swapping occurs it is definitely slower than RAM). Only occasional sluggishness in the mouse/keyboard remained during sustained swap activity.

My notebook is a Lenovo IdeaPad G530, with an Intel Dual Core T3400 2.16GHz CPU and 4 GB RAM.

Bernd Kreuss (prof7bit) wrote :

After having been affected by this for many years now, and observing that it seemed to get even worse with every new version (entire system freezing with heavy disk I/O as soon as it touches the end of the RAM, even ctrl+alt+f1 taking 5 minutes until the console is switched, having to reboot multiple times per day as soon as Firefox came across a resource-hungry website), today I tried again to find out whether this could be improved by tuning some of the various vm parameters, and I might have finally found something. I have put the following line into my /etc/rc.local:

sysctl vm.vfs_cache_pressure=100000

and since I did this and after a clean reboot today I have not had these immense problems anymore. This is the only vm parameter I have changed.

I have only 500MB of RAM on this laptop (running xubuntu 12.04) and here the problem could be easily reproduced: When browsing the web with firefox while already close to the limit of available RAM the memory usage sometimes can quickly spike by another 100 MB within milliseconds and then immediately the *entire* machine would have frozen. A few years ago a ctrl+alt+f1 would still have reacted within maybe 10 seconds (on the same laptop) but nowadays it seemed to take 5 minutes or didn't notice the key press at all sometimes.

After I set vfs_cache_pressure to 10000 (default is 100) I am not able to reproduce this extreme behavior anymore. Now it just starts using swap and applications become slower, but the complete freezing is almost gone, the mouse pointer stays movable, and my keyboard shortcuts to emergency kill some of my usual suspects still work in reasonable time (it almost seems I don't even need them anymore at all now).
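For anyone who wants to inspect or change this tunable without the sysctl command, here is a minimal Python sketch. It relies on the usual mapping of a sysctl name such as vm.vfs_cache_pressure to the file /proc/sys/vm/vfs_cache_pressure; the helper names are hypothetical, writing requires root, and the value is left as a parameter because this comment quotes both 100000 and 10000.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file, e.g. vm.x -> /proc/sys/vm/x."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a string."""
    with open(sysctl_path(name, root)) as handle:
        return handle.read().strip()


def write_sysctl(name, value, root="/proc/sys"):
    """Set a sysctl; under the real /proc/sys this needs root privileges."""
    with open(sysctl_path(name, root), "w") as handle:
        handle.write(str(value))
```

A setting written this way lasts only until the next reboot, which is why the comment above puts the sysctl call into /etc/rc.local.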

Ivan Danov (danov) wrote :

I have experienced the same bug. I have re-reported the bug, as needed. The new bug report is at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073

Maxim Imakaev (mimakaev) wrote :

I have experienced the same issue since I started using Ubuntu 8.04. Often one of the users will run a program which would take a lot of memory, and this would completely freeze graphics and make all text terminals very slow. Sometimes I run a code which accidentally allocates more than my 32GB RAM, and if I don't hit Ctrl+C within 1-2 seconds, it will never even go through in gnome-terminal. The only solution would be to SSH in to the machine and kill the process manually.

In some cases even ssh or console login do not help, and I have to wait for hours or reboot the machine.

I am surprised that there is no way to save 50-100 megabytes of RAM free for a root to log in and kill the process; it would be very helpful, especially when rebooting the computer is not possible.

Soo (soowjo) wrote :

I also encounter the same problem on 14.04 LTS. With 10+ windows open, the system becomes frozen unexpectedly. Once it's frozen, it either becomes totally unresponsive such that nothing works except for the long-press power button or minimally responsive so that I can shift to terminal using CTRL+ALT+F1. But then, the terminal is so slow as well and I eventually turn the system off with power button.

asgard2 (kamp000x) wrote :

same problem with Ubuntu 14.10 and 3.16.0-34-generic.

penalvch (penalvch) wrote :

asgard2, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

penalvch (penalvch)
Changed in linux (Ubuntu):
importance: Undecided → Low
information type: Public → Public Security
information type: Public Security → Private Security
information type: Private Security → Public
Mihai b (themihai)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in fedora:
importance: Unknown → Medium
status: Unknown → Confirmed
lou (louvee) wrote :

Fedora 27, fully affected.

New to linux a couple of years ago, I decided to try 'live' versions on two Win 7 laptops (4GB Ram each).

Debian 8.6 first, then 8.7, 9.2 and 9.3. Ubuntu 14,15,16
Fedora 26, currently 27
DE's: Xfce, GNOME, Unity, Mate, Cinnamon, always the same problem.
(Xfce held up the longest; less memory intensive)

Manifested when multiple tabs are opened on the web browser (10,15,20,25, depending on the browser and version and how memory intensive it is):
FF52 up to 61 developers (the newer ones eat up memory faster)
Chrome, Chromium.

I banged my head against the wall for 1.5 years before I stumbled upon this thread, thinking it was bad memory (many days of running memtest), or other hardware (but in both laptops? Couldn't be).

The system will suddenly SEIZE up if you're close to memory capacity. If I am close, and I take my eyes off the USB drive for an instant and it begins to flash non-stop, once it goes beyond 10 seconds I likely cannot drop to the console I keep opened to kill the Firefox ps. Maybe it will respond after an hour, 2 or 4 hours, usually not. Time to power off.

Now I keep gnome-system-monitor opened and watch the memory approach 96% and restart the browser, and in the case of Debian/Ubuntu on X11, restart gnome-shell (which has a major leak problem also).
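The manual "watch for 96%" routine could in principle be scripted. The following Python sketch computes a used-memory percentage from /proc/meminfo-style text and compares it with a threshold. It assumes a kernel that reports MemAvailable (3.14 or later); the function names are hypothetical, and the percentage is not necessarily the same figure gnome-system-monitor displays.

```python
def memory_used_percent(meminfo_text):
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def should_restart_browser(meminfo_text, threshold=96.0):
    """True once usage reaches the threshold (96% in the comment above)."""
    return memory_used_percent(meminfo_text) >= threshold
```

A small loop that reads /proc/meminfo every few seconds and raises a notification when should_restart_browser returns True would replace the manual watching.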

On one laptop I did make a separate partition for Deb 8.7 with a swap space. Same thing happens. I will observe the HD light come on and stay solid. That's the end.

I was shocked to find this bug and that it has existed for more than 10 years.

It makes this almost a nonviable platform and I'm loving Linux otherwise.

Sorry for the rant, there was 1.5 years of frustration built into it.

Thanks for all the work the devs do, I know it ain't easy.

lou (louvee) wrote :

..Forgot to add, the Fedora 27 live version is running on one laptop in which I've upgraded the RAM to 8Gb. It has an (older version) Core-i5 in it for those curious and runs developer version of FF (60).

It happens here with gedit with 8 files opened, 2 file manager windows and 8 tabs, 3 or 4 xterms, Signal app, vlc occasionally playing audio.. that's the base.

# uname -a
Linux localhost-live 4.13.9-300.fc27.x86_64 #1 SMP Mon Oct 23 13:41:58 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Excuse me while I restart the browser, I'm at 97%. (maybe logout and HUP gnome-shell too from the console because I'm starting off low on memory already)

jecks (slayerproof32) wrote :

System crashes on high mem usage

Steps to reproduce
Open your favorite browser
Open a Youtube video and 4 tabs of google docs
Check system monitor for ram usage
Wow. 98.8% and only 10% swap used???
Open one more tab
System crash, have to reboot by holding power button

Expected result
The memory is freed by swapping more files onto my swap partition

Tested bug on
-Opensuse Leap 15.0
-Fedora 28
-Ubuntu 18.04
-Manjaro Linux (latest version)

Specs
-CPU core 2 duo e6400 (usage is not high when the system crashes)
-2.9 (3gb) Elpida ddr2 ram 533 MHz
-Gnome 3.26 and 3.28
-Ati x1300 Gpu
-3.0 gb swap partition

Open a few more google docs tabs and crush my other computer with
-CPU i5-520M
-3.9 (4gb) Ram
-Gnome 3.28
-Intel Igpu
-4gb swap

mostafa741 (mostafasaid-2008) wrote :

still happens with Fedora 28

Changed in fedora:
status: Confirmed → Won't Fix
Taddeo Manzi (sinistristradali) wrote :

"Won't fix"? This bug does not permit me to zip big files.
"Sorry mom, you can't zip those files, GNU/Linux thinks your pc hasn't enough power to do it, best coming back to windows, where ram filling up doesn't kill your system."

jecks (slayerproof32) wrote :

Sitting here with a whole bunch of tabs open in YouTube. Memory used is 3.5 out of 3.7. Swap is now 760 megabytes. This is after my system has been left untouched for about 10 mins now. CPU usage is anywhere from 30 to 60%. It unfroze about a minute ago, but is still super slow, and it refroze when I tried to click “processes” in the system monitor app.

penalvch (penalvch)
summary: - System freeze on high memory usage
+ System freeze on high memory usage when DMA is disabled
no longer affects: linux (Ubuntu)
Launchpad Janitor (janitor) wrote : Re: System freeze on high memory usage when DMA is disabled

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
penalvch (penalvch)
affects: linux (Arch Linux) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Incomplete
no longer affects: linux (Ubuntu)
affects: fedora → ubuntu
Changed in ubuntu:
importance: Medium → Undecided
status: Won't Fix → New
importance: Undecided → Low
status: New → Incomplete
affects: ubuntu → linux (Ubuntu)
penalvch (penalvch) wrote :

jecks (slayerproof32), this report is poorly defined, and not root caused as no debugging logs have been provided by the original reporter. However, what little is known is that the issue for the original reporter was correlated to when DMA was disabled in the BIOS, using a now old, and unsupported version of Ubuntu. Hence, if you would like your issue root caused and resolved, you will want to file a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

marco (nazgul17) wrote :

Hi Christopher,

How does one go about providing debugging information?
Please consider the limitations of an environment that does not respond to commands.

Thank you!

jecks (slayerproof32) wrote :

I’d say you probably want to start with these 4 commands and post the results:
lspci -v
inxi -Fzxc0
dmesg > dmesg.log (saves to dmesg.log in the current directory)

top -b > top.log (run this command, then trigger the memory issue. It saves to top.log in the current directory. If you have to hard reboot, the file will still be saved)
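Whether a redirected log survives a hard reboot depends on how much of it has reached the disk. As a hedged alternative to plain redirection, this Python sketch appends timestamped snapshots and calls fsync after each one, which improves the odds (without guaranteeing) that the last entries are on disk when the power is cut. The function name append_snapshot and the log layout are hypothetical.

```python
import os
import time


def append_snapshot(log_path, text):
    """Append one timestamped snapshot and push it to disk immediately."""
    with open(log_path, "a") as log:
        log.write(f"--- {time.strftime('%Y-%m-%d %H:%M:%S')} ---\n")
        log.write(text.rstrip("\n") + "\n")
        # Flush Python's buffer, then ask the kernel to write the file out.
        log.flush()
        os.fsync(log.fileno())
```

One way to use it is to call append_snapshot("mem.log", open("/proc/meminfo").read()) once per second in a loop started before triggering the memory issue.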

jecks (slayerproof32) wrote :

Consider also posting the output of
sudo parted -l

Here are my results of these commands. I am running top right now and will post its output soon.

https://pastebin.com/8q75EXyz – inxi -Fzxc0
https://pastebin.com/mnj4Bamu – lspci -v
https://pastebin.com/KqpKuCa3 – dmesg

Revision history for this message
penalvch (penalvch) wrote :

jecks (slayerproof32), it is most unhelpful to post comments or attachments here, or give pastebin links. If you want to motivate developers to want to look into your issue, file a new report. If you want developers to ignore your problem, continue posting here.

marco (nazgul17), one may always file a report manually via https://bugs.launchpad.net/ubuntu/+source/linux/+filebug . For more on this, please see https://help.ubuntu.com/community/ReportingBugs .

Revision history for this message
kd4ua506I9uzkaa (kd4ua506i9uzkaa-deactivatedaccount) wrote :

The attached kernel patch (applied on top of 4.18.5), which I've tried, almost completely eliminates the disk thrashing (the constant re-reading of executable and .so files on every context switch) associated with the OS freezing. With this patch, the OOM-killer is triggered within a maximum of 1 second when it is needed. Without it, the OS freezes for minutes, with constant disk reading, well before the OOM-killer gets triggered (it may even reboot automatically, depending on whether your kernel .config options are set to panic and reboot on a hang after xx seconds).

More info as to why OS freezes with constant disk reading when running out of RAM, here: https://unix.stackexchange.com/a/463469/306023

The patch is from inside this question: https://stackoverflow.com/q/52067753/10239615

Note: I don't recommend using this patch in production, without actual programmers saying that it's ok first. I'm not a programmer. Maybe some programmer could improve it?

I am using the patch currently inside a Qubes OS R4.0 Fedora 28 AppVM where I can easily reproduce freezing the VM's OS (+ the constant disk thrashing, seen from `sudo iotop` in dom0) without the patch, while attempting to compile firefox with 4000MB max RAM setting for the VM, and with the patch basically no freezing and little to no disk thrashing during the 1 second it takes for OOM-killer to kill the offending process(es).

Revision history for this message
jecks (slayerproof32) wrote :

Do you think this could eventually be merged into a kernel? Has anyone filed a kernel bugzilla with the patch so we can improve it, and get it merged to the kernel?

Revision history for this message
jecks (slayerproof32) wrote :

Also, since we now know how the bug is triggered, should Marcus open a new report?

Revision history for this message
jecks (slayerproof32) wrote :

This also explains the reason a person with a USB drive was seeing thrashing (constant blinking) when using the fedora live environment with many tabs open.

tags: added: patch
penalvch (penalvch)
summary: - System freeze on high memory usage when DMA is disabled
+ When DMA is disabled system freeze on high memory usage
tags: removed: patch
Revision history for this message
jecks (slayerproof32) wrote :

I noticed improvements involving this bug starting with 4.17. In 4.17, freezes over 2 min are uncommon. Even with over 2 gigs of swap used.

Revision history for this message
lou (louvee) wrote :

Update:

I'd commented in detail about this bug (post 77,78). I run the live versions of Linux on a 4GB Core-i5 laptop (and another 4GB pentium laptop also.)

Just wanted to add:

I've added 4Gb of RAM to the Core-i5 laptop for 8Gb total now. This helps a lot, but it's not a cure.

With Fedora 28, the system will still seize up with maybe 2 dozen FF tabs opened/active (or fewer, depending on what's happening: video, etc.).

I came back here to note that, I'm currently using a Live Debian Stretch (9.5).

There are obviously significant differences in the way these variants of Linux manage memory.

Why?

Because under the same system conditions (Gnome, same s/w programs installed and/or running), I can open WAY more tabs in FF on Debian; open more simultaneous programs, without fear of a sudden system heart-attack.

In fact, it is much harder for me to cause the system freeze in Debian, even with approaching 50 tabs opened in FF developer 63...

I understand there are underlying Fedora vs Debian system differences like: systemd vs init, and Wayland vs Xorg, Gnome versions (3.28.1 vs 3.22.5) and kernel revisions (4.16.3-301.fc28.x86_64 vs 4.9.110-1 (2018-07-05) ), but in all, I find Debian WAYYYY more forgiving, and more manageable, ESPECIALLY in light of this FATAL flaw, AND the known Gnome memory leak bug which can easily be remediated for in Debian by restarting Gnome (via Alt-F2, r) to free back up that memory. (The only way to accomplish this in Fedora is to actually log out of your session because of Wayland limitations.)

Anyway, I just thought it's another data point to add to the mystery.

I still have to keep resource monitor opened even in Stretch, just in case, but I only crashed Stretch once over the past 3 months or so when I was in the 80's (mem % used) and let a video play for 2 hrs without checking up.

Normally anyway, that percentage isn't rising above the 70's in my typical "working" environment.

Finally, I'd like to mention to those asking for logs, etc., for this issue: realize that WHEN this issue occurs, it *is* essentially a heart-attack for the system. There is no recourse, and no way to gather logs. EVERYTHING seizes up- usually never to come back. A hard power-cycle is the only recourse, and NO logs which would shed light on the issue are written. EVERYTHING stops- including log writing.

This is the reality.

I *do* have a few logs from an old (non-live) Jessie 8.7.1 install-- for a few times when the system did revive, after hours-- and there's nothing in there that would shed light on the issue. The (very) few (interesting/odd) entries in the log that I've researched pointed to no other instances/causes of this same issue.

It would be nice, after 11 or 12 years of this issue, if someone higher up and more knowledgeable in the development "food chain" would simply replicate the issue. It's not really that hard to do at all.

It honestly is a show-stopper.

Ciao.

Revision history for this message
kd4ua506I9uzkaa (kd4ua506i9uzkaa-deactivatedaccount) wrote :

@lou (louvee)
I did try to ask on the kernel mail list (here https://lkml.org/lkml/2018/9/10/296 ), but got no replies, however the good news is, for me anyway, that I'm still successfully using my own kernel(4.18.9) patch(le9d.patch) , in Qubes OS (both in dom0 and VM kernels).

Here's a copy of it, if you want to give it a test.

tags: added: patch
Revision history for this message
lou (louvee) wrote :

@constantoverride

Thanks for the patch. As I'm running Live, I can't upgrade the kernel.

It looks like that patch was merged into the mainline kernel at some point since it's reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1577528 , that the bug no longer exists.

I'll have to wait for Debian to rebase a release on 4.17.5 or greater it seems.

Revision history for this message
Tcll (tcll5850) wrote :

guess I may as well join this party...
I use Xubuntu 16.04 on a few machines, and was initially experiencing this issue for some time...

the machines:

compact:
CPU: Intel Atom D2550 1.86GHz
RAM: 1x2GB DDR3 SO-DIMM
swap: 4GB (OS HDD) + 4GB (Flash Drive)

primary: (no longer operational)
CPU: AMD Athlon II x2 2.7GHz
RAM: 2x2GB DDR3
swap: 2GB (inactive HDD) + 2GB (OS HDD) + 4GB (active HDD)

secondary:
CPU: x86 Intel Pentium4 2.8GHz 1M
RAM: 2x1GB DDR
swap: 8GB (OS HDD)

new-primary:
CPU: x64 Intel Pentium4 3.4GHz +HT
RAM: 4x2GB (only 3504MB used) DDR2
swap: 2GB (SATA-SSD) + 2GB (OS HDD)

on the initial issue, all of these machines (except my primary, or my new primary, which I just got) would freeze when the RAM filled up.
this issue seems to have been fixed some time ago by an update after my primary died, where in the case of that one, it would corrupt the HDD when using swap.
but now after a recent update, the issue is back, and my compact is currently unresponsive.
the issue has also happened on my new primary as well, but not for very long and I was able to close a few browser tabs.

regardless, this needs to be fixed again.

Revision history for this message
jecks (slayerproof32) wrote :

The patch was never merged. The big was closed because there was an improvement in 4.17.5, and it seems like many reports like these are just closed, and never fixed. This issue still very much exists in 4.19.
Some observations:
This issue gets more apparent when you have less physical memory (system less likely to unfreeze)
If you have a full swap drive, and full ram, system freezes until it is force rebooted
If your memory is full, occasionally the system will not freeze
If the system is swapping when not frozen, it writes more to swap than when it is frozen.
If swap is being used, logging in is very slow after waking computer up. (Important processes are being swapped out)
Linux OOM killer is most likely not doing its job.
Windows and macOS do not freeze when using swap. What is the difference with Linux?

Revision history for this message
jecks (slayerproof32) wrote :

*bug

Revision history for this message
jecks (slayerproof32) wrote :

Please post your kernel versions, and maybe inxi if you experience this.

Revision history for this message
chernuubyl (chernuubyl) wrote :

Running Xubuntu 18.04. This issue has always been fairly annoying for me since I don't have much ram. Updated to 4.17.5 about a month and a half ago, and did seem to minimise this issue somewhat. It's only totally frozen and required a force shutdown once since, though there is noticeable but temporary freezing sometimes. Improvement, but not totally fixed, yeah.

inxi:
CPU~Dual core Intel Core i5 M 480 (-MT-MCP-) speed/max~1268/2667 MHz Kernel~4.17.5-041705-generic x86_64 Up~22:41 Mem~2920.6/3809.3MB HDD~500.1GB(33.6% used) Procs~249 Client~Shell inxi~2.3.56

Although my cpu is usually throttled to 2.13 GHz or less for the sake of battery life. 5.36gb of swap space which never gets anywhere near used up.

Revision history for this message
lou (louvee) wrote :

I'm posting again. This bug still exists. Easily reproducible. Here's how. (You NEED to be able to essentially use up RAM to see this bug in action)

Tested on a 3GB desktop (Core 2 Quad), running Live Ubuntu 18.10 LTS off of a flash drive (pendrivelinux.com). Kernel 4.18.0-10.

FF 63.0. I set the download folder to a dir on the hard drive so as not to deliberately stress free RAM.

With the system showing about 1.5GB free (System Monitor/Resources tab), trying to d/l this 1.25GB ROM image from Mega ( https://mega.nz/#!KUAyRKjJ!3hALO7dkuyFdE41BTWf1OfHaZmdTA-Kzd8q0HYiMbYs ), the d/l gets to 100% but system monitor shows RAM at 97% or 98%, the flash drive lights up and stays lit. System frozen, as I've reported previously with my other laptops.

I've mitigated this on the laptops because they now have 8GB of RAM. BUT-- even then, I can STILL crash those systems using Live Debian (or whatever flavor). It just takes more stressing (more open tabs, bigger d/l's, whatever) to get there, but it does.

Quite the blackpill that Linux memory management has had this serious flaw unaddressed for 13 years now.

It's very easy to reproduce on *any* 64-bit system following the guidelines above.

Revision history for this message
Keith Hutton (keith5001) wrote :

Try increasing vm.min_free_kbytes

I have two memory eater programs, one uses 1Mb steps and the other 1Kb steps. When I run the 1Mb program, the oom killer cuts in. When I run the 1Kb program, it does not.
If I raise vm.min_free_kbytes to 170000-250000, both programs are terminated, with no system freeze.
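
A minimal sketch of how such a value could be tried out, assuming a system with the usual sysctl tool. The helper name min_free_line and the drop-in file name are hypothetical, and 200000 is just one value from the range mentioned above.

```shell
#!/bin/sh
# Print a sysctl line for a given vm.min_free_kbytes value (in kB).
# Anything that is not a plain non-negative integer is rejected.
min_free_line() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: min_free_line <kilobytes>" >&2
            return 1
            ;;
    esac
    printf 'vm.min_free_kbytes=%s\n' "$1"
}

# Show the current value (no root needed):
#   cat /proc/sys/vm/min_free_kbytes
# Try a value on the running system only (needs root, lost on reboot):
#   sudo sysctl -w "$(min_free_line 200000)"
# Keep it across reboots (hypothetical file name):
#   min_free_line 200000 | sudo tee /etc/sysctl.d/90-min-free.conf
```

Trying the value with sysctl -w first makes it easy to back out, since a reboot restores the default.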

Revision history for this message
Keith Hutton (keith5001) wrote :

Further to my comment above, if I enable swap, the program continues to run, using up all the swap, and then terminates without a system freeze. The program pauses at about 1/3 second intervals as swap gets used. The higher vm.min_free_kbytes is, the smaller the pauses become. I have a third memory eater and I managed to get up to 66 Gb virtual memory allocated. My old laptop (9 years) has only 2Gb Ram.

Revision history for this message
lou (louvee) wrote :

Which distro are you testing on?

Any Debian based (past 4 yrs at least), Arch, Fedora-- OOM does NOT kick in. Try it on a bootable USB instance as I suggested. No swap there at all.

Changing min_free just makes the freeze occur earlier. (It also depends on which DE: Gnome/Unity is the worst offender.)

Revision history for this message
Keith Hutton (keith5001) wrote :

I am running on a ~9 year old laptop with 1866MiB ram and 3933MiB swap.
Lubuntu installed. Linux ub2 4.18.0-16-generic #17-Ubuntu SMP Fri Feb 8 00:06:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
My problem is, for example, that having two different browsers open with too many tabs (4-6) in each caused a high memory load and a system freeze. I noticed that swap was not being used in the way I think it should be, that is, with a general slowdown as more and more swap is used. Instead I got an instant freeze for 30 minutes or more. So I found a program called munch.c and played around with it. I also created different versions of munch.c so that the memory usage was different. I was able to change the system vm settings (in /proc/sys/vm) so that the freeze stopped and swap started to be used correctly, so that real memory and virtual memory became balanced. I define "balanced" to mean that as more swap is used the system gradually becomes slower without freezing. In fact, in the beginning (default settings) the Linux memory-swap system was very poor, and a Microsoft Windows 7 machine with the same memory was far better at memory-swap management. My conclusion (for now) is that altering the default settings made the Linux system much better, because the freeze bug was eliminated. I turned on the SysRq feature to enable the (f) option.
I found I could make the SysRq (f) work straight away. Also found the automatic OOM now kicked in as well, if I waited a bit.
I will add more details to this at a later time.
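
For anyone wanting to try the SysRq (f) route, here is a small sketch for checking whether the current kernel.sysrq setting permits it. It relies on the kernel's documented bitmask, where 1 enables every function and bit 64 covers process signalling (term, kill, oom-kill). The helper name sysrq_allows_oom_kill is made up for illustration.

```shell
#!/bin/sh
# Succeed if a kernel.sysrq value permits the manual OOM kill (SysRq-f).
# 0 disables SysRq, 1 enables everything, larger values are a bitmask
# in which bit 64 allows signalling of processes, including oom-kill.
sysrq_allows_oom_kill() {
    value=$1
    [ "$value" -eq 1 ] || [ $(( value & 64 )) -ne 0 ]
}

# Usage on a live system (the writes need root):
#   sysrq_allows_oom_kill "$(cat /proc/sys/kernel/sysrq)" || echo "SysRq-f is disabled"
#   echo 1 | sudo tee /proc/sys/kernel/sysrq     # enable all SysRq functions until reboot
#   echo f | sudo tee /proc/sysrq-trigger        # ask the kernel to run the OOM killer now
```

With a mask such as 176 the keyboard combination for (f) is ignored, which is worth ruling out before concluding that the OOM killer itself is at fault.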

Revision history for this message
Dima (dima2017) wrote :

Just a little note. This is probably not a DMA-related issue, despite the title. I found it happening with zram swap too.

Revision history for this message
Dima (dima2017) wrote :

I've tried setting vm.min_free_kbytes to 6% of my total physical RAM, divided by the number of cores, as suggested at https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem, but I decided to keep vm.swappiness at its default (60), because I was going to tune zram.
Then I installed the zram-config package, opened the /usr/bin/init-zram-swapping file with my text editor and changed this line
mem=$(((totalmem / 2 / ${NRDEVICES}) * 1024))
to this:
mem=$(((totalmem / 2 / ${NRDEVICES}) * 5 * 1024))

Then I rebooted and tested all of this with "stress -m 4 --vm-bytes=1G".
This all happened on my single-core 2Gb x86 Lubuntu 16.04 notebook. It seems to work.
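
The arithmetic behind both settings can be sketched as below. The function names are made up for illustration, and the zram part assumes that totalmem in the script is the total RAM in kB, which is an assumption about init-zram-swapping rather than something checked here.

```shell
#!/bin/sh
# Suggested vm.min_free_kbytes: 6% of total RAM (in kB) divided by the
# number of cores. Integer arithmetic, so the result is rounded down.
suggest_min_free_kbytes() {
    total_kb=$1
    cores=$2
    echo $(( total_kb * 6 / 100 / cores ))
}

# Size in bytes of each zram device as computed by the line quoted above.
# factor is 1 for the original line and 5 for the edited one.
zram_bytes() {
    totalmem=$1     # assumed to be total RAM in kB
    nrdevices=$2
    factor=$3
    echo $(( (totalmem / 2 / nrdevices) * factor * 1024 ))
}

# Usage on a live system:
#   suggest_min_free_kbytes "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" "$(nproc)"
```

For a 2 GiB, single-core machine (2097152 kB) the first function gives 125829. Under the stated assumption, the edited zram line gives 5368709120 bytes, i.e. 2.5 times the RAM, against 1073741824 bytes (half the RAM) for the original line.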

Revision history for this message
Dima (dima2017) wrote :

So as a conclusion for my old 2Gb laptop, I've set zram to twice the amount of physical RAM (/usr/bin/init-zram-swapping), and have added these parameters to /etc/sysctl.d/99-sysctl.conf:

vm.min_free_kbytes=128000
vm.overcommit_memory=2
vm.overcommit_ratio=90

And it seems to work for me. I should add that I don't have physical swap and have /tmp and /var/tmp mounted on a RAM disk.
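
For reference, with vm.overcommit_memory=2 the kernel refuses allocations once the total committed memory would exceed a limit of roughly swap plus overcommit_ratio percent of RAM. The sketch below does that arithmetic while ignoring huge pages, which the kernel subtracts from RAM first. The function name is made up for illustration.

```shell
#!/bin/sh
# Approximate commit limit in kB for vm.overcommit_memory=2:
# RAM * overcommit_ratio / 100 + swap (huge pages ignored).
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( ram_kb * ratio / 100 + swap_kb ))
}

# Compare with what the kernel reports on a live system:
#   grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
```

For 2 GiB of RAM, 4 GiB of zram swap and a ratio of 90, commit_limit_kb 2097152 4194304 90 gives 6081740 kB. Programs that reserve far more address space than they actually touch can hit this limit long before RAM is really full.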

Revision history for this message
Dima (dima2017) wrote :

Editing the overcommit settings makes Chromium crash, so I don't know.

Revision history for this message
lou (louvee) wrote :
Revision history for this message
Simon Pugnet (qw-simon) wrote :
Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
kd4ua506I9uzkaa (kd4ua506i9uzkaa-deactivatedaccount) wrote :

the initial patch I've attached/tried (in #89) was crap, but now I'm successfully using this one le9h.patch (attached)

also here: https://gist.github.com/howaboutsynergy/04fd9be927835b055ac15b8b64658e85

there are similar patches from years ago made by some devs, if you follow those urls (mentioned in patch) you can see them

Either way, this will prevent Active(file) from getting lower than the set value, more or less. As a workaround it is good enough for me. (far better than my initial attempt)

Revision history for this message
Tong Zhang (lzto) wrote :

Unbelievable!
I have no idea why my system with 32GB of memory is still haunted by this dumb issue.
The ssh got completely unresponsive, even with a physical monitor and keyboard attached to the system, when this happens and the only solution to get back control is a hard reset.
Man.. if anyone from upstream cares about this issue, at least give the user a prompt to kill the problematic process manually.
Rendering the full system unresponsive like this is unacceptable.

Revision history for this message
otiskujawa (otiskujawa) wrote :

Remote Debian server on 5.10.0-23-amd64 x86_64 kernel after running memory-hungry job.
Swap enabled, 16GB of ram, Intel Xeon W3550.
System hangs without any warning.
