Bogus "Out of Memory" compiz invoked oom-killer

Bug #1811093 reported by Craig
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

This has happened five times in the past week (four times in the past
36 hours):

  kernel: compiz invoked oom-killer:

There is plenty of free core memory and almost no swap in use when it
happens. This is Xenial with kernel version 4.4.0-141, which
contains the fix for the previously reported bug #1655842 ("Out of
Memory" oom-killer bug). I only have one week of logs to go by.
Sometimes it happens right after logging into the Unity / lightdm
desktop (three out of five times), but not always. In four out of
five occurrences a postgres job was running and oom-killer killed
postgres. In the one other instance, postgres was quietly idle in the
background and oom-killer killed Thunderbird. Kernel messages in
syslog from the five occurrences:

compiz invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0
compiz invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0
compiz invoked oom-killer: gfp_mask=0x24040c0, order=2, oom_score_adj=0
compiz invoked oom-killer: gfp_mask=0x24040c0, order=2, oom_score_adj=0
compiz invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0
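
For readers tallying lines like these: `order=N` means the allocation asked for 2^N physically contiguous pages, so these are small requests rather than large ones. The following is a minimal sketch that parses the five lines above and converts each order to a request size. The 4 KiB page size is an assumption (typical for x86-64), and the names `parse_oom_lines` and `SAMPLE` are made up for this illustration.

```python
import re
from collections import Counter

# Assumption: 4 KiB pages, the usual size on x86-64; other platforms may differ.
PAGE_SIZE_BYTES = 4096

# Matches the "invoked oom-killer" header line as it appears in this report.
OOM_LINE = re.compile(
    r"(?P<task>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp>0x[0-9a-fA-F]+), order=(?P<order>\d+)"
)


def parse_oom_lines(lines):
    """Return a (task, gfp_mask, order, request_bytes) tuple per matching line."""
    events = []
    for line in lines:
        match = OOM_LINE.search(line)
        if match is None:
            continue
        order = int(match.group("order"))
        # An order-N request is 2**N contiguous pages.
        request_bytes = PAGE_SIZE_BYTES << order
        events.append(
            (match.group("task"), int(match.group("gfp"), 16), order, request_bytes)
        )
    return events


# The five syslog lines quoted in this bug report.
SAMPLE = [
    "compiz invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0",
    "compiz invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0",
    "compiz invoked oom-killer: gfp_mask=0x24040c0, order=2, oom_score_adj=0",
    "compiz invoked oom-killer: gfp_mask=0x24040c0, order=2, oom_score_adj=0",
    "compiz invoked oom-killer: gfp_mask=0x24040c0, order=3, oom_score_adj=0",
]

events = parse_oom_lines(SAMPLE)
orders = Counter(order for _task, _gfp, order, _size in events)
for order in sorted(orders):
    size_kib = (PAGE_SIZE_BYTES << order) // 1024
    print(f"order={order}: {orders[order]} event(s), {size_kib} KiB contiguous each")
```

Under the 4 KiB page assumption, order=3 corresponds to 32 KiB and order=2 to 16 KiB of contiguous memory per request.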

Additional info:

$> lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

$> cat /proc/version_signature
Ubuntu 4.4.0-141.167-generic 4.4.162

Tags: xenial
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1811093

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Craig (craig-st) wrote :

compiz:
  Installed: 1:0.9.12.3+16.04.20180221-0ubuntu1

Craig (craig-st) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v5.0 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0-rc1/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Craig (craig-st) wrote :

Not sure when it started. I can only go by the logs I have, and they only go back nine days. But the postgres job that I mentioned previously, which caused several oom-kills in a row, was last run a month and a half ago, and at that time it ran without any problems. It may have run while I was not logged into the Unity desktop, though; I am not sure. Other random oom-kills may also have gone unnoticed.

I'll take a look at installing the upstream kernels you cited. I've never manually installed a kernel from the .deb packages before. I guess I'll use https://wiki.ubuntu.com/Kernel/MainlineBuilds?action=show&redirect=KernelMainlineBuilds as a guide. It looks like I'll have to uninstall VirtualBox. If that's all I have to do, and if installing the upstream kernels doesn't generate any errors, I'll report back with findings.

Craig (craig-st) wrote :

We have had two or three more instances of this oom-killer behavior. Today we also had an incident where the system really DID run out of memory, but oom-killer did not kick in. I noticed the box was sluggish and the disk was churning, so I started System Monitor and watched it happen. One of the Firefox web containers was using 11+ GB of memory. There was no available core memory and no available swap, and pretty soon the desktop froze. I was unable to switch to a non-desktop pty or do anything else while the disk churned away relentlessly. I waited a little while and then used the reset button on the box to reboot.
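
A genuine exhaustion like this one can be told apart from the bogus oom-killer events by looking at /proc/meminfo. Below is a minimal sketch that parses the file's text and flags the "no available memory, no available swap" condition. The thresholds, the function names, and the sample values are all made up for illustration; none of them were captured from the affected machine.

```python
# Hypothetical thresholds (100 MiB each), chosen only for this illustration.
MIN_AVAILABLE_KB = 100 * 1024
MIN_SWAP_FREE_KB = 100 * 1024


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def is_exhausted(fields):
    """True when both available RAM and free swap are below the thresholds."""
    return (
        fields.get("MemAvailable", 0) < MIN_AVAILABLE_KB
        and fields.get("SwapFree", 0) < MIN_SWAP_FREE_KB
    )


# Made-up sample resembling a machine that really has run out of memory.
SAMPLE_MEMINFO = """\
MemTotal:       16318464 kB
MemFree:          118532 kB
MemAvailable:      51200 kB
SwapTotal:       8388604 kB
SwapFree:              0 kB
"""

meminfo = parse_meminfo(SAMPLE_MEMINFO)
print("exhausted:", is_exhausted(meminfo))
```

On a live system the same functions could be fed the contents of /proc/meminfo instead of the sample string, for example from a periodic logging job, so that the state just before a freeze is on record.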

After that happened, we installed the upstream kernel and are running it right now.

$> cat /proc/version
Linux version 5.0.0-050000rc1-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-12ubuntu1)) #201901062130 SMP Mon Jan 7 02:32:47 UTC 2019

So far, we have been unable to induce an oom-killer anomaly.

Craig (craig-st) wrote :

No oom-killer event with v5.0 kernel.
Linux version 5.0.0-050000rc1-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-12ubuntu1)) #201901062130 SMP Mon Jan 7 02:32:47 UTC 2019

Possible explanations:

Scenario #1) There was a kernel bug and it is fixed in v5.0. Or,

Scenario #2) Compiz requests a large chunk of memory in certain
situations that did not occur when testing with the v5.0 kernel. All
of the oom-killer events here were invoked by compiz. They may not
have occurred when testing with the v5.0 kernel either because testing
was limited in duration, or because of differences between the testing
environment and the production environment. Our testing with the v5.0
kernel was done at 800x600 resolution with no proprietary graphics
driver. The oom-killer events described in this bug report occurred at
1920x1080 resolution with the proprietary nvidia graphics driver. There
may or may not be a kernel bug, and it may or may not be fixed in v5.0.

Moving forward we will most likely switch production to open source
nouveau graphics driver and/or Xenial LTS HWE and/or fully upgrade to
18.04.1 Bionic.

I am going to leave this bug marked "Incomplete" due to the
differences between our v5.0 kernel test environment and our
production environment.

Attached are some notes that will be useful to me if I ever come back
here.

Craig (craig-st) wrote :

So far, no oom-killer problems with Xenial HWE in production, which runs a newer kernel (Ubuntu 4.15.0-43.46~16.04.1-generic 4.15.18). Not sure if it matters, but we are still using the proprietary Nvidia driver, although it may be a newer version. Same version of compiz (0.9.12.3).

If you do not hear from us further, assume that the Xenial HWE channel solved the problem.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired