trusty kernel crash in OOM killer related to cgroups

Bug #1592429 reported by Moritz Mühlenhoff on 2016-06-14
This bug affects 1 person
Affects           Status   Importance   Assigned to     Milestone
linux (Ubuntu)             Undecided    Kamal Mostafa
Trusty                     Undecided    Kamal Mostafa

Bug Description

The trusty kernel can crash if the OOM killer kills processes which have reached a memory limit imposed by cgroups. More details can be found in this blog post:

https://community.nitrous.io/posts/stability-and-a-linux-oom-killer-bug

The upstream patch series which ultimately landed in 3.14 is available at:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?id=4d4048be8a93769350efa31d2482a038b7de73d0&qt=range&q=9853a407b97d8d066b5a865173a4859a3e69fd8a...4d4048be8a93769350efa31d2482a038b7de73d0

One of those four (0c740d0afc3bff0a097ad03a1c8df92757516f5c) is already part of the trusty kernel.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1592429

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu Trusty):
status: New → In Progress
assignee: nobody → Kamal Mostafa (kamalmostafa)
Kamal Mostafa (kamalmostafa) wrote :

Moritz, here is a test kernel with the three needed oom_kill patches* applied. Please confirm that it resolves the problem:

http://people.canonical.com/~kamal/lp1592429/

*[mainline]
4d4048b oom_kill: add rcu_read_lock() into find_lock_task_mm()
ad96244 oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
1da4db0 oom_kill: change oom_kill.c to use for_each_thread()

Kamal Mostafa (kamalmostafa) wrote :

Same kernel, but rebuilt with the complete set of .debs:
http://people.canonical.com/~kamal/lp1592429.2/

Moritz Mühlenhoff (moritz-4) wrote :

This looks good to me: we have been running the test kernel on nine production servers for more than a week now and the systems are stable (whereas previously we had seen three kernel hangs over the course of 48 hours).

Chris J Arges (arges) on 2016-06-27
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Kamal Mostafa (kamalmostafa) wrote :

Moritz, it's understood that this can only be "verified" as a long-term stability issue. Please install the -proposed kernel on some of your servers as a smoke test, and we'll consider it verified if you don't see problems in the next few days.

Moritz Mühlenhoff (moritz-4) wrote :

I've installed the 3.13.0-92.139 kernel from trusty-proposed on eight production systems. I'll let you know by Monday.

Kamal Mostafa (kamalmostafa) wrote :

Per Moritz, the problem has not occurred on the production systems running the -proposed kernel.

tags: added: verification-done-trusty
removed: verification-needed-trusty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-92.139

---------------
linux (3.13.0-92.139) trusty; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597060

  [ Josh Boyer ]

  * SAUCE: UEFI: acpi: Ignore acpi_rsdp kernel parameter when module
    loading is restricted
    - LP: #1566221
  * SAUCE: UEFI: efi: Make EFI_SECURE_BOOT_SIG_ENFORCE depend on EFI
    - LP: #1566221
  * SAUCE: UEFI MODSIGN: Import certificates from UEFI Secure Boot
    - LP: #1566221, #1571691
  * SAUCE: UEFI: efi: Disable secure boot if shim is in insecure mode
    - LP: #1566221, #1571691

  [ Matthew Garrett ]

  * SAUCE: UEFI: Add secure_modules() call
    - LP: #1566221
  * SAUCE: UEFI: PCI: Lock down BAR access when module security is enabled
    - LP: #1566221
  * SAUCE: UEFI: x86: Lock down IO port access when module security is
    enabled
    - LP: #1566221
  * SAUCE: UEFI: ACPI: Limit access to custom_method
    - LP: #1566221
  * SAUCE: UEFI: asus-wmi: Restrict debugfs interface when module loading
    is restricted
    - LP: #1566221
  * SAUCE: UEFI: Restrict /dev/mem and /dev/kmem when module loading is
    restricted
    - LP: #1566221
  * SAUCE: UEFI: kexec: Disable at runtime if the kernel enforces module
    loading restrictions
    - LP: #1566221
  * SAUCE: UEFI: x86: Restrict MSR access when module loading is restricted
    - LP: #1566221
  * SAUCE: UEFI: Add option to automatically enforce module signatures when
    in Secure Boot mode
    - LP: #1566221

  [ Stefan Bader ]

  * [Config] Add pm80xx scsi driver to d-i
    - LP: #1595628

  [ Tim Gardner ]

  * [Config] CONFIG_EFI_SECURE_BOOT_SIG_ENFORCE=y
  * SAUCE: UEFI: Display MOKSBState when disabled
    - LP: #1566221, #1571691
  * SAUCE: UEFI: Add secure boot and MOK SB State disabled sysctl
    - LP: #1593075
  * SAUCE: UEFI: Set EFI_SECURE_BOOT bit in x86_efi_facility
    - LP: #1593075
  * [Config] CONFIG_EFI=n for arm64
    - LP: #1566221

  [ Upstream Kernel Changes ]

  * powerpc/tm: Abort syscalls in active transactions
    - LP: #1572624
  * HID: core: prevent out-of-bound readings
    - LP: #1579190
  * efi: Add separate 32-bit/64-bit definitions
    - LP: #1566221
  * x86/efi: Build our own EFI services pointer table
    - LP: #1566221
  * mm: migrate dirty page without clear_page_dirty_for_io etc
    - LP: #1581865
    - CVE-2016-3070
  * oom_kill: change oom_kill.c to use for_each_thread()
    - LP: #1592429
  * oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
    - LP: #1592429
  * oom_kill: add rcu_read_lock() into find_lock_task_mm()
    - LP: #1592429
  * virtio_balloon: return the amount of freed memory from leak_balloon()
    - LP: #1587089
  * virtio_balloon: free some memory from balloon on OOM
    - LP: #1587089
  * virtio_ballon: change stub of release_pages_by_pfn
    - LP: #1587089
  * virtio_balloon: do not change memory amount visible via /proc/meminfo
    - LP: #1587089

 -- Kamal Mostafa <email address hidden> Tue, 28 Jun 2016 12:40:49 -0700

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
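
For anyone checking whether a given trusty machine already runs a kernel with the three oom_kill patches, the version from the changelog above (3.13.0-92.139) can be compared against the running kernel release. The following is only a minimal sketch and not part of the original report: the helper names parse_release and has_oom_fix are hypothetical, and it assumes Ubuntu-style release strings such as "3.13.0-92-generic". It also assumes that any release at or above 3.13.0-92 carries the fix, which relies on the statement in the bug description that the upstream series landed in 3.14.

```python
import re

# First trusty kernel ABI carrying the three oom_kill patches, taken from
# the changelog in this report (linux 3.13.0-92.139).
FIXED_ABI = (3, 13, 0, 92)


def parse_release(release):
    """Turn a release string such as '3.13.0-92-generic' into (3, 13, 0, 92)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def has_oom_fix(release):
    """Assumption: any Ubuntu-style release at or above 3.13.0-92 has the fix."""
    return parse_release(release) >= FIXED_ABI
```

On a live system the input would typically be the output of "uname -r" (or platform.release() in Python), for example has_oom_fix("3.13.0-92-generic"). Release strings that do not follow the Ubuntu pattern raise ValueError rather than being guessed at.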