Servers going OOM after updating kernel from 4.10 to 4.13

Bug #1748408 reported by Dr. Jens Harbott on 2018-02-09
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | | High | Joseph Salisbury |
Artful | | High | Joseph Salisbury |

Bug Description

== SRU Justification ==
We are seeing this on multiple servers after upgrading from the previous 4.10 series HWE kernels to the new 4.13 HWE series. With the new kernel, free memory decreases continuously at a high rate, and the servers start swapping and finally OOM-killing services within days. With the 4.10 kernel, free memory decreases more slowly and stabilizes after a while.

Latest kernel tested is linux-image-4.13.0-32-generic but the issue also affects older kernels from that series, tested back to linux-image-4.13.0-19-generic. No issue with linux-image-4.10.0-42-generic.

The servers are running as OpenStack controller nodes using either Ocata or Pike UCA plus ceph. See attached graph for the memory behaviour.
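For anyone who wants to put a number on the behaviour described above instead of reading it off a graph, here is a minimal sketch. It assumes you collect `/proc/meminfo` snapshots yourself at known times; the function names and the choice of a plain least-squares slope are illustrative and are not what the reporter used.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"([^:]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def leak_rate_kb_per_hour(samples):
    """Least-squares slope of (seconds, free_kb) samples, scaled to kB/hour.

    A clearly negative value that does not level off over time matches the
    symptom described in this report. Needs at least two distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den * 3600
```

To use it, read `/proc/meminfo` every few minutes, store `(timestamp, fields["MemFree"])` pairs (or `MemAvailable`, where the kernel provides it) and pass the list to `leak_rate_kb_per_hour`.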

== Fix ==
2b9478ffc550("i40e: Fix memory leak related filter programming status")
62b4c6694dfd("i40e: Add programming descriptors to cleaned_count")

== Regression Potential ==
Low. Limited to i40e, and the patches fix an existing regression.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.13.0-32-generic 4.13.0-32.35~16.04.1
ProcVersionSignature: Ubuntu 4.13.0-32.35~16.04.1-generic 4.13.13
Uname: Linux 4.13.0-32-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Fri Feb 9 09:45:50 2018
ProcEnviron:
 LANGUAGE=en_US:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Dr. Jens Harbott (j-harbott) wrote :
summary: - Servers going OOM
+ Servers going OOM after updating kernel from 4.10 to 4.13
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

affects: linux-hwe (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Artful):
importance: Undecided → Critical
importance: Critical → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu Artful):
status: In Progress → Triaged
Dr. Jens Harbott (j-harbott) wrote :

@Joseph: I did test 4.15.2, but some things are failing, in particular docker because of lacking AUFS support, so I need to build a kernel myself I guess, which will take a bit.

Also note that I'm seeing this on Xenial machines; I didn't test with Artful. We used to run them with the 4.10 HWE kernels because they offer improved performance in some areas compared to the stock 4.4 kernel. We started seeing these issues after HWE switched to 4.13 a couple of weeks ago.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update. I requested testing of the mainline kernel to see if there is a commit in mainline that fixes the bug, which we could backport back to 4.13.

If the bug is not fixed in mainline, we can perform a kernel bisect to identify the commit that introduced this regression.

Dr. Jens Harbott (j-harbott) wrote :

Ok, nevermind the aufs issue, I got that resolved. Should have some results with mainline kernels in a couple of days.

Dr. Jens Harbott (j-harbott) wrote :

So here are the first results:

4.11.0-041100-generic #201705041534 - not affected
4.12.0-041200-generic #201707022031 - affected
4.13.0-041300-generic #201709031731 - affected
4.13.16-041316-generic #201711240901 - affected

Results for newer kernels are not so clear, they do not fail as fast as previous ones, but they do still fill up memory and - later - swap slowly. The rate is so slow however, that it will probably take weeks to come to some definitive results here.

Thus my next step, unless there is a better proposal, will be starting to bisect from 4.11 to 4.12, git expects 13 steps for that.
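As a rough illustration of where a figure like "13 steps" comes from: a bisect halves the candidate range on every round, so the number of rounds grows with the base-2 logarithm of the number of commits. The sketch below is an assumption-laden estimate, not git's own calculation (git prints "roughly N steps" and may round differently), and the days-per-test parameter is a hypothetical input.

```python
def bisect_steps(num_commits):
    """Worst-case number of test rounds to isolate one commit by halving.

    Equivalent to ceil(log2(num_commits)), computed with integers only.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


def bisect_duration_days(num_commits, days_per_test):
    """Total wall-clock estimate when each round needs days_per_test days."""
    return bisect_steps(num_commits) * days_per_test
```

With a leak that needs a day or more per kernel to show up reliably, 13 rounds add up to weeks, which is why narrowing the range with release candidates first can pay off.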

Dimitri Pappas (fragtion) wrote :

Hi guys. I'm experiencing problems with 4.13.0-36 on a cloud server (x64) with dynamic RAM management. The server is provisioned to use up to 12GB of RAM, but it got so bad that only 1GB was visible, bringing everything to a halt while swap usage went through the roof. Could it be related?

And here is another instance of kernel 4.13 showing memory problems similar to what is being described here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1722778

I have reverted to 4.11.0-041100-generic #201705041534, for now

Dimitri Pappas (fragtion) wrote :

Oh and I have also experienced random, unexplained OOM crashing of squid process recently on two separate machines (one i386 and one amd64) which seem to coincide in time with the upgrade from kernel 4.10 to 4.13

tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

@Dr. Jens Harbott, Just let me know if you need assistance with the bisect between 4.11 and 4.12. I can build the kernels for you. I would say the next step would be to test the 4.12 release candidates to narrow down the issue further. 4.12-rc1 is available here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc1/

You can just change the 'rc' part of that link to test the other release candidates, such as rc2, rc3, etc.

Dr. Jens Harbott (j-harbott) wrote :

Sorry for the delay, bisecting took longer than planned, but I now have the result:

6964e53f55837b0c49ed60d36656d2e0ee4fc27b is the first bad commit
commit 6964e53f55837b0c49ed60d36656d2e0ee4fc27b
Author: Jacob Keller <email address hidden>
Date: Mon Jun 12 15:38:36 2017 -0700

    i40e: fix handling of HW ATR eviction

The bad news is that this patch pretty certainly isn't directly the culprit, as it only fixes (and re-enables) features that seem to have been messed up earlier. So not sure how to proceed now, probably need to discuss this with upstream developers?

Dr. Jens Harbott (j-harbott) wrote :

A colleague found that this seems to be a known issue:

https://www.spinics.net/lists/netdev/msg458258.html

and the fix should be

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972

I will try cherry-picking this onto 4.13, not sure why it never seems to have been pulled into the stable branch.

Also not sure why we are still seeing issues with >= 4.14, very likely a completely different issue there, but I think we'll be fine if we get 4.13 fixed for now.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 2b9478ffc550f1. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1748408

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Dr. Jens Harbott (j-harbott) wrote :

Reading the thread further, we seem to need two patches, see https://www.spinics.net/lists/netdev/msg462051.html, so I'm going to add bc6d6fd2f916a0794ae4c44b28e14e2d172e05e0 into the build, too.
Will try that on top of b32038eb34ee42fd8056f99f88652270f6667996 (tag: Ubuntu-4.13.0-32.35).

I also tested the "ethtool --set-priv-flags <intf> flow-director-atr off" option and it seems to slow down the leak similar to >= 4.14 kernels. So either that fixes only part of the issue or we have a different one that only got masked up to now.

Third option would be using the upstream i40e driver instead, testing with 2.3.4 currently and that also seems to resolve the issue.
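If you want to try the flow-director-atr workaround mentioned above on several ports at once, a small helper can build the exact command from this comment for each interface. This is only a sketch: the function names are made up for illustration, and the interface names you pass in are whatever your system uses.

```python
def atr_off_command(interface):
    """Return the argv list for the ethtool call quoted in this comment."""
    return ["ethtool", "--set-priv-flags", interface, "flow-director-atr", "off"]


def atr_off_commands(interfaces):
    """Build one command per interface, leaving execution to the caller."""
    return [atr_off_command(name) for name in interfaces]


def as_shell_lines(interfaces):
    """Render the commands as plain text for review before running them."""
    return [" ".join(cmd) for cmd in atr_off_commands(interfaces)]
```

Each list can be handed to `subprocess.run(cmd, check=True)` as root. Keep in mind that, per the observation above, this only slowed the leak down and did not remove it.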

Dr. Jens Harbott (j-harbott) wrote :

O.k., confirming that this series of patches fixes the issue:

~/linux$ git log --oneline|head -3
bc6d6fd2f916 i40e: Add programming descriptors to cleaned_count
69949b3bd674 i40e: Fix memory leak related filter programming status
b32038eb34ee UBUNTU: Ubuntu-4.13.0-32.35

Can you build the same thing on top of the latest 4.13 set? It seems some special gcc foo is needed to get the retpoline stuff working there.

Vivien GUEANT (vivienfr) wrote :

I have a significant memory leak after upgrading from previous 4.10 series HWE kernels to the new 4.13 HWE series for Ubuntu 16.04 server with Ethernet controller Intel X710 for 10GbE SFP+

# dmesg | grep i40e
[ 1.625565] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.14-k
[ 1.625565] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[ 1.688509] i40e 0000:02:00.0: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.17
[ 1.959126] i40e 0000:02:00.0: MAC address: 3c:fd:fe:1a:1d:e0
[ 2.060021] i40e 0000:02:00.0: PCI-Express: Speed 8.0GT/s Width x4
[ 2.060091] i40e 0000:02:00.0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[ 2.060096] i40e 0000:02:00.0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[ 2.085931] i40e 0000:02:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 8 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[ 2.140793] i40e 0000:02:00.1: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.17
[ 2.422817] i40e 0000:02:00.1: MAC address: 3c:fd:fe:1a:1d:e2
[ 2.442684] i40e 0000:02:00.1: PCI-Express: Speed 8.0GT/s Width x4
[ 2.442696] i40e 0000:02:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[ 2.442715] i40e 0000:02:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[ 2.443043] i40e 0000:02:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 8 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[ 2.480205] i40e 0000:02:00.0 enp2s0f0: renamed from eth1
[ 2.512183] i40e 0000:02:00.1 enp2s0f1: renamed from eth0
[ 5.800514] i40e 0000:02:00.0 enp2s0f0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
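To check quickly whether a machine looks like the setup in the log above (in-tree i40e driver with FD_ATR listed among the features), the two relevant lines can be pulled out of saved `dmesg` output. A minimal sketch, assuming the message formats shown in this comment; other driver versions may word these lines differently, and the function names are illustrative.

```python
import re

# Patterns modelled on the dmesg lines quoted in this comment.
VERSION_RE = re.compile(r"i40e: .* Driver - version (\S+)")
FEATURES_RE = re.compile(r"i40e (\S+): Features: (.*)")


def i40e_summary(dmesg_text):
    """Return (driver_version, {pci_address: [feature tokens]})."""
    version = None
    features = {}
    for line in dmesg_text.splitlines():
        match = VERSION_RE.search(line)
        if match:
            version = match.group(1)
        match = FEATURES_RE.search(line)
        if match:
            features[match.group(1)] = match.group(2).split()
    return version, features


def devices_with_atr(features):
    """PCI addresses whose feature list includes FD_ATR."""
    return sorted(addr for addr, toks in features.items() if "FD_ATR" in toks)
```

Feed it the text of `dmesg` (for example captured with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`) and compare the reported version with the `2.1.14-k` string shown here.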

Dr. Jens Harbott (j-harbott) wrote :

After running for a couple of days, it seems that we are still seeing the slow memory leak similar to what was noticed in >= 4.14 earlier with the patched kernel. But it won't be possible for me to bisect at that rate.

@Joseph: Getting a patched current 4.13 still would be nice, getting instructions for how to build such a kernel would be even nicer.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commits 2b9478 and 62b4c66. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1748408

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!


Dr. Jens Harbott (j-harbott) wrote :

The test kernel solves the issue in the same way as my own kernel earlier, i.e. we still seem to have a very slow running memory leak with this kernel. I'm also seeing this slow leak when I replace the in-tree i40e driver by an upstream version (2.3.4), so either it is unrelated or contained in both.

Joseph Salisbury (jsalisbury) wrote :

Do you think an Artful SRU request should be sent for commits 2b9478 and 62b4c66? Or would you like to investigate the slow memory leak further?

Dr. Jens Harbott (j-harbott) wrote :

The slow leak will probably be tolerable for the time being, having those two patches added to the kernel would surely be a pretty valuable step that I think should be done now. My target still is Xenial with the hwe kernel, though. If you need to go via Artful to fix that, well, go ahead.

Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Artful):
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Dr. Jens Harbott (j-harbott) wrote :

@Stefan: I haven't reproduced the issue on Artful and I don't have an environment to do so. The original issue is for the HWE kernel on Xenial and only for that I can perform verification.

Stefan Bader (smb) wrote :

Together with the new Artful kernel there was also a new HWE kernel that is based on it (4.13.0-38.43~16.04.1). Verification can be done with that kernel as well. The automatically generated messages only refer to the base kernels to which the patch was applied. The HWE kernel is a backport of the Artful kernel right now.
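For fleets with mixed kernels, a small check against `uname -r` can tell whether a host already runs a build at or beyond the version named in this comment. The sketch below assumes the fix first appears at ABI number 38 of the Ubuntu 4.13.0 series (taken from "4.13.0-38.43" above) and deliberately answers "unknown" for every other series; the function names are illustrative.

```python
import re

FIXED_ABI = 38  # assumption: taken from "4.13.0-38.43~16.04.1" in this comment


def parse_release(release):
    """Split a string like '4.13.0-32-generic' into (4, 13, 0, 32)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(group) for group in match.groups())


def has_i40e_fix(release):
    """True/False for Ubuntu 4.13.0 kernels, None when this check does not apply."""
    major, minor, patch, abi = parse_release(release)
    if (major, minor, patch) != (4, 13, 0):
        return None
    return abi >= FIXED_ABI
```

On a live system the input would be `platform.release()` or the output of `uname -r`. Locally built or mainline test kernels use different numbering, so treat the result as a hint only.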

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Dr. Jens Harbott (j-harbott) wrote :

Running the -proposed kernel on two machines now, will provide the results in a couple of days.

Dr. Jens Harbott (j-harbott) wrote :

Proposed kernels show the same improved behaviour as the earlier test kernels.

tags: added: verification-done-artful
removed: verification-needed-artful
Vivien GUEANT (vivienfr) wrote :

How long do the Xenial HWE kernels stay in "proposed"?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-38.43

---------------
linux (4.13.0-38.43) artful; urgency=medium

  * linux: 4.13.0-38.43 -proposed tracker (LP: #1755762)

  * Servers going OOM after updating kernel from 4.10 to 4.13 (LP: #1748408)
    - i40e: Fix memory leak related filter programming status
    - i40e: Add programming descriptors to cleaned_count

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * fails to dump with latest kpti fixes (LP: #1750021)
    - kdump: write correct address of mem_section into vmcoreinfo

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
    - ALSA: hda - Fix headset mic detection problem for two Dell machines
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * CIFS SMB2/SMB3 does not work for domain based DFS (LP: #1747572)
    - CIFS: make IPC a regular tcon
    - CIFS: use tcon_ipc instead of use_ipc parameter of SMB2_ioctl
    - CIFS: dump IPC tcon in debug proc file

  * i2c-thunderx: erroneous error message "unhandled state: 0" (LP: #1754076)
    - i2c: octeon: Prevent error message on bus error

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * EDAC, sb_edac: Backport 1 patch to Ubuntu 17.10 (Fix missing DIMM sysfs
    entries with KNL SNC2/SNC4 mode) (LP: #1743856)
    - EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode

  * [regression] Colour banding and artefacts appear system-wide on an Asus
    Zenbook UX303LA with Intel HD 4400 graphics (LP: #1749420)
    - drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA

  * DVB Card with SAA7146 chipset not working (LP: #1742316)
    - vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * support thunderx2 vendor pmu events (LP: #1747523)
    - perf pmu: Extract function to get JSON alias map
    - perf pmu: Pass pmu as a parameter to get_cpuid_str()
    - perf tools arm64: Add support for get_cpuid_str function.
    - perf pmu: Add helper function is_pmu_core to detect PMU CORE devices
    - perf vendor events arm64: Add ThunderX2 implementation defined pmu core
      events
    - perf pmu: Add check for valid cpuid in perf_pmu__find_map()

  * lpfc.ko module doesn't work (LP: #1746970)
    - scsi: lpfc: Fix loop mode target discovery

  * Ubuntu 17.10 crashes on vmalloc.c (LP: #1739498)
    - powerpc/mm/book3s64: Make KERN_IO_START a variable
    - powerpc/mm/slb: Move comment next to the code it's referring to
    - powerpc/mm/hash64: Make vmalloc 56T on hash

  * ethtool -p fails to light NIC LED on HiSilicon D05 systems (LP: #1748567)
    - net...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released