Memory leak in net/xfrm/xfrm_state.c - 8 pages per ipsec connection

Bug #1853197 reported by MIKE OLLIFF on 2019-11-19
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Undecided
Unassigned
Bionic
High
Stefan Bader
Disco
High
Stefan Bader
Eoan
High
Stefan Bader

Bug Description

[SRU Justification]

== Impact ==

An upstream change in v4.11 made xfrm loose memory (8 pages per ipsec connection). This was fixed in v5.4 by:
  commit 86c6739eda7d "xfrm: Fix memleak on xfrm state destroy"

== Fix ==

Pick the upstream fix into all affected series.

== Testcase ==

see below

== Risk of Regression ==

Low, the change adds a single memory release case in one driver. The effect can be verified.

---

Ubuntu linux distro, 4.15.0-62 kernel, server platform.
This OS is used as an IPSec VPN gateway. It serves up to several hundred concurrent connections

In an attempt to upgrade from the 4.4 kernel to 4.15, the team noticed that VPN gateway VMs were running out of physical memory after 12-48 hours, depending on load.

Attachments from a server machine in this state in attached leakinfo.txt
output of free -t
output of /proc/meminfo in out of memory condition
output of /slabtop -o -sc
/sys/kernel/debug/page_owner sorted and aggregated after server ran for 12 hrs and ran out of memory
Patches for 4.15 and 5.4

Highlight from page_owner, we can see the leak is a buffer associated with the ipsec impelementation. Each connection leaks 32k of memory via alloc_page with order=3

100960 times:
Page allocated via order 3, mask 0x1085220(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
 get_page_from_freelist+0xd64/0x1250
 __alloc_pages_nodemask+0x11c/0x2e0
 alloc_pages_current+0x6a/0xe0
 skb_page_frag_refill+0x71/0x100
 esp_output_head+0x265/0x3e0 [esp4]
 esp_output+0xbc/0x180 [esp4]
 xfrm_output_resume+0x179/0x530
 xfrm_output+0x8e/0x230
 xfrm4_output_finish+0x2b/0x30
 __xfrm4_output+0x3a/0x50
 xfrm4_output+0x43/0xc0
 ip_forward_finish+0x51/0x80
 ip_forward+0x38a/0x480
 ip_rcv_finish+0x122/0x410
 ip_rcv+0x292/0x360
 __netif_receive_skb_core+0x815/0xbd0

Patch to fix this issue in 4.15 (tested and verified on same server exhibiting above leak):
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 728272f..7842f83 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -451,6 +451,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
        }
        xfrm_dev_state_free(x);
        security_xfrm_state_free(x);
+
+ if(x->xfrag.page)
+ put_page(x->xfrag.page);
+
        kfree(x);
}

Patch for master branch (5.4 I believe) from Paul Wouters (<email address hidden>)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c6f3c4a1bd99..f3423562d933 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
                                x->type->destructor(x);
                                xfrm_put_type(x->type);
                }
+ if (x->xfrag.page)
+ put_page(x->xfrag.page);
                xfrm_dev_state_free(x);
                security_xfrm_state_free(x);
                xfrm_state_free(x);

Severity: Critical - we are unable to use any kernel later than 4.11, and are sticking with 4.4 in production.

MIKE OLLIFF (mikeo-symc) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1853197

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
MIKE OLLIFF (mikeo-symc) wrote :

All VPN servers have been rolled back to 4.4
Additional log collection is not possible.
Setting status to confirmed.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
MIKE OLLIFF (mikeo-symc) on 2019-11-19
description: updated
Kai-Heng Feng (kaihengfeng) wrote :

commit 86c6739eda7d2a03f2db30cbee67a5fb81afa8ba
Author: Steffen Klassert <email address hidden>
Date: Wed Nov 6 08:13:49 2019 +0100

    xfrm: Fix memleak on xfrm state destroy

    We leak the page that we use to create skb page fragments
    when destroying the xfrm_state. Fix this by dropping a
    page reference if a page was assigned to the xfrm_state.

    Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
    Reported-by: JD <email address hidden>
    Reported-by: Paul Wouters <email address hidden>
    Signed-off-by: Steffen Klassert <email address hidden>

This commit will be automatically picked by later kernel update since it has "Fixes" tag.

MIKE OLLIFF (mikeo-symc) wrote :

That fix is in the master branch - can it be backported?

Stefan Bader (smb) wrote :

Setting this to invalid for Focal. The fix is in upstream v5.4 and we will move to that version soon.

Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Changed in linux (Ubuntu Disco):
importance: Undecided → High
Changed in linux (Ubuntu Eoan):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: New → Triaged
Changed in linux (Ubuntu Disco):
status: New → Triaged
Changed in linux (Ubuntu Eoan):
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Stefan Bader (smb) on 2019-11-29
description: updated
Changed in linux (Ubuntu Eoan):
assignee: nobody → Stefan Bader (smb)
Changed in linux (Ubuntu Disco):
assignee: nobody → Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
status: Triaged → Fix Committed
Changed in linux (Ubuntu Disco):
status: Triaged → Fix Committed
Changed in linux (Ubuntu Eoan):
status: Triaged → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
tags: added: verification-needed-bionic

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Bernd Schütte (pent1ckel) wrote :

it is running for five days and memory consumption looks normal (not leaking)

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Confirmed
MIKE OLLIFF (mikeo-symc) wrote :

Tested 4.15 bionic with original use case. Memory leak is resolved.

Stefan Bader (smb) on 2019-12-10
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
tags: added: verification-done-bionic
removed: verification-needed-bionic
Bernd Schütte (pent1ckel) wrote :

Does it help when we test disco and eoan as well? The test case is very easy and those kernels are affected as well.

Stefan Bader (smb) wrote :

If there is an easy way to get those releases set up and tested, it helps to helps to build confidence. In this case I think the chances a not that high, that the change has a different effect in different kernel versions. But if someone either already is on Eoan/5.3 or has time to double check, that sure has value. I would not bother about Disco/5.0 that much because that is going end of life soon.

Bernd Schütte (pent1ckel) wrote :

Tested 5.3.0-25-generic on Eoan and it fixes the memory leak there as well.

Aleksei (lexa-4) on 2019-12-18
tags: added: verification-done-eoan
removed: verification-needed-eoan
Khaled El Mously (kmously) wrote :

Added 'verification-done-disco' based on Stefan's latest comment

tags: added: verification-done-disco
removed: verification-needed-disco
Launchpad Janitor (janitor) wrote :
Download full text (27.4 KiB)

This bug was fixed in the package linux - 5.3.0-26.28

---------------
linux (5.3.0-26.28) eoan; urgency=medium

  * eoan/linux: 5.3.0-26.28 -proposed tracker (LP: #1856807)

  * nvidia-435 is in eoan, linux-restricted-modules only builds against 430,
    ubiquity gives me the self-signed modules experience instead of using the
    Canonical-signed modules (LP: #1856407)
    - Add nvidia-435 dkms build

linux (5.3.0-25.27) eoan; urgency=medium

  * eoan/linux: 5.3.0-25.27 -proposed tracker (LP: #1854762)

  * CVE-2019-14901
    - SAUCE: mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()

  * CVE-2019-14896 // CVE-2019-14897
    - SAUCE: libertas: Fix two buffer overflows at parsing bss descriptor

  * CVE-2019-14895
    - SAUCE: mwifiex: fix possible heap overflow in mwifiex_process_country_ie()

  * [CML] New device id's for CMP-H (LP: #1846335)
    - mmc: sdhci-pci: Add another Id for Intel CML
    - i2c: i801: Add support for Intel Comet Lake PCH-H
    - mtd: spi-nor: intel-spi: Add support for Intel Comet Lake-H SPI serial flash
    - mfd: intel-lpss: Add Intel Comet Lake PCH-H PCI IDs

  * i915: Display flickers (monitor loses signal briefly) during "flickerfree"
    boot, while showing the BIOS logo on a black background (LP: #1836858)
    - [Config] FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y

  * Please add patch fixing RK818 ID detection (LP: #1853192)
    - SAUCE: mfd: rk808: Fix RK818 ID template

  * Kernel build log filled with "/bin/bash: line 5: warning: command
    substitution: ignored null byte in input" (LP: #1853843)
    - [Debian] Fix warnings when checking for modules signatures

  * Lenovo dock MAC Address pass through doesn't work in Ubuntu (LP: #1827961)
    - r8152: Add macpassthru support for ThinkPad Thunderbolt 3 Dock Gen 2

  * Dell XPS 13 9350/9360 headphone audio hiss (LP: #1654448) // [XPS 13 9360,
    Realtek ALC3246, Black Headphone Out, Front] High noise floor (LP: #1845810)
    - ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360

  * no HDMI video output since GDM greeter after linux-oem-osp1 version
    5.0.0-1026 (LP: #1852386)
    - drm/i915: Add new CNL PCH ID seen on a CML platform
    - SAUCE: drm/i915: Fix detection for a CMP-V PCH

  * [broadwell-rt286, playback] Since Linux 5.2rc2 audio playback no longer
    works on Dell Venue 11 Pro 7140 (LP: #1846539)
    - [Config] Drop snd-sof-intel-bdw build
    - SAUCE: ASoC: SOF: Intel: Broadwell: clarify mutual exclusion with legacy
      driver

  * [CML-S62] Need enable turbostat patch support for Comet lake- S 6+2
    (LP: #1847451)
    - SAUCE: tools/power turbostat: Add Cometlake support

  * External microphone can't work on some dell machines with the codec alc256
    or alc236 (LP: #1853791)
    - SAUCE: ALSA: hda/realtek - Move some alc256 pintbls to fallback table
    - SAUCE: ALSA: hda/realtek - Move some alc236 pintbls to fallback table

  * Memory leak in net/xfrm/xfrm_state.c - 8 pages per ipsec connection
    (LP: #1853197)
    - xfrm: Fix memleak on xfrm state destroy

  * CVE-2019-18660: patches for Ubuntu (LP: #1853142) // CVE-2019-18660
    - powerpc/64s: support nospectre_v2 cmdline option
    - powerp...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (42.3 KiB)

This bug was fixed in the package linux - 5.0.0-38.41

---------------
linux (5.0.0-38.41) disco; urgency=medium

  * disco/linux: 5.0.0-38.41 -proposed tracker (LP: #1854788)

  * [Regression] Failed to boot disco kernel built from master-next (kernel
    kernel NULL pointer dereference) (LP: #1853981)
    - SAUCE: blk-mq: Fix blk_mq_make_request for mq devices

  * CVE-2019-14901
    - SAUCE: mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()

  * CVE-2019-14896 // CVE-2019-14897
    - SAUCE: libertas: Fix two buffer overflows at parsing bss descriptor

  * CVE-2019-14895
    - SAUCE: mwifiex: fix possible heap overflow in mwifiex_process_country_ie()

  * [CML] New device id's for CMP-H (LP: #1846335)
    - mmc: sdhci-pci: Add another Id for Intel CML
    - i2c: i801: Add support for Intel Comet Lake PCH-H
    - mtd: spi-nor: intel-spi: Add support for Intel Comet Lake-H SPI serial flash
    - mfd: intel-lpss: Add Intel Comet Lake PCH-H PCI IDs

  * Please add patch fixing RK818 ID detection (LP: #1853192)
    - SAUCE: mfd: rk808: Fix RK818 ID template

  * [SRU][B/OEM-B/OEM-OSP1/D] Enable new Elan touchpads which are not in current
    whitelist (LP: #1853246)
    - Input: elan_i2c - export the device id whitelist
    - HID: quirks: Refactor ELAN 400 and 401 handling

  * Lenovo dock MAC Address pass through doesn't work in Ubuntu (LP: #1827961)
    - r8152: Add macpassthru support for ThinkPad Thunderbolt 3 Dock Gen 2

  * [CML-S62] Need enable turbostat patch support for Comet lake- S 6+2
    (LP: #1847451)
    - SAUCE: tools/power turbostat: Add Cometlake support

  * External microphone can't work on some dell machines with the codec alc256
    or alc236 (LP: #1853791)
    - SAUCE: ALSA: hda/realtek - Move some alc256 pintbls to fallback table
    - SAUCE: ALSA: hda/realtek - Move some alc236 pintbls to fallback table

  * Memory leak in net/xfrm/xfrm_state.c - 8 pages per ipsec connection
    (LP: #1853197)
    - xfrm: Fix memleak on xfrm state destroy

  * CVE-2019-18660: patches for Ubuntu (LP: #1853142) // CVE-2019-18660
    - powerpc/64s: support nospectre_v2 cmdline option
    - powerpc/book3s64: Fix link stack flush on context switch
    - KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel

  * Raydium Touchscreen on ThinkPad L390 does not work (LP: #1849721)
    - HID: i2c-hid: fix no irq after reset on raydium 3118

  * Make Goodix I2C touchpads work (LP: #1853842)
    - HID: i2c-hid: Remove runtime power management
    - HID: i2c-hid: Send power-on command after reset

  * Touchpad doesn't work on Dell Inspiron 7000 2-in-1 (LP: #1851901)
    - Revert "UBUNTU: SAUCE: mfd: intel-lpss: add quirk for Dell XPS 13 7390
      2-in-1"
    - lib: devres: add a helper function for ioremap_uc
    - mfd: intel-lpss: Use devm_ioremap_uc for MMIO

  * CVE-2019-19055
    - nl80211: fix memory leak in nl80211_get_ftm_responder_stats

  * [CML-S62] Need enable intel_rapl patch support for Comet lake- S 6+2
    (LP: #1847454)
    - powercap/intel_rapl: add support for CometLake Mobile
    - powercap/intel_rapl: add support for Cometlake desktop

  * [CML-S62] Need enable intel_pmc_core driver patch for Comet l...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (42.4 KiB)

This bug was fixed in the package linux - 4.15.0-74.84

---------------
linux (4.15.0-74.84) bionic; urgency=medium

  * bionic/linux: 4.15.0-74.84 -proposed tracker (LP: #1856749)

  * [Hyper-V] KVP daemon fails to start on first boot of disco VM (LP: #1820063)
    - [Packaging] bind hv_kvp_daemon startup to hv_kvp device

  * Unrevert "arm64: Use firmware to detect CPUs that are not affected by
    Spectre-v2" (LP: #1854207)
    - arm64: Get rid of __smccc_workaround_1_hvc_*
    - arm64: Use firmware to detect CPUs that are not affected by Spectre-v2

  * Bionic kernel panic on Cavium ThunderX CN88XX (LP: #1853485)
    - SAUCE: irqchip/gic-v3-its: Add missing return value in
      its_irq_domain_activate()

linux (4.15.0-73.82) bionic; urgency=medium

  * bionic/linux: 4.15.0-73.82 -proposed tracker (LP: #1854819)

  * CVE-2019-14901
    - SAUCE: mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()

  * CVE-2019-14896 // CVE-2019-14897
    - SAUCE: libertas: Fix two buffer overflows at parsing bss descriptor

  * CVE-2019-14895
    - SAUCE: mwifiex: fix possible heap overflow in mwifiex_process_country_ie()

  * CVE-2019-18660: patches for Ubuntu (LP: #1853142) // CVE-2019-18660
    - powerpc/64s: support nospectre_v2 cmdline option
    - powerpc/book3s64: Fix link stack flush on context switch
    - KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel

  * Please add patch fixing RK818 ID detection (LP: #1853192)
    - SAUCE: mfd: rk808: Fix RK818 ID template

  * [SRU][B/OEM-B/OEM-OSP1/D] Enable new Elan touchpads which are not in current
    whitelist (LP: #1853246)
    - HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
    - Input: elan_i2c - export the device id whitelist
    - HID: quirks: Refactor ELAN 400 and 401 handling

  * Lenovo dock MAC Address pass through doesn't work in Ubuntu (LP: #1827961)
    - r8152: Add macpassthru support for ThinkPad Thunderbolt 3 Dock Gen 2

  * s390/dasd: reduce the default queue depth and nr of hardware queues
    (LP: #1852257)
    - s390/dasd: reduce the default queue depth and nr of hardware queues

  * External microphone can't work on some dell machines with the codec alc256
    or alc236 (LP: #1853791)
    - SAUCE: ALSA: hda/realtek - Move some alc256 pintbls to fallback table
    - SAUCE: ALSA: hda/realtek - Move some alc236 pintbls to fallback table

  * Memory leak in net/xfrm/xfrm_state.c - 8 pages per ipsec connection
    (LP: #1853197)
    - xfrm: Fix memleak on xfrm state destroy

  * CVE-2019-19083
    - drm/amd/display: memory leak

  * update ENA driver for DIMLIB dynamic interrupt moderation (LP: #1853180)
    - net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
    - net: ena: switch to dim algorithm for rx adaptive interrupt moderation
    - net: ena: reimplement set/get_coalesce()
    - net: ena: enable the interrupt_moderation in driver_supported_features
    - net: ena: remove code duplication in
      ena_com_update_nonadaptive_moderation_interval _*()
    - net: ena: remove old adaptive interrupt moderation code from ena_netdev
    - net: ena: remove ena_restore_ethtool_params() and relevant fields
    - net: ena: remov...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers