rtlwifi: aggressive memory leak

Bug #1831751 reported by roussel geoffrey
This bug affects 2 people
Affects          Status         Importance  Assigned to   Milestone
Linux            Fix Released   Unknown
linux (Ubuntu)   Fix Released   High        Unassigned
  Cosmic         Fix Committed  Undecided   Connor Kuehl

Bug Description

[Impact]

 * Upstream commit 0a9f8f0a1ba9 "rtlwifi: fix btmpinfo timeout while processing C2H_BT_INFO" fixed a timeout message by adding a fast path which allowed commands to skip the queue to be processed immediately. However, the fast path doesn't free the sk_buff when it completes; this results in a memory leak when commands are fast-tracked.

[Test Case]

 * This was tested in the bug report on a RTL8723BE card. As the system is running, a memory leak is observed until it gets to the point where a reboot is necessary.

 * The following patch was applied and the ever-increasing memory consumption was no longer observed.
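One way to watch for the growth described in this test case is to sample /proc/meminfo at intervals and compare values. The following is a minimal sketch in C, not part of the original report: the helper names meminfo_field_kb() and read_meminfo_kb() are hypothetical, and which fields actually move on an affected system (for example MemAvailable falling or SUnreclaim rising) is an assumption to check rather than a confirmed result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one "Name:   value kB" field in
 * /proc/meminfo-style text. Returns the value in kB, or -1 if the
 * field is not present or has no number after the colon.
 */
static long meminfo_field_kb(const char *text, const char *name)
{
    assert(text != NULL && name != NULL);

    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Match the whole field name, not just a prefix of it. */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ':') {
            long kb;

            if (sscanf(line + name_len + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Hypothetical helper: read the real /proc/meminfo (Linux only). */
static long read_meminfo_kb(const char *name)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field_kb(buf, name);
}
```

Calling read_meminfo_kb("MemAvailable") every few minutes and logging the result would give a rough trend line. A steady decline on an otherwise idle console-only system is consistent with a leak, although it does not by itself identify the driver.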

[Regression Potential]

 * This was fixed in Linux 4.20, and participants in both the LP bug and the GitHub issue report [1] have reported positive test results with just this patch applied.

[1] https://github.com/lwfinger/rtlwifi_new/issues/401

Original bug description follows:
---

Hey, I have a memory leak on Ubuntu 18.04.2, even in console mode (no X/GUI). Memory usage grows slowly until it takes all the available RAM when I leave the computer running overnight (with just top and irssi), and I have to reboot to get things back to normal. I didn't have this problem on Ubuntu 17.10. There I was flooded with PCI AER messages that took up a lot of disk space in the logs, but pci=noaer fixed that and I had no memory leak.
The computer is a common laptop: HP Pavilion.

---
Kernel log gets spammed with AERs so owner uses "pci=noaer"; that was briefly disabled to capture the AERs.

Memory seems to be consumed (~6 GB of 8 GB) just by leaving the PC overnight, booted only to console (systemd.unit=multi-user.target).

The memory leak doesn't affect Windows but owner is going to check Windows Event Log for signs of AERs being logged.

---

Original suspect of AER is not guilty.

This turns out to be a bug in the rtlwifi driver where, in some rare circumstances, it fails to free an sk_buff.

Reporter has been testing a DKMS build of rtlwifi with the fix applied and confirms it solves the issue.

Upstream has the commit. Can we get this cherry-picked into all releases?

commit 8cfa272b0d321160ebb5b45073e39ef0a6ad73f2
Author: Larry Finger <email address hidden>
Date: Sat Nov 17 20:55:03 2018 -0600

    rtlwifi: Fix leak of skb when processing C2H_BT_INFO

    With commit 0a9f8f0a1ba9 ("rtlwifi: fix btmpinfo timeout while processing
    C2H_BT_INFO"), calling rtl_c2hcmd_enqueue() with rtl_c2h_fast_cmd() true,
    the routine returns without freeing that skb, thereby leaking it.

    This issue has been discussed at https://github.com/lwfinger/rtlwifi_new/issues/401
    and the fix tested there.

    Fixes: 0a9f8f0a1ba9 ("rtlwifi: fix btmpinfo timeout while processing C2H_BT_INFO")
    Reported-and-tested-by: Francisco Machado Magalhães Neto <email address hidden>
    Cc: Francisco Machado Magalhães Neto <email address hidden>
    Cc: Ping-Ke Shih <email address hidden>
    Cc: Stable <email address hidden> # 4.18+
    Signed-off-by: Larry Finger <email address hidden>
    Signed-off-by: Kalle Valo <email address hidden>

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index f4122c8fdd97..ef9b502ce576 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2289,6 +2289,7 @@ void rtl_c2hcmd_enqueue(struct ieee80211_hw *hw, struct sk_buff *skb)

        if (rtl_c2h_fast_cmd(hw, skb)) {
                rtl_c2h_content_parsing(hw, skb);
+               kfree_skb(skb);
                return;
        }
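
The ownership rule that this patch restores can be sketched in userspace C. This is an illustration only and not kernel code: struct buf, buf_alloc(), buf_free(), is_fast_cmd(), parse_content(), enqueue(), drain_queue(), and the live_bufs counter are all hypothetical stand-ins, and the free_on_fast_path flag exists only so that the behaviour before and after the patch can be compared. The point it shows is that once enqueue() is handed a buffer, every return path has to either queue it for a later free or free it immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct buf {
    int cmd_id;
    struct buf *next;
};

static int live_bufs;          /* allocations minus frees */
static struct buf *queue_head; /* buffers awaiting deferred processing */

static struct buf *buf_alloc(int cmd_id)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->cmd_id = cmd_id;
    b->next = NULL;
    live_bufs++;
    return b;
}

static void buf_free(struct buf *b)
{
    if (!b)
        return;
    free(b);
    live_bufs--;
}

/* Assumption for this sketch: command id 1 takes the fast path. */
static bool is_fast_cmd(const struct buf *b)
{
    return b->cmd_id == 1;
}

/* Parsing only reads the buffer; it does not take ownership of it. */
static void parse_content(const struct buf *b)
{
    (void)b;
}

/* enqueue() takes ownership of b on every path. */
static void enqueue(struct buf *b, bool free_on_fast_path)
{
    assert(b != NULL);

    if (is_fast_cmd(b)) {
        parse_content(b);
        if (free_on_fast_path)
            buf_free(b); /* the step the patch adds */
        return;          /* without the free above, b is leaked here */
    }
    /* Slow path: the queue now owns b (ordering is ignored here). */
    b->next = queue_head;
    queue_head = b;
}

/* Deferred processing: parse and free everything that was queued. */
static void drain_queue(void)
{
    while (queue_head) {
        struct buf *b = queue_head;

        queue_head = b->next;
        parse_content(b);
        buf_free(b);
    }
}
```

With free_on_fast_path set to false, each fast-path command leaves live_bufs one higher than before, which mirrors the steady growth reported in this bug. With it set to true, the counter returns to zero after every call.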

TJ (tj)
description: updated
TJ (tj)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1831751

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → TJ (tj)
TJ (tj)
description: updated
TJ (tj) wrote :

Packages for 4.18.0-20.21-kmemleak are currently building in my bug-fixes PPA. See

https://launchpad.net/~tj/+archive/ubuntu/bugfixes/+packages

TJ (tj)
summary: - Possible memory leak due to PCI AER faults even with pci=noaer
+ rtlwifi: aggresive memory leak
TJ (tj)
description: updated
roussel geoffrey (roussel-geoffrey) wrote : Re: rtlwifi: aggresive memory leak

Ok the bug is fixed, it was in fact the same issue as discussed here:

https://github.com/lwfinger/rtlwifi_new/issues/401

And the fix:

https://github.com/0day-ci/linux/commit/3e10e17914a56b27007604880d6de7e7ff241e14

Rebuilding the rtlwifi module with the patch fixed the problem; memory is stable and no longer leaking!
Thanks a lot #ubuntu-discuss and especially TJ-!

TJ (tj)
Changed in linux (Ubuntu):
status: In Progress → Confirmed
tags: added: bionic
Changed in linux (Ubuntu):
importance: Undecided → High
TJ (tj)
Changed in linux (Ubuntu):
assignee: TJ (tj) → nobody
Kai-Heng Feng (kaihengfeng) wrote :

TJ,

Will you send the patch to kernel-team mailing list?

TJ (tj) wrote :

Upstream commit:

commit 8cfa272b0d321160ebb5b45073e39ef0a6ad73f2
Author: Larry Finger <email address hidden>
Date: Sat Nov 17 20:55:03 2018 -0600

    rtlwifi: Fix leak of skb when processing C2H_BT_INFO

    With commit 0a9f8f0a1ba9 ("rtlwifi: fix btmpinfo timeout while processing
    C2H_BT_INFO"), calling rtl_c2hcmd_enqueue() with rtl_c2h_fast_cmd() true,
    the routine returns without freeing that skb, thereby leaking it.

    This issue has been discussed at https://github.com/lwfinger/rtlwifi_new/issues/401
    and the fix tested there.

    Fixes: 0a9f8f0a1ba9 ("rtlwifi: fix btmpinfo timeout while processing C2H_BT_INFO")
    Reported-and-tested-by: Francisco Machado Magalhães Neto <email address hidden>

summary: - rtlwifi: aggresive memory leak
+ rtlwifi: aggressive memory leak
TJ (tj) wrote :

Request for cherry-pick sent to kernel-team mailing list 27 June 2019.

Changed in linux:
status: Unknown → Fix Released
tags: added: rls-bb-incoming
Connor Kuehl (connork) wrote :

Hi TJ and Roussel,

Have you experienced this with the built-in RTL driver on a Bionic 4.15 kernel or is it just a DKMS module you're using on Bionic? I only ask because I'm looking at Bionic's version of the routine this patch is for and I am not seeing an sk_buff to free; this makes me wonder if this exact leak is occurring in the built-in for Bionic at all.

I do see this in Cosmic, though, and the patch fits right in.

Changed in linux (Ubuntu Cosmic):
assignee: nobody → Connor Kuehl (connork)
roussel geoffrey (roussel-geoffrey) wrote :

Hi Connor,

I had this problem on built-in Bionic 4.18:

Linux akem-HP 4.18.0-24-generic #25~18.04.1-Ubuntu SMP Thu Jun 20 11:13:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Then TJ- helped me to build a DKMS module which fixed the problem.

Connor Kuehl (connork)
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Connor Kuehl (connork)
description: updated
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Brad Figg (brad-figg)
tags: added: cscc
roussel geoffrey (roussel-geoffrey) wrote :

Hey,
I am using the patched version, but it looks like there is still another leak, one that is visible only after several days of running. I left the computer running for 30 days with just a few small apps and the RAM was nearly full; it was also using cache, and I couldn't trace where the memory went, just like the first time I had this kind of issue. So I've blacklisted the 8723be module and am using a USB Wi-Fi stick with a different chip instead. For now, about a week later (running the same config), the memory is stable.

Po-Hsu Lin (cypressyew) wrote :

The upstream commit is available in Focal, so I am marking this bug as Fix Released for linux.

Please open a new bug if you're still experiencing this issue on newer releases (Cosmic is EOL).

Changed in linux (Ubuntu):
status: Confirmed → Fix Released