Memory leak when a blind CONNECT tunnel job is closed

Bug #1989380 reported by phyphor
Affects          Status         Importance   Assigned to             Milestone
Squid            Unknown        Unknown
squid (Ubuntu)   Fix Released   Undecided    Unassigned
  Jammy          Fix Released   Undecided    Sergio Durigan Junior

Bug Description

[ Impact ]

Squid leaks memory when a "lonely" to-server connection closes (that is, one whose client connection has already closed), because the blind CONNECT tunnel job is not destroyed in this scenario. In other words, when the connection to the server is closed after the client's, squid fails to close the associated tunnel. The leaked memory accumulates over time until squid is OOM killed.

This regression was introduced by https://github.com/squid-cache/squid/commit/25d2603fc361912cf7c38ce0789ce07254d73b0f, which is present in the squid version that we ship on Jammy. It has been fixed upstream in version 5.4.

[ Test Plan ]

Unfortunately, this is one of those bugs that require a non-trivial amount of time *and* setup to trigger. The reporter has been very kind and helpful, was able to consistently reproduce the bug after a few days of leaving squid running, and kindly tested the proposed patch to make sure it works. After 11 days without seeing the bug manifest again, I decided that it's justifiable to proceed with the SRU and rely on the reporter's help to officially verify that the problem has indeed been addressed.

[ Where problems could occur ]

The patch is extremely simple: it just calls "retryOrBail" as the last step of the function that is notified when the server closes the tunnel. The "retryOrBail" function acts almost like a destructor, making sure to either retry/reforward the connection if applicable, or to bail and delete the resources otherwise.
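
To illustrate the behaviour described above, here is a minimal toy model in Python. It is not Squid's code: the class `TunnelJob`, the `live_jobs` registry, and every method name other than the idea behind "retryOrBail" are hypothetical and exist only for this sketch. It assumes the leaking sequence is "client closes first, then the to-server connection closes", as the patch name suggests.

```python
class TunnelJob:
    """Toy stand-in for a blind CONNECT tunnel job (not Squid's real class)."""

    # Registry standing in for jobs that still hold memory.
    live_jobs = set()

    def __init__(self, fixed):
        self.fixed = fixed          # True: behave as with the patch applied
        self.client_open = True
        self.server_open = True
        self.retries_left = 0       # hypothetical retry budget
        TunnelJob.live_jobs.add(self)

    def client_closed(self):
        self.client_open = False
        if not self.server_open:
            self._destroy()
        # Otherwise the to-server connection is now "lonely".

    def server_closed(self):
        self.server_open = False
        if self.fixed:
            # The patch: call the destructor-like step last.
            self.retry_or_bail()
        # Without the patch nothing else happens here, so a job whose
        # client is already gone is never destroyed.

    def retry_or_bail(self):
        if self.client_open and self.retries_left > 0:
            self.retries_left -= 1
            self.server_open = True  # pretend the connection was re-forwarded
        else:
            self._destroy()

    def _destroy(self):
        TunnelJob.live_jobs.discard(self)


def leaked_jobs(fixed, tunnels):
    """Run the 'client closes, then server closes' sequence and count leftovers."""
    TunnelJob.live_jobs.clear()
    for _ in range(tunnels):
        job = TunnelJob(fixed)
        job.client_closed()
        job.server_closed()
    return len(TunnelJob.live_jobs)
```

In this model, `leaked_jobs(False, n)` leaves every job registered, while `leaked_jobs(True, n)` leaves none, which mirrors the steady memory growth seen on a busy proxy without the patch.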

Although this code has been running in production for several months without regressions (as can be seen in the upstream bug's comments), there is still the (small) chance that a problematic situation arises due to this extra function call, especially considering that it deals with volatile situations like clients and servers closing a connection, sometimes abruptly. In the unlikely event that we encounter such problems in the future, we can revert the patch and get in touch with upstream, who has always been very responsive and competent in fixing complex bugs.

[ Original Description ]

There is a known memory leak in squid 5.2 - as noted here: https://bugs.squid-cache.org/show_bug.cgi?id=5132

The only readily available version of squid in the latest LTS release of Ubuntu (22.04) is 5.2-1ubuntu4.1, which means that doing a release upgrade leads to Ubuntu web proxies having an unstable instance of squid, as it gets OOM killed.

See, for example, another user with an issue: https://www.spinics.net/lists/squid/msg95428.html
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2021-11-22 (294 days ago)
InstallationMedia: Ubuntu-Server 20.04.3 LTS "Focal Fossa" - Release amd64 (20210824)
Package: squid 5.2-1ubuntu4.1
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.15.0-47.51-generic 5.15.46
Tags: jammy uec-images
Uname: Linux 5.15.0-47-generic x86_64
UpgradeStatus: Upgraded to jammy on 2022-08-26 (17 days ago)
UserGroups: N/A
_MarkForUpload: True
mtime.conffile..etc.squid.squid.conf: 2022-09-01T15:51:34.064594

phyphor (phyphor)
summary: - squid memory leak
+ squid 5.2 memory leak - affects Ubuntu LTS 22.04
Robie Basak (racb) wrote : Re: squid 5.2 memory leak - affects Ubuntu LTS 22.04

Thank you for taking the time to report this bug and helping to make Ubuntu better.

According to the upstream bug, this is the upstream commit to master: https://github.com/squid-cache/squid/commit/752fa2083698a88533ef23d88490f43f153f7a4c

It looks like this was fixed in 5.5-1, so marking Fix Released for Kinetic, and opening a task for Jammy.

https://github.com/squid-cache/squid/commit/54ad10efe146c863bdadee0d1161299ea89f5419 looks like the cherry-pick to the upstream v5 branch.

@phyphor it would be helpful if you could provide steps to reproduce the problem on 22.04 please, to help validate the fix.

Changed in squid (Ubuntu):
status: New → Fix Released
summary: - squid 5.2 memory leak - affects Ubuntu LTS 22.04
+ Memory leak when a blind CONNECT tunnel job closed
summary: - Memory leak when a blind CONNECT tunnel job closed
+ Memory leak when a blind CONNECT tunnel job is closed
phyphor (phyphor) wrote :

@racb the steps were:
Have an existing squid proxy and do a release upgrade, or install a fresh version of Ubuntu LTS 22.04 and install squid.
In either case the server ends up with squid 5.2, which has a known memory leak.

We have multiple, highly available, squid proxies for thousands of VMs.
The ones running Ubuntu 20.04.5 LTS (Focal Fossa) have squid 4.10 and have been stable for months, but the one upgraded to 22.04.1 (Jammy Jellyfish) had squid 5.2, whose memory use slowly grew until it had consumed all of the available memory (4GB over about a day), at which point squid was OOM killed and had to be restarted.

Unfortunately it isn't feasible to provide a sample of the network traffic that generates this failure, but this is a recognised issue with that version of squid.

phyphor (phyphor) wrote : Dependencies.txt

apport information

tags: added: apport-collected jammy uec-images
description: updated
phyphor (phyphor) wrote : ProcCpuinfoMinimal.txt

apport information

phyphor (phyphor) wrote : modified.conffile..etc.squid.squid.conf.txt

apport information

Sergio Durigan Junior (sergiodj) wrote :

Thank you for providing more information regarding your setup, phyphor.

As it is unlikely that we will be able to reproduce the scenario that triggers this problem, I believe we can consider moving forward with the SRU and rely on you to test and validate the fix. Do you think you could do that? I can prepare a PPA with the proposed patch. Thanks.

phyphor (phyphor) wrote :

Apologies for the delay in response - I will absolutely be able to spin up a test box with the test fix via a PPA.

I will be able to get results for you, and once I have them I should be able to respond much more quickly than I did this time!

Sergio Durigan Junior (sergiodj) wrote :

Hello phyphor,

Thanks for the reply. I went ahead and prepared a squid package containing the backported patch from upstream, and uploaded it to the following PPA:

https://launchpad.net/~sergiodj/+archive/ubuntu/squid-bug1989380-proper

Please let me know when you have the chance to test it. Depending on the results, we shall proceed with the SRU process.

Thanks a lot.

Changed in squid (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
tags: added: server-todo
Sergio Durigan Junior (sergiodj) wrote :

Actually, let me reupload the package with a new version. One sec.

Sergio Durigan Junior (sergiodj) wrote :
phyphor (phyphor) wrote :

Thanks for this, Sergio. I didn't want to make changes on Friday so I deployed this version (Version: 5.2-1ubuntu4.3~ppa1) this morning on our only node running 22.04.1 (jammy), and brought it into production.

It's been running well so far, so it appears that the memory leak has, indeed, been vanquished, but I think we'll all feel better if we let it run for a while before fully confirming.

I'll provide an update if it breaks, otherwise sometime next week if that's convenient for you.

Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 1989380] Re: Memory leak when a blind CONNECT tunnel job is closed

On Monday, January 09 2023, phyphor wrote:

> Thanks for this, Sergio. I didn't want to make changes on Friday so I
> deployed this version (Version: 5.2-1ubuntu4.3~ppa1) this morning on our
> only node running 22.04.1 (jammy), and brought it into production.
>
> It's been running well so far, so it appears that the memory leak has,
> indeed, been vanquished, but I think we'll all feel better if we let it
> run for a while before fully confirming.
>
> I'll provide an update if it breaks, otherwise sometime next week if
> that's convenient for you.

Hi phyphor,

Thanks a lot for following up. Yeah, letting this run for a while
sounds perfect and is the way to go. Did you experience the memory
leaks within the first week of running squid? If so, one week
sounds good. Otherwise, we can wait longer.

Cheers,

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

phyphor (phyphor) wrote :

I seem to recall it used to be so bad that in our production use case (thousands of proxied connections) it would fail in a day or so.

It's now been up for 11 days, and holding steady. I figured it was likely to be good after the first few days, but we wanted to give it a full run this week as our usage was slightly down last week and we wanted to be certain before giving the green light.

Obviously we can only confirm what we've seen, but it seems like you've successfully resolved the problem which is really wonderful news!

Please do let me know if there's any additional information I can reasonably provide to help this patch get through the Stable Release Update, and thank you again for your work on this.

phyphor (phyphor) wrote :

lsb-release:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

squid-v:
Squid Cache: Version 5.2
Service Name: squid
Ubuntu linux

systemctl status squid:
● squid.service - Squid Web Proxy Server
     Loaded: loaded (/lib/systemd/system/squid.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2023-01-07 19:57:57 UTC; 1 week 4 days ago
       Docs: man:squid(8)
   Main PID: 1091 (squid)
      Tasks: 4 (limit: 4574)
     Memory: 2.4G
        CPU: 1d 2h 57min 1.891s

Sergio Durigan Junior (sergiodj) wrote :

Thank you very much for the follow up, phyphor.

I believe 11 days without seeing the bug manifest is good enough for what we need here. I will go ahead and prepare the SRU.

I'll be on PTO for most of next week, so it may take a little while for things to move forward. I spoke to someone else from my team and they will be monitoring this bug in case it needs some more TLC.

Once the SRU is accepted, we will need your help (again) to confirm that the official package indeed fixes the problem. You should see a comment here asking for such help, but I will ping you just in case.

Thanks again.

Changed in squid (Ubuntu Jammy):
status: New → Confirmed
description: updated
Changed in squid (Ubuntu Jammy):
status: Confirmed → In Progress
Andreas Hasenack (ahasenack) wrote :

Looking at this during my SRU shift today.

Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello phyphor, or anyone else affected,

Accepted squid into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/squid/5.2-1ubuntu4.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
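
Since verification asks for the exact package version tested, it can help to see how the versions in this bug sort. The following is a minimal sketch of Debian-style version ordering in Python; the function names are hypothetical, and it is only an illustration. For real checks, `dpkg --compare-versions` is the authoritative tool. Note that under these rules a `~` sorts before everything, so the PPA build 5.2-1ubuntu4.3~ppa1 sorts below 5.2-1ubuntu4.3 and is superseded by it.

```python
import re


def _order(ch):
    """Sort weight of a non-digit character: '~' lowest, letters before others."""
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_fragment(a, b):
    """Compare two upstream-version or revision strings; returns -1, 0 or 1."""
    while a or b:
        # Non-digit prefixes are compared character by character;
        # a missing character counts as 0, which is above '~' only.
        text_a = re.match(r"\D*", a).group()
        text_b = re.match(r"\D*", b).group()
        a, b = a[len(text_a):], b[len(text_b):]
        for k in range(max(len(text_a), len(text_b))):
            oa = _order(text_a[k]) if k < len(text_a) else 0
            ob = _order(text_b[k]) if k < len(text_b) else 0
            if oa != ob:
                return -1 if oa < ob else 1
        # Digit runs are compared numerically; an empty run counts as 0.
        num_a = re.match(r"\d*", a).group()
        num_b = re.match(r"\d*", b).group()
        a, b = a[len(num_a):], b[len(num_b):]
        na, nb = int(num_a or "0"), int(num_b or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_fragment(ua, ub) or _compare_fragment(ra, rb)


def at_least(installed, wanted):
    """True if the installed version sorts at or after the wanted one."""
    return compare_versions(installed, wanted) >= 0
```

For example, `at_least("5.2-1ubuntu4.1", "5.2-1ubuntu4.3")` is false for the original Jammy package, while the version accepted into jammy-proposed satisfies the check.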

Changed in squid (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
phyphor (phyphor) wrote :

I upgraded a server that was running 20.04 to 22.04 and then added ubuntu-jammy-proposed.list following the instructions on the linked site. I followed the instructions to upgrade squid to the proposed version and have been running it in production as part of a highly available group of proxies (the others all running the latest squid available under 20.04).

I don't have full information about the usage over the past 5 days but there have been at least hundreds of distinct users and thousands of distinct sessions.

It's now been running in production for 5 days and still appears solid - no ballooning of memory, no reported issues from users, no high load averages, and no problems jumping on to the server and getting a response from the shell:

lsb-release:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

apt show squid:
Package: squid
Version: 5.2-1ubuntu4.3
Priority: optional

systemctl status squid:
● squid.service - Squid Web Proxy Server
     Loaded: loaded (/lib/systemd/system/squid.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-02-10 01:27:38 UTC; 5 days ago
       Docs: man:squid(8)
    Process: 6178 ExecStartPre=/usr/sbin/squid --foreground -z (code=exited, status=0/SUCCESS)
   Main PID: 6183 (squid)
      Tasks: 4 (limit: 4496)
     Memory: 2.7G
        CPU: 2d 6min 13.940s
     CGroup: /system.slice/squid.service
             ├─ 6183 /usr/sbin/squid --foreground -sYC
             ├─ 6186 "(squid-1)" --kid squid-1 --foreground -sYC
             ├─ 6188 "(unlinkd)"
             └─1701774 "(pinger)"
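
For anyone wanting to track the "Memory:" figure from status output like the above over several days rather than reading it by eye, here is a rough Python sketch. It assumes that systemd prints sizes with 1024-based suffixes (B, K, M, G, T); the function names are hypothetical and the parsing is deliberately simple.

```python
import re

# Assumption: systemd uses binary (1024-based) size suffixes.
_UNIT_BYTES = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory_line(status_text):
    """Return the 'Memory:' value from `systemctl status` text in bytes, or None."""
    match = re.search(
        r"^\s*Memory:\s*([0-9]+(?:\.[0-9]+)?)([BKMGT])\b",
        status_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    value, unit = match.groups()
    return int(float(value) * _UNIT_BYTES[unit])


def growth_per_day(earlier_bytes, later_bytes, days_between):
    """Average change in memory use per day between two samples."""
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    return (later_bytes - earlier_bytes) / days_between
```

Feeding it two samples taken some days apart gives a crude growth rate; a leak of the kind described in this bug would show up as a rate that stays clearly positive under steady load.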

tags: added: verification-done-jammy
removed: verification-needed-jammy
phyphor (phyphor) wrote :

To confirm: with the stock "stable" version of squid we had previously observed squid consuming more and more memory until it was OOM killed within a couple of days at most, if not hours. We therefore believe that 5 days of use, including the weekend, which is our busiest period, shows that the proposed version is robust.

We will continue to monitor but don't expect to see any issues and are looking forward to happily upgrading the entire estate to 22.04 once the updated version of squid is accepted into main.

Thanks again.

tags: removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package squid - 5.2-1ubuntu4.3

---------------
squid (5.2-1ubuntu4.3) jammy; urgency=medium

  * d/p/close-tunnel-if-to-server-conn-closes-after-client.patch:
    Close tunnel "job" after to-server client connection closes,
    fixing memory leak. (LP: #1989380)

 -- Sergio Durigan Junior <email address hidden> Thu, 05 Jan 2023 15:50:48 -0500

Changed in squid (Ubuntu Jammy):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for squid has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
