long-running stunnel leaks memory

Bug #1655153 reported by Bruce Guenter on 2017-01-09
This bug affects 3 people
Affects              Status          Importance   Assigned to   Milestone
stunnel4 (Debian)    Fix Released    Unknown
stunnel4 (Ubuntu)    Fix Released    Medium       Unassigned
  Xenial             Fix Committed   Medium       Unassigned

Bug Description

[Impact]

 * This bug leaks a TLS session object in the stunnel4 server every time a connection is closed. A long-running stunnel4 server can eventually consume all available memory.

 * This bug was introduced in stunnel 5.27, and subsequently fixed in 5.33. Ubuntu Xenial uses 5.30.

 * For Ubuntu, only Xenial is currently impacted by this bug, as previous versions of Ubuntu use an older version of stunnel4 (prior to 5.27), and later versions of Ubuntu use a newer version of stunnel4 (at least 5.33).

 * This patch backports a single specific fix to free TLS session objects when a connection is closed, but contains no other changes from newer stunnel4 versions.
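
As a rough illustration of the affected range described above (5.27 up to, but not including, 5.33), the following sketch tests a version string against it. The function name `is_affected` is hypothetical. It looks only at the upstream version number, so a distribution build that backports the fix (such as a patched 5.30) would still be reported as affected.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the upstream stunnel version falls in the
# range that leaks TLS sessions (5.27 <= version < 5.33).
# Accepts plain versions ("5.30") or Debian-style ones ("3:5.30-1").
is_affected() {
  local version=${1#*:}        # drop a Debian epoch such as "3:"
  version=${version%%-*}       # drop a Debian revision such as "-1"
  local major=${version%%.*}
  local minor=${version#*.}
  minor=${minor%%.*}           # ignore any third version component
  [ "$major" -eq 5 ] && [ "$minor" -ge 27 ] && [ "$minor" -lt 33 ]
}

# Example use on a Debian or Ubuntu system (assumes dpkg-query is available):
#   is_affected "$(dpkg-query -W -f='${Version}' stunnel4)" && echo "affected range"
```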

[Test Case]

 * The bug and fix can be reproduced fairly easily by setting up an stunnel4 server, then using openssl s_client to hammer against the stunnel4 server. For example, with the server running on localhost port 443, proxying to a local Apache instance, and using a client certificate:

=
#!/bin/bash
while true; do
  echo "" | openssl s_client -connect localhost:443 \
    -cert /etc/stunnel/client.pem
done
=

In another window, monitor RSS of the stunnel4 server process with something like:

=
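# watch runs its command via "sh -c"; on Ubuntu sh is dash, which does not
# support the bash-only $(<file) form, so read the pid file with cat instead.
watch 'ps -p $(cat /var/run/stunnel4/pid) -o rss,comm'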
=

 * The RSS of the stunnel4 process will continue to grow over time.

 * After installing the patched version via my PPA[1] and re-running the test, the RSS of the stunnel4 process will grow for a few minutes and then reach a steady state where it no longer continues to grow.

[1] https://launchpad.net/~lscotte/+archive/ubuntu/stunnel4
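
For a record that is easier to compare than a `watch` window, the RSS can also be sampled to a log. The following is only a sketch: the function names (`rss_kb`, `sample_rss`, `rss_growth`) are hypothetical, and the pid file path in the usage comment is the one used in the test case above.

```bash
#!/bin/bash
# Print the resident set size (in KB, as reported by ps) of a process.
rss_kb() {
  ps -p "$1" -o rss= | tr -d ' '
}

# Sample the RSS of pid $1 every $2 seconds, $3 times.
# Each output line is "<epoch seconds> <rss in KB>".
sample_rss() {
  local pid=$1 interval=$2 count=$3 i
  for (( i = 0; i < count; i++ )); do
    echo "$(date +%s) $(rss_kb "$pid")"
    sleep "$interval"
  done
}

# Read a sample log on stdin and print last RSS minus first RSS.
rss_growth() {
  awk 'NR == 1 { first = $2 } { last = $2 } END { print last - first }'
}

# Example use (30 samples, one per minute):
#   sample_rss "$(cat /var/run/stunnel4/pid)" 60 30 > rss.log
#   rss_growth < rss.log
```

With the unpatched package the growth figure should keep increasing from run to run; with the fix it should settle near zero once the process reaches its steady state.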

[Regression Potential]

 * None expected. This backports a fix in newer versions of upstream stunnel4.

 * In my own environment, I've been running a production stunnel4 server with my patch for over 85 days (zero restarts of the stunnel4 process). With the current Xenial version I was unable to run for more than 1 day without restarting stunnel4.

[Original Description]

We are running a long-running stunnel4 daemon to proxy TLS connections to another set of servers. After leaving it running for a few weeks, its memory usage had grown to 1.5GB. Restarting it reduced its memory usage to expected levels (VSZ and RSS) but while I've been watching it today it has grown by more than 10MB.

The stunnel website indicates that there have been fixes relating to memory leaks in versions 5.32 and 5.33, but Ubuntu LTS is still running 5.30.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: stunnel4 3:5.30-1
ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
Uname: Linux 4.4.0-45-generic i686
ApportVersion: 2.20.1-0ubuntu2.4
Architecture: i386
Date: Mon Jan 9 16:03:37 2017
InstallationDate: Installed on 2015-10-31 (435 days ago)
InstallationMedia: Ubuntu-Server 15.10 "Wily Werewolf" - Release i386 (20151021)
ProcEnviron:
 TERM=xterm
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 XDG_RUNTIME_DIR=<set>
SourcePackage: stunnel4
UpgradeStatus: Upgraded to xenial on 2016-05-18 (236 days ago)
mtime.conffile..etc.default.stunnel4: 2016-10-26T22:22:28.166247

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in stunnel4 (Ubuntu):
status: New → Confirmed
Scott Emmons (lscotte) wrote :

We are seeing the same issue, but only since upgrading from trusty to xenial.

After about 1 day:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 116 1512 1 20 0 711992 512324 - Ss ? 1:46 /usr/bin/stunnel4 /etc/stunnel/stunnel.conf

After restarting stunnel4:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 116 15023 1 20 0 182596 5392 - Ssl ? 0:00 /usr/bin/stunnel4 /etc/stunnel/stunnel.conf

Scott Emmons (lscotte) wrote :

It is quite likely that this is a bug which was introduced in stunnel 5.27[1] and subsequently fixed in 5.33[2]:

  - Fixed a TLS session caching memory leak (thx to Richard Kraemer).
    Before stunnel 5.27 this leak only emerged with sessiond enabled.

[1] https://www.stunnel.org/pipermail/stunnel-users/2016-May/005485.html
[2] https://www.stunnel.org/pipermail/stunnel-announce/2016-June/000122.html

Changed in stunnel4 (Ubuntu):
importance: Undecided → Medium
importance: Medium → High
importance: High → Medium
Scott Emmons (lscotte) wrote :

I have a possible patch for this, made by backporting a specific fix for an SSL session leak from upstream stunnel4. It seems to be working well for me.

With 5.30-1 (the current version in Xenial), the RSS keeps growing. With this patch applied, RSS grows to about 13000 (KB, as reported by ps) and stays there.

It's somewhat difficult to prove the derivation of this patch from upstream stunnel4, as there is no version control repository for stunnel4. I made this patch by comparing the source of 5.32 and 5.33, and ultimately there was just a single line that looked to be relevant - adding a call to SSL_SESSION_free(). I can't promise this is a full fix, but it looks promising based on my own testing.
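
The comparison described above can be approximated as follows. This is a sketch, not the exact procedure used for the patch: the function name is hypothetical, and the directory names in the usage comment assume the two release tarballs have been unpacked side by side.

```bash
#!/bin/bash
# Hypothetical helper: list lines that were added between two source trees
# and that mention a given symbol.
added_lines_mentioning() {
  local old=$1 new=$2 symbol=$3
  diff -ruN "$old" "$new" \
    | grep '^+' \
    | grep -v '^+++ ' \
    | grep -F "$symbol"
}

# Example use (assumes both tarballs are unpacked in the current directory):
#   added_lines_mentioning stunnel-5.32 stunnel-5.33 SSL_SESSION_free
```

Searching only the added lines keeps the output short enough to review by hand, which matters here because upstream publishes release tarballs rather than individual commits.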

Can someone else experiencing this issue give this diff a try and see if it improves things for you as well? If this looks good, then perhaps we can get the stunnel4 package maintainer to sponsor getting this in.

Upstream Debian testing/sid is already using a newer version, so this would be an Ubuntu-only patch. It applies to Xenial and to any other Ubuntu release that ships an stunnel4 version from 5.27 up to, but not including, 5.33.

The attachment "stunnel4_5.30-1.1.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Scott Emmons (lscotte) wrote :

A couple of additional comments:

To make testing easy, feel free to try this PPA with the patch: https://launchpad.net/~lscotte/+archive/ubuntu/stunnel4/

Also, I discovered that this bug is present in the version provided in jessie-backports, so I've also reported it upstream in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=864391

Scott Emmons (lscotte) wrote :

Fix confirmed to solve the SSL session leak for me. Previously, under constant load, the RSS for stunnel would keep growing, and an RSS of 500000+ was common after a few hours. Now, after running overnight, the RSS is still at 13348 and I was able to remove the cron job that restarted stunnel.

Changed in stunnel4 (Debian):
status: Unknown → New
Changed in stunnel4 (Debian):
status: New → Fix Released
Bryan Quigley (bryanquigley) wrote :

This is fixed in all releases above 16.04 so should be marked Fix Released once the Xenial task is approved.

Bryan Quigley (bryanquigley) wrote :

(oops, leaving the Xenial task open of course)

Simon Quigley (tsimonq2) wrote :

@Bryan: Done.

Changed in stunnel4 (Ubuntu Xenial):
status: New → Fix Released
status: Fix Released → Confirmed
Changed in stunnel4 (Ubuntu):
status: Confirmed → Fix Released
Changed in stunnel4 (Ubuntu Xenial):
importance: Undecided → Medium
Simon Quigley (tsimonq2) wrote :

Hello Scott! Apologies for the delay on a review.

In the changelog, please change unstable (a Debian codename) to xenial (the Ubuntu codename you are targeting it to) and change the version from 3:5.30-1.1 to an Ubuntu version (I think 3:5.30-1ubuntu0.1 would be best).

Also, please add a DEP-3 patch header to your patch by running `quilt header --dep3 -e` when that patch is on the top. More details on that can be found here: http://dep.debian.net/deps/dep3/

Lastly, please edit the bug to follow the Stable Release Updates bug template: https://wiki.ubuntu.com/StableReleaseUpdates

Thank you for your help in fixing this bug!

Unsubscribing ~ubuntu-sponsors, please resubscribe ~ubuntu-sponsors when you have an updated patch.

Scott Emmons (lscotte) wrote :

Thanks Simon, I didn't expect my current patch to be the final one - just a demonstration of the fix. I am more than happy to contribute a compliant patch, if this one-off fix for Xenial is the best way to go. Unfortunately, upstream Debian closed the bug without fixing jessie-backports and the maintainer has not responded to email, so I don't expect to see movement there (and the affected version is only in jessie-backports and xenial at this point).

I can happily report that running with my patch, stunnel4 has been up now for 85 days and the RSS of the process is still just 13084 (as reported, previously I had to restart stunnel4 as the RSS would grow to 500000+ in a few hours).

I'll rework the patch and resubmit it. Thanks again for your reply and guidance - it's greatly appreciated!

Scott Emmons (lscotte) wrote :

Attached is an updated debdiff. I have attempted to fill out the header per recommendations (somewhat tricky for stunnel4, as there is no bug tracking system and granularity of commits is by release - official source repository is an rsync of tarballs, but the maintainer does have a github mirror which I have linked to). Please let me know what I've missed.

I also updated my PPA[1] with a build of this patch.

[1] https://launchpad.net/~lscotte/+archive/ubuntu/stunnel4

Thanks again for all the help and guidance!

Scott Emmons (lscotte) on 2017-08-31
description: updated
Simon Quigley (tsimonq2) wrote :

Hey Scott! There's a couple of things that are not correct with your patch:
 1. "Fixes launchpad bug 1655153." - this is not enough to automatically close the bug report, it should be like this, preferably at the end of the changelog entry: "(LP: #1655153)".
 2. Ubuntu is different than Debian in that while it is nice to ask the previous uploader before uploading things, Ubuntu Developers collectively maintain and are responsible for packages. As such, this isn't a non-maintainer upload (I'm an Ubuntu Developer and I acknowledge your change), so it shouldn't have that entry. Also, please change the Maintainer in debian/control as such.
 3. The description in the patch should be indented by a space so it is machine-readable.
 4. Instead of linking to the commit in "Origin", it should replace "5.33" in "Applied-Upstream".
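
Point 1 can be checked mechanically before uploading. The sketch below is a simplified check, not Launchpad's actual parser: the function name is hypothetical, and it only looks for the exact "(LP: #NNNNNNN)" form quoted above.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the changelog file $1 contains the
# bug-closing marker "(LP: #<bug>)" for bug number $2.
closes_lp_bug() {
  local changelog=$1 bug=$2
  grep -Fq "(LP: #${bug})" "$changelog"
}

# Example use from the top of an unpacked source package:
#   closes_lp_bug debian/changelog 1655153 || echo "missing (LP: #1655153)"
```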

Since these are literally just changelog and DEP-3 header nitpicks, I've just fixed it (but indicated above for you to note and attached the debdiff so you can see exactly what I uploaded) and uploaded it (waiting for an SRU team member to review it now). :)

Thank you for your contribution to Ubuntu, I really appreciate the promptness of a follow-up patch and your willingness to fix this bug!

description: updated
Scott Emmons (lscotte) wrote :

Thank you very much Simon, I do appreciate your time and help in getting my patch correct.

Hello Bruce, or anyone else affected,

Accepted stunnel4 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/stunnel4/3:5.30-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in stunnel4 (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-xenial

The server we were observing the problem on was upgraded to Yakkety, which is running stunnel4 5.35, so I can no longer test this.
