openssl: backport to jammy "clear method store / query cache confusion"

Bug #2033422 reported by Adrien Nader
18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
openssl (Ubuntu)
New
Undecided
Unassigned
Jammy
Fix Released
Medium
Simon Chopin
Lunar
Fix Released
Undecided
Unassigned

Bug Description

=== SRU information ===
[ATTENTION]
This SRU contains THREE changes which are listed in the section below.

[Meta]
This bug is part of a series of three bugs for a single SRU.
This ( #2033422 ) is the "central" bug with the global information and debdiff.

This SRU addresses three issues with Jammy's openssl version:
- http://pad.lv/1994165: ignored SMIME signature errors
- http://pad.lv/2023545: imbca engine dumps core
- http://pad.lv/2033422: very high CPU usage for concurrent TLS connections (this one)

The SRU information has been added to the three bug reports and I am attaching the debdiff here only for all three.

All the patches have been included in subsequent openssl 3.0.x releases which in turn have been included in subsequent Ubuntu releases. There has been no report of issues when updating to these Ubuntu releases.

I have rebuilt the openssl versions and used abi-compliance-checker to compare the ABIs of the libraries in jammy and the one for the SRU. Both matched completely (FYI, mantic's matched completely too).

I have also pushed the code to git (without any attempt to make it git-ubuntu friendly).

https://code.launchpad.net/~adrien-n/ubuntu/+source/openssl/+git/openssl/+ref/jammy-sru

I asked Brian Murray about phasing speed and he concurs a slow roll-out is probably better for openssl. There is a small uncertainty because a security update could come before the phasing is over, effectively fast-forwarding the SRU. Still, unless there is already a current pre-advisory, this is probably better than a 10% phasing which is over after only a couple days anyway.
NB: at the moment openssl doesn't phase slowly so this needs to be implemented.

[Impact]
Severely degraded performance for concurrent operations compared to openssl 1.1. The performance is so degraded that some workloads fail due to timeouts or insufficient resources (noone magically has 5 times more machines). As a consequence, a number of people use openssl 1.1 instead and do not get security updates.

[Test plan]
Rafael Lopez has shared a simple benchmarks in http://pad.lv/2009544 with https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/2009544/+attachment/5690224/+files/main.py .

To test, follow these steps:
- run "time python3 main.py" # using the aforementioned main.py script
- apt install -t jammy-proposed libssl3
- run "time python3 main.py"
- compare the runtimes for the two main.py runs

You can run this on x86_64, Raspberry Pi 4 or any machine, and get a very large speed-up in all cases. The improvements are not architecture-dependant.

Using this changeset, I get the following numbers for ten runs on my laptop:

3.0.2:
    real 2m5.567s
    user 4m3.948s
    sys 2m0.233s

this SRU:
    real 0m23.966s
    user 2m35.687s
    sys 0m1.920s

As can be easily seen, the speed-up is massive: system time is divided by 60 and overall wall clock time is roughly five times lower.

In http://pad.lv/2009544 , Rafael also shared his performance numbers and they are relatable to these. He used slightly different versions (upstreams rather than patched with cherry-picks) but at least one of the version used does not include other performance change. He also used different hardware and this performance issue seems to depend on the number of CPUs available but also obtained a performance several times better. Results on a given machine vary also very little across runs (less than 2% variation on runs of size 10). They are also very similar on a Raspberry Pi 4 (8GB).

The benchmark uses https://www.google.com/humans.txt which takes around 130ms to download on my machine but I modified the script to download something only 20ms away. Results are so close to the ones using humans.txt that they are within the error margin. This is consistent with the high-concurrency in the benchmark which both saturates CPU, and "hides" latencies that are relatively low.

Finally, there are positive reports on github. Unfortunately they are not always completely targeted at these patches only and therefore I will not link directly to them but they have also been encouraging.

[Where problems could occur]
The change is spread over several patches which touch the internals of openssl. As such, the engine and provider functionality could be broken by these changes. Fortunately, in addition to upstream's code review, these patches are included in openssl 3.0.4 (iirc) and therefore in kinetic. No issue related to these changes was reported on launchpad or upstream.

However, it is possible that there were more patch dependencies than these in either 3.0.3 or 3.0.4. In that case there could be problems.

[Patches]
The patches come directly from upstream and apply cleanly.

https://github.com/openssl/openssl/pull/18151#issuecomment-1118535602

* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0001-Drop-ossl_provider_clear_all_operation_bits-and-all-.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0
* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0002-Refactor-method-construction-pre-and-post-condition.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0
* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0003-Don-t-empty-the-method-store-when-flushing-the-query.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0
* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0004-Make-it-possible-to-remove-methods-by-the-provider-t.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0
* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0005-Complete-the-cleanup-of-an-algorithm-in-OSSL_METHOD_.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0
* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0006-For-child-libctx-provider-don-t-count-self-reference.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0
* https://git.launchpad.net/~adrien-n/ubuntu/+source/openssl/tree/debian/patches/jammy-sru-0007-Add-method-store-cache-flush-and-method-removal-to-n.patch?h=jammy-sru&id=04ef023920ab08fba214817523fba897527dfff0

=== Original description ===

This is about SRU'ing to Jammy the patches at https://github.com/openssl/openssl/pull/18151#issuecomment-1118535602 . They're purely performance but their impact is large. They have been released as part of openssl 3.0.4 (they're among the first after 3.0.3) which has been included in Kinetic.

Adrien Nader (adrien)
Changed in openssl (Ubuntu Lunar):
assignee: nobody → Adrien Nader (adrien-n)
Changed in openssl (Ubuntu Jammy):
milestone: none → ubuntu-22.04.4
milestone: ubuntu-22.04.4 → jammy-updates
Changed in openssl (Ubuntu Lunar):
assignee: Adrien Nader (adrien-n) → nobody
Changed in openssl (Ubuntu Jammy):
assignee: nobody → Adrien Nader (adrien-n)
importance: Undecided → Medium
status: New → In Progress
Changed in openssl (Ubuntu Lunar):
status: New → Fix Released
Revision history for this message
Adrien Nader (adrien) wrote :

I've created a PPA for Jammy that incorporates the fix mentionned. The details are available at https://launchpad.net/~adrien-n/+archive/ubuntu/openssl-jammy-sru . Could you test it and confirm your issue is solved?

Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
description: updated
description: updated
Revision history for this message
Adrien Nader (adrien) wrote :

Attaching debdiff for openssl from 3.0.2-0ubuntu1.10 to 3.0.2-0ubuntu1.11

description: updated
Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
description: updated
Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for the contribution, Adrien.

I find the naming scheme you chose for the patches a bit confusing. For example, you're using the prefix "jammy-sru-0001-" on several patches that are actually not strictly related. You also don't mention any patch explicitly in the d/changelog entry, which forces the reader to open d/p/series and look at the comments there. Moreover, the patches are missing DEP-3 headers (which, in this case, would be very useful when trying to understand the context when looking at a single patch).

Could you please address the concerns above before we proceed with the upload?

Thanks.

Revision history for this message
Adrien Nader (adrien) wrote :

(did my mail answer from yesterday get eaten by launchpad?)

Here's an updated debdiff that:
- renames files using the lpXXXX- prefix,
- reworks the changelog to a more typical format:
    * what (LP: #XXXX)
      - ${file}
- adds DEP-3 to the patches

I've pushed an updated build on LP at https://launchpad.net/~adrien-n/+archive/ubuntu/openssl-jammy-sru/+packages

It's still building unfortunately and I noticed typos in the changelog which I corrected but didn't upload to the PPA due to how long it takes to build. The differences are very minor (first level of the lsit in d/changelog used - rather than *).

Revision history for this message
Adrien Nader (adrien) wrote :

Removed ~ubuntu-sponsors for a few days while a few things settle.

Adrien Nader (adrien)
description: updated
Revision history for this message
Simon Chopin (schopin) wrote :

I am not going to upload this, because I'm widely uncomfortable with bug 1990216 (details there).

In addition, I have a few superficial suggestions for aesthetics:

* Use lpXXXXX subdirectories in d/patches rather than add a prefix to the patchname
* Add a Bug-Ubuntu field with a LP URL to each patch to make it even easier to go back to the relevant bug

Revision history for this message
Simon Chopin (schopin) wrote :

In addition, could the test plan maybe be edited to answer the following question?

"How can one validate that the package in -proposed addresses the issue?"

Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
description: updated
Revision history for this message
Adrien Nader (adrien) wrote :

Forgot to upload the latest debdiff.

Adrien Nader (adrien)
description: updated
Revision history for this message
Simon Chopin (schopin) wrote :

A version containing a fix for this has been uploaded to the Jammy queue to be processed by the SRU team. Thanks, Adrien :)

Adrien Nader (adrien)
description: updated
Adrien Nader (adrien)
description: updated
description: updated
Revision history for this message
Steve Langasek (vorlon) wrote : Proposed package upload rejected

An upload of openssl to jammy-proposed has been rejected from the upload queue for the following reason: "Please reupload without debian/patches/lp1994165/0002-Handle-SMIME_crlf_copy-return-code.patch".

Revision history for this message
Adrien Nader (adrien) wrote :

Here is an updated version.

I've dropped the extra patch for #1994165 and fixed the changelog where I had swapped comments for two of the patches.

I've created a new PPA at https://launchpad.net/~adrien-n/+archive/ubuntu/jammy-openssl-2033422-sru because the version is unchanged (there has been no new openssl security update).

Simon Chopin (schopin)
Changed in openssl (Ubuntu Jammy):
assignee: Adrien Nader (adrien-n) → Simon Chopin (schopin)
Revision history for this message
Adrien Nader (adrien) wrote :

I'm attaching an updated debdiff.

- remove left-over patches for a bug that we decided to not handle as part of this SRU (patches were already unlisted from d/p/series)
- added Bug-Ubuntu entries to patches

PPA is the same. New build is at https://launchpad.net/~adrien-n/+archive/ubuntu/jammy-openssl-2033422-sru/+sourcepub/15684316/+listing-archive-extra .

Revision history for this message
Simon Chopin (schopin) wrote :

Reviewed and uploaded.

For the particular patch series attached to this very bug, I'm seeing the patchset as two distincts parts. First are a few optimization patches by being more selective in flushing some caches, whereas the second part are behaviour changes that belong more in the realm of fixes than optimization. Given the nature of those fixes (e.g. "Complete the cleanup of an algorithm in OSSL_METHOD_STORE"), I'm *assuming* that they were hidden by the previous aggressive flushing, especially since they originate from the same PR (https://github.com/openssl/openssl/pull/18151)

Revision history for this message
Adrien Nader (adrien) wrote :

Thanks for the review and upload.

I have a similar take on the patches in this series and I believe it would be very difficult and riskier to try to skip some of the patches in this series which has seen real-world use as a whole, starting with openssl >= 3.0.4 (which we started shipping in lunar).

Revision history for this message
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Adrien, or anyone else affected,

Accepted openssl into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openssl/3.0.2-0ubuntu1.13 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in openssl (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Revision history for this message
Simon Déziel (sdeziel) wrote (last edit ):

Using a fast HTTPS server (same LAN as client), the `main.py` tests goes from ~8s to ~2.5s:

# time python3 /tmp/main.py
Distro: Ubuntu 22.04.3 LTS
Python Version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
OpenSSL Version: OpenSSL 3.0.2 15 Mar 2022
0 1 2 3 ... 99
real 0m8.163s
user 0m16.041s
sys 0m3.474s

# apt-get install -t jammy-proposed libssl3
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  libssl3
1 upgraded, 0 newly installed, 0 to remove and 38 not upgraded.
Need to get 1,902 kB of archives.
After this operation, 1,024 B of additional disk space will be used.

# time python3 /tmp/main.py
Distro: Ubuntu 22.04.3 LTS
Python Version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
OpenSSL Version: OpenSSL 3.0.2 15 Mar 2022
0 1 2 3 ... 99
real 0m2.533s
user 0m9.188s
sys 0m0.096s

Note: the "OpenSSL Version" string remains the same even after upgrading the libssl3 package. I would have thought the date would change but no.

Upgrading the server side (NGINX) didn't make a measurable difference.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (openssl/3.0.2-0ubuntu1.13)

All autopkgtests for the newly accepted openssl (3.0.2-0ubuntu1.13) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

diffoscope/205 (armhf)
dotnet6/6.0.126-0ubuntu1~22.04.1 (amd64, arm64)
ganeti/3.0.2-1ubuntu1 (armhf)
linux-aws-5.19/5.19.0-1029.30~22.04.1 (arm64)
linux-aws-6.5/6.5.0-1011.11~22.04.1 (arm64)
linux-azure-6.2/6.2.0-1018.18~22.04.1 (arm64)
linux-gcp-6.2/6.2.0-1019.21~22.04.1 (arm64)
linux-gke/5.15.0-1048.53 (arm64)
linux-lowlatency-hwe-6.2/6.2.0-1018.18~22.04.1 (arm64)
linux-lowlatency-hwe-6.5/6.5.0-14.14.1~22.04.1 (arm64)
linux-mtk/5.15.0-1029.33 (arm64)
linux-nvidia-6.5/6.5.0-1010.10 (arm64)
linux-oracle-6.2/6.2.0-1017.18~22.04.1 (arm64)
puma/5.5.2-2ubuntu2 (arm64)
python-bonsai/1.3.0+ds-3build1 (armhf)
ruby3.0/3.0.2-7ubuntu2.4 (amd64, arm64, i386)
seqkit/2.1.0+ds-1 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#openssl

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Adrien Nader (adrien) wrote (last edit ):

Thanks a lot for the verification Simon!

I looked at the test results and I believe failed tests are all fine:

- diffoscope: pyhon "ModuleNotFoundError: No module named 'tests.utils'"
- dotnet*: complains that this dotnet is not tested for 24.04 (yes, 24.04); this system of keeping a matrix of host/targets compatibility is being removed upstream due to being impossible to maintain (confirmed by Dominik)
- ganeti: same failure as with the iptables SRU
- linux-*: timed out building the kernel
- ruby3.0: expired certificates in testsuite

For the following packages, I re-tried the tests and they succeeded:

- puma: retried twice and passed; at least one occurrence of the same failure
- python-bonsai: retried; passing now; there have been various errors in this package that disappear on re-try
- seqkit: retried; and passing I guess since it disappeared; there's a (long) history of this test failing
- systemd: retried; timeouts which don't really seem to be related to openssl, or at least I couldn't spot an error, same issues can be seen in several other logs

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssl - 3.0.2-0ubuntu1.13

---------------
openssl (3.0.2-0ubuntu1.13) jammy; urgency=medium

  * Fix (upstream): crash when using an engine for ciphers used by DRBG
    (LP: #2023545)
    - lp2023545/0001-Release-the-drbg-in-the-global-default-context-befor.patch
  * Fix (upstream): do not ignore return values for S/MIME signature
    (LP: #1994165)
    - lp1994165/0001-REGRESSION-CMS_final-do-not-ignore-CMS_dataFinal-res.patch
  * Perf (upstream): don't empty method stores and provider synchronization
    records when flushing the query cache (LP: #2033422)
    - lp2033422/0001-Drop-ossl_provider_clear_all_operation_bits-and-all-.patch
    - lp2033422/0002-Refactor-method-construction-pre-and-post-condition.patch
    - lp2033422/0003-Don-t-empty-the-method-store-when-flushing-the-query.patch
    - lp2033422/0004-Make-it-possible-to-remove-methods-by-the-provider-t.patch
    - lp2033422/0005-Complete-the-cleanup-of-an-algorithm-in-OSSL_METHOD_.patch
    - lp2033422/0006-For-child-libctx-provider-don-t-count-self-reference.patch
    - lp2033422/0007-Add-method-store-cache-flush-and-method-removal-to-n.patch

 -- Adrien Nader <email address hidden> Tue, 09 Jan 2024 11:42:50 +0100

Changed in openssl (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for openssl has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.