use-after-free in af_alg_accept() due to bh_lock_sock()

Bug #1884766 reported by Mauricio Faria de Oliveira on 2020-06-23
This bug affects 2 people
Affects           Importance  Assigned to
linux (Ubuntu)    Medium      Mauricio Faria de Oliveira
  Xenial          Medium      Mauricio Faria de Oliveira
  Bionic          Medium      Mauricio Faria de Oliveira
  Eoan            Medium      Mauricio Faria de Oliveira
  Focal           Medium      Mauricio Faria de Oliveira
  Groovy          Undecided   Unassigned

Bug Description

[Impact]

 * Users of the Linux kernel's crypto userspace API
   reported BUG() / kernel NULL pointer dereference
   errors after kernel upgrades.

 * The stack trace signature is an accept() syscall
   going through af_alg_accept() and hitting errors
   usually in one of:
   - apparmor_sk_clone_security()
   - apparmor_sock_graft()
   - release_sock()

[Fix]

 * This is a regression introduced by upstream commit
   37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
   in sk_destruct"), which was backported via stable.

 * The offending patch allows the critical regions
   of af_alg_accept() and af_alg_release_parent() to
   run concurrently; with the "right" timing of events
   on 2 CPUs, it might drop the alg_sock's non-atomic
   reference counter and then the sock's reference,
   thus releasing a sock that is still in use.

 * The fix is upstream commit 34c86f4c4a7b ("crypto:
   af_alg - fix use-after-free in af_alg_accept() due
   to bh_lock_sock()") [1]. It changes alg_sock's ref
   counter to atomic, which addresses the root cause.
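To make the failure mode concrete, here is a minimal userspace sketch, not the kernel code. It replays, one step at a time, a single possible interleaving of a non-atomic increment (a "get", as on an accept path) and a non-atomic decrement (a "put", as on a release path) on a plain int, then shows C11 <stdatomic.h> get/put helpers in the spirit of the fix's switch to an atomic counter. The kernel itself uses atomic_t helpers such as atomic_dec_and_test(); the names replay_lost_update(), ref_get() and ref_put() and the starting count of 1 are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical replay of one bad interleaving on a plain int.
 * Each non-atomic update is a load, a modify and a store; here
 * CPU B's store overwrites CPU A's, so A's "get" is lost.
 * Returns the final counter value.
 */
int replay_lost_update(int start)
{
    int refcnt = start;

    int a_tmp = refcnt;   /* CPU A (get) loads the counter           */
    int b_tmp = refcnt;   /* CPU B (put) loads the same value        */
    refcnt = a_tmp + 1;   /* CPU A stores start + 1                  */
    refcnt = b_tmp - 1;   /* CPU B stores start - 1, erasing A's get */

    /* Net change should be 0, but the counter is now start - 1. */
    return refcnt;
}

/* Take a reference atomically; returns the count after the increment. */
int ref_get(atomic_int *refcnt)
{
    return atomic_fetch_add(refcnt, 1) + 1;
}

/*
 * Drop a reference atomically; returns true when this call dropped
 * the last one, i.e. the caller must now release the object
 * (same idea as the kernel's atomic_dec_and_test()).
 */
bool ref_put(atomic_int *refcnt)
{
    int old = atomic_fetch_sub(refcnt, 1);

    assert(old > 0);      /* underflow here would mean undercounting */
    return old == 1;
}
```

With a start of 1, replay_lost_update() ends at 0 even though CPU A still holds a reference, which is the "released while still in use" outcome. With atomic_fetch_add()/atomic_fetch_sub(), each load, modify and store happens as one indivisible step, so that interleaving cannot occur.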

[Test Case]

 * A synthetic test case is available: a kprobes
   kernel module that synchronizes the concurrent
   CPUs on the instructions responsible for the
   problem, plus a userspace part that runs it.

 * The organic reproducer is the Varnish Cache Plus
   software with the Crypto vmod (which uses kernel
   crypto userspace API) under long, very high load.

 * The patch has been verified on both reproducers
   with the 4.15 and 5.7 kernels.

 * Further tests were performed with
   'stress-ng --af-alg' with 11 CPUs on
   Xenial/Bionic/Disco/Eoan/Focal (all with the
   same stress-ng version, V0.11.14).

   No regressions were observed relative to the
   original kernel. (The af-alg stressor can exercise
   almost all crypto modules shipped with the kernel,
   so it checks more paths/crypto alg interfaces.)
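For context on the accept() path in the stack traces: with the crypto userspace API, a program binds a "transform" socket to an algorithm and then calls accept() on it to get an operation socket; per the traces, that accept() goes through alg_accept() and af_alg_accept(). The sketch below assumes a Linux system with AF_ALG and the kernel's "sha256" hash available. It is not the synthetic reproducer described above, and the helper name af_alg_sha256() is hypothetical. It returns -1 when AF_ALG is unavailable (for example, in some containers).

```c
#include <assert.h>
#include <linux/if_alg.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38   /* Linux UAPI value, for older libc headers */
#endif

/*
 * Hypothetical helper: SHA-256 of msg via the kernel crypto
 * userspace API (AF_ALG). Returns 0 on success, -1 on failure.
 */
int af_alg_sha256(const unsigned char *msg, size_t len, unsigned char out[32])
{
    struct sockaddr_alg sa;
    int tfm, op, ret = -1;

    assert(msg != NULL || len == 0);

    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "hash");    /* algorithm type */
    strcpy((char *)sa.salg_name, "sha256");  /* algorithm name */

    /* 1. Transform (parent) socket: selects the algorithm. */
    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;
    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto out_tfm;

    /* 2. Operation (child) socket: this accept() enters af_alg_accept(). */
    op = accept(tfm, NULL, 0);
    if (op < 0)
        goto out_tfm;

    /* 3. Send the data without MSG_MORE, so the hash is finalized... */
    if (send(op, msg, len, 0) != (ssize_t)len)
        goto out_op;
    /* ...then read back the 32-byte digest. */
    if (read(op, out, 32) != 32)
        goto out_op;
    ret = 0;

out_op:
    close(op);   /* closing the child eventually releases its parent reference */
out_tfm:
    close(tfm);
    return ret;
}
```

Repeatedly accepting new operation sockets from one transform socket while previously accepted ones are being closed is the pattern that puts af_alg_accept() and af_alg_release_parent() on the same parent socket at the same time.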

[Regression Potential]

 * The fix patch makes a fundamental change to how
   alg_sock reference counters work, plus another
   change to the 'nokey' counting. This of course
   *has* a risk of regression.

 * Regressions could theoretically manifest as
   use-after-free errors in the af_alg functions
   (in case of undercounting) or silent memory leaks
   (in case of overcounting), but also as other
   behaviors, since reference counting is key to
   many things.

 * FWIW, this patch was written by the crypto
   subsystem maintainer, who certainly knows the
   normal and corner cases well, which gives the
   patch more credibility.

 * Testing with the organic reproducer ran for as
   long as 5 days without issues, so it looks good.

[Other Info]

 * Not sending for Groovy (it should get the fix via Unstable).

 * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d

[Stack Trace Examples]

Examples:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ...
    RIP: 0010:apparmor_sk_clone_security+0x26/0x70
    ...
    Call Trace:
     security_sk_clone+0x33/0x50
     af_alg_accept+0x81/0x1c0 [af_alg]
     alg_accept+0x15/0x20 [af_alg]
     SYSC_accept4+0xff/0x210
     SyS_accept+0x10/0x20
     do_syscall_64+0x73/0x130
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    general protection fault: 0000 [#1] SMP PTI
    ...
    RIP: 0010:__release_sock+0x54/0xe0
    ...
    Call Trace:
     release_sock+0x30/0xa0
     af_alg_accept+0x122/0x1c0 [af_alg]
     alg_accept+0x15/0x20 [af_alg]
     SYSC_accept4+0xff/0x210
     SyS_accept+0x10/0x20
     do_syscall_64+0x73/0x130
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
tags: added: sts
description: updated
description: updated
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Eoan):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Groovy):
status: Confirmed → Won't Fix
importance: Medium → Undecided
assignee: Mauricio Faria de Oliveira (mfo) → nobody
Changed in linux (Ubuntu):
status: Confirmed → In Progress
description: updated

Focal: testing
=====

$ ./stress-ng --version
stress-ng, version 0.11.14 (gcc 9.3, x86_64 Linux 5.4.0-38-generic) 💻🔥

$ sudo modprobe -a \
          $(modinfo \
              /lib/modules/$(uname -r)/kernel/crypto/*.ko \
              /lib/modules/$(uname -r)/kernel/arch/*/crypto/*.ko \
              | grep -ow 'crypto-.*')

No errors or unusual kernel messages were logged in /var/log/kern.log.

original:
--------

 $ uname -rv
 5.4.0-38-generic #42-Ubuntu SMP Mon Jun 8 14:14:24 UTC 2020

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.focal.orig
 stress-ng: info: [27052] dispatching hogs: 11 af-alg
 stress-ng: info: [27054] stress-ng-af-alg: 62 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [27054] stress-ng-af-alg: 101 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [27052] successful run completed in 3600.38s (1 hour, 0.38 secs)

modified:
--------

 $ uname -rv
 5.4.0-38-generic #42+test20200623b1 SMP Tue Jun 23 09:37:56 -03 2020

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.focal.mod.2
 stress-ng: info: [2577] dispatching hogs: 11 af-alg
 stress-ng: info: [2579] stress-ng-af-alg: 62 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [2579] stress-ng-af-alg: 101 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [2577] successful run completed in 3600.52s (1 hour, 0.52 secs)

Eoan: testing
====

original:
--------

 $ uname -rv
 5.3.0-62-generic #56-Ubuntu SMP Tue Jun 23 11:20:52 UTC 2020

 $ ./stress-ng --version
 stress-ng, version 0.11.14 (gcc 9.2, x86_64 Linux 5.3.0-62-generic) 💻🔥

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.eoan.orig
 stress-ng: info: [10690] dispatching hogs: 11 af-alg
 stress-ng: info: [10692] stress-ng-af-alg: 64 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [10692] stress-ng-af-alg: 101 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [10690] successful run completed in 3600.34s (1 hour, 0.34 secs)

modified:
--------

 $ uname -rv
 5.3.0-62-generic #56+test20200630b1 SMP Tue Jun 30 12:33:10 -03 2020

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.eoan.mod
 stress-ng: info: [2453] dispatching hogs: 11 af-alg
 stress-ng: info: [2455] stress-ng-af-alg: 64 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [2455] stress-ng-af-alg: 101 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [2453] successful run completed in 3600.44s (1 hour, 0.44 secs)

Disco: testing
=====

original:
--------

 $ uname -rv
 5.0.0-38-generic #41-Ubuntu SMP Tue Dec 3 00:27:35 UTC 2019

 $ ./stress-ng --version
 stress-ng, version 0.11.14 (gcc 8.3, x86_64 Linux 5.0.0-38-generic) 💻🔥

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.disco.orig
 stress-ng: info: [13699] dispatching hogs: 11 af-alg
 stress-ng: info: [13701] stress-ng-af-alg: 63 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [13701] stress-ng-af-alg: 102 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [13699] successful run completed in 3600.49s (1 hour, 0.49 secs)

modified:
--------

 $ uname -rv
 5.0.0-56-generic #60+test20200630b1 SMP Tue Jun 30 14:12:46 -03 2020

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.disco.mod
 stress-ng: info: [2101] dispatching hogs: 11 af-alg
 stress-ng: info: [2103] stress-ng-af-alg: 63 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [2103] stress-ng-af-alg: 102 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [2101] successful run completed in 3600.48s (1 hour, 0.48 secs)

Bionic: testing
======

original:
--------

 $ uname -rv
 4.15.0-107-generic #108-Ubuntu SMP Mon Jun 8 17:51:33 UTC 2020

 $ ./stress-ng --version
 stress-ng, version 0.11.14 (gcc 7.5, x86_64 Linux 4.15.0-107-generic) 💻🔥

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.bionic.orig
 stress-ng: info: [13821] dispatching hogs: 11 af-alg
 stress-ng: info: [13823] stress-ng-af-alg: 59 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [13823] stress-ng-af-alg: 105 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [13821] successful run completed in 3600.44s (1 hour, 0.44 secs)

modified:
--------

 $ uname -rv
 4.15.0-107-generic #108+test20200623b1 SMP Tue Jun 23 09:55:21 -03 2020

 $ ./stress-ng --af-alg 0 --timeout 1h 2>&1 | tee ../stress-ng.log.bionic.mod
 stress-ng: info: [1551] dispatching hogs: 11 af-alg
 stress-ng: info: [1553] stress-ng-af-alg: 59 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [1553] stress-ng-af-alg: 105 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [1551] successful run completed in 3600.44s (1 hour, 0.44 secs)

Xenial: testing
======

original:
--------

 $ uname -rv
 4.4.0-185-generic #215-Ubuntu SMP Mon Jun 8 21:53:19 UTC 2020

 $ ./stress-ng --version
 stress-ng, version 0.11.14 (gcc 5.4, x86_64 Linux 4.4.0-185-generic) 💻🔥

 $ ./stress-ng --af-alg 0 --timeout 30m 2>&1 | tee ../stress-ng.log.xenial.orig
 stress-ng: info: [11823] dispatching hogs: 11 af-alg
 stress-ng: info: [11825] stress-ng-af-alg: 52 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [11825] stress-ng-af-alg: 105 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [11823] successful run completed in 1800.31s (30 mins, 0.31 secs)

modified:
--------

 $ uname -rv
 4.4.0-185-generic #215+test20200630b1 SMP Tue Jun 30 18:46:39 -03 2020

 $ ./stress-ng --version
 stress-ng, version 0.11.14 (gcc 5.4, x86_64 Linux 4.4.0-185-generic) 💻🔥

 $ ./stress-ng --af-alg 0 --timeout 30m 2>&1 | tee ../stress-ng.log.xenial.mod
 stress-ng: info: [12286] dispatching hogs: 11 af-alg
 stress-ng: info: [12288] stress-ng-af-alg: 52 cryptographic algorithms found in /proc/crypto
 stress-ng: info: [12288] stress-ng-af-alg: 105 cryptographic algorithms max (with defconfigs)
 stress-ng: info: [12286] successful run completed in 1800.38s (30 mins, 0.38 secs)

Common to all test runs: the stress-ng version,
and the command to load as many crypto modules
as are found/possible on the system.

$ ./stress-ng --version
stress-ng, version 0.11.14 (<build system details>) 💻🔥

$ sudo modprobe -a \
          $(modinfo \
              /lib/modules/$(uname -r)/kernel/crypto/*.ko \
              /lib/modules/$(uname -r)/kernel/arch/*/crypto/*.ko \
              | grep -ow 'crypto-.*')

[E/F/Unstable][PATCH 0/1] crypto: fix regression/use-after-free in af_alg_accept()
https://lists.ubuntu.com/archives/kernel-team/2020-June/111620.html

[E/F/Unstable][PATCH 1/1] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
https://lists.ubuntu.com/archives/kernel-team/2020-June/111621.html

[D][PATCH] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
https://lists.ubuntu.com/archives/kernel-team/2020-June/111622.html

[B][PATCH] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
https://lists.ubuntu.com/archives/kernel-team/2020-June/111623.html

[X][PATCH] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
https://lists.ubuntu.com/archives/kernel-team/2020-June/111624.html

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

Verification done for bionic-proposed.

The reporting user confirmed that the organic reproducer (Varnish Cache Plus with the Crypto vmod) ran successfully over the weekend with the 4.15.0-114-generic kernel, for approximately 3 days (2d 20h runtime).

The same workload used to trigger the bug within a few hours, so it does look good now.

tags: added: verification-done-bionic
removed: verification-needed-bionic

I've also run stress-ng as in comment #1 (commands below) on 4 CPUs for 8 hours on X/B/F.

No signs of issues: each run finished successfully, with no unusual messages in the kernel logs.

$ sudo modprobe -a \
          $(modinfo \
              /lib/modules/$(uname -r)/kernel/crypto/*.ko \
              /lib/modules/$(uname -r)/kernel/arch/*/crypto/*.ko \
              | grep -ow 'crypto-.*')

$ ./stress-ng --version && ./stress-ng --af-alg 0 --timeout 8h 2>&1 | tee ../stress-ng.log.$(lsb_release -cs)-proposed

Focal
-----

$ uname -rv
5.4.0-43-generic #47-Ubuntu SMP Sat Aug 8 06:34:35 UTC 2020

$ cat stress-ng.log.focal-proposed
stress-ng: info: [11012] dispatching hogs: 4 af-alg
stress-ng: info: [11015] stress-ng-af-alg: 61 cryptographic algorithms found in /proc/crypto
stress-ng: info: [11015] stress-ng-af-alg: 101 cryptographic algorithms max (with defconfigs)
stress-ng: info: [11012] successful run completed in 28800.11s (8 hours, 0.11 secs)

Bionic
------

$ uname -rv
4.15.0-113-generic #114-Ubuntu SMP Sun Aug 9 07:27:58 UTC 2020

$ cat stress-ng.log.bionic-proposed
stress-ng: info: [10848] dispatching hogs: 4 af-alg
stress-ng: info: [10851] stress-ng-af-alg: 57 cryptographic algorithms found in /proc/crypto
stress-ng: info: [10851] stress-ng-af-alg: 105 cryptographic algorithms max (with defconfigs)
stress-ng: info: [10848] successful run completed in 28800.09s (8 hours, 0.09 secs)

Xenial
------

$ uname -rv
4.4.0-187-generic #217-Ubuntu SMP Tue Jul 21 04:18:15 UTC 2020

$ cat stress-ng.log.xenial-proposed
stress-ng: info: [12108] dispatching hogs: 4 af-alg
stress-ng: info: [12111] stress-ng-af-alg: 50 cryptographic algorithms found in /proc/crypto
stress-ng: info: [12111] stress-ng-af-alg: 105 cryptographic algorithms max (with defconfigs)
stress-ng: info: [12108] successful run completed in 28800.08s (8 hours, 0.08 secs)

Oh, and for "Eoan" (5.3/linux-hwe on Bionic), all good with stress-ng as well.

$ uname -rv
5.3.0-66-generic #60-Ubuntu SMP Tue Aug 11 08:42:43 UTC 2020

$ ./stress-ng --version && ./stress-ng --af-alg 0 --timeout 2h 2>&1 | tee ../stress-ng.log.eoan-bionic-proposed
stress-ng, version 0.11.14 (gcc 7.5, x86_64 Linux 5.3.0-66-generic) 💻🔥
stress-ng: info: [29537] dispatching hogs: 4 af-alg
stress-ng: info: [29539] stress-ng-af-alg: 66 cryptographic algorithms found in /proc/crypto
stress-ng: info: [29539] stress-ng-af-alg: 101 cryptographic algorithms max (with defconfigs)
stress-ng: info: [29537] successful run completed in 7200.06s (2 hours, 0.06 secs)

Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Won't Fix
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-115.116

---------------
linux (4.15.0-115.116) bionic; urgency=medium

  * bionic/linux: 4.15.0-115.116 -proposed tracker (LP: #1893055)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (4.15.0-114.115) bionic; urgency=medium

  * bionic/linux: 4.15.0-114.115 -proposed tracker (LP: #1891052)

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (4.15.0-113.114) bionic; urgency=medium

  * bionic/linux: 4.15.0-113.114 -proposed tracker (LP: #1890705)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Reapply "usb: handle warm-reset port requests on hub resume" (LP: #1859873)
    - usb: handle warm-reset port requests on hub resume

  * Bionic update: upstream stable patchset 2020-07-29 (LP: #1889474)
    - gpio: arizona: handle pm_runtime_get_sync failure case
    - gpio: arizona: put pm_runtime in case of failure
    - pinctrl: amd: fix npins for uart0 in kerncz_groups
    - mac80211: allow rx of mesh eapol frames with default rx key
    - scsi: scsi_transport_spi: Fix function pointer check
    - xtensa: fix __sync_fetch_and_{and,or}_4 declarations
    - xtensa: update *pos in cpuinfo_op.next
    - drivers/net/wan/lapbether: Fixed the value of hard_header_len
    - net: sky2: initialize return of gm_phy_read
    - drm/nouveau/i2c/g94-: increase NV_PMGR_DP_AUXCTL_TRANSACTREQ timeout
    - irqdomain/treewide: Keep firmware node unconditionally allocated
    - SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO
      compeletion")
    - spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
    - IB/umem: fix reference count leak in ib_umem_odp_get()
    - uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix
      GDB regression
    - ALSA: info: Drop WARN_ON() from buffer NULL sanity check
    - ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
    - btrfs: fix double free on ulist after backref resolution failure
    - btrfs: fix mount failure caused by race with umount
    - btrfs: fix page leaks after failure to lock page for delalloc
    - bnxt_en: Fix race when modifying pause settings.
    - hippi: Fix a size used in a 'pci_free_consistent()' in an error handling
      path
    - ax88172a: fix ax88172a_unbind() failures
    - net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual
      configuration
    - drm: sun4i: hdmi: Fix inverted HPD result
    - net: smc91x: Fix possible memory leak in smc_drv_probe()
    - bonding: check error value of register_netdevice() immediately
    - mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
    - ipvs: fix the connection sync failed in some cases
    - i2c: rcar: always clear ICSAR to avoid side effects
    - bonding: check return value of register_netdevice() in bond_newlink()
    - serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
    - scripts/decode_stacktrace: strip basepath from all paths
    - HID: i...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released

Marking X/F/G as Fix Released.

X/F got the patch via stable updates, thus no LP tag / bot messages.

Xenial version: 4.4.0-189.219
Focal version: 5.4.0-45.49
Groovy version: 5.8 and later.

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Groovy):
status: Won't Fix → Fix Released