32-bit x86 kernel 4.15.0-50 crash in vmalloc_sync_all

Bug #1830433 reported by Mathieu Desnoyers
This bug affects 4 people
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  Medium      Unassigned
Bionic          Fix Released  Medium      Unassigned

Bug Description

[Impact]

Commit d653420532d580156c8486686899ea6a9eeb7bf0 in bionic enabled kernel page table isolation for x86_32, but it also introduced a kernel bug: the BUG_ON() condition in vmalloc_sync_one() fires, apparently when vmalloc_sync_all() is called repeatedly (e.g., in a busy loop).

The real problem seems to be a race condition in the initialization of page-table entries, which can be fixed by applying the upstream commit 9bc4f28af75a91aea0ae383f50b0a430c4509303 ("x86/mm: Use WRITE_ONCE() when setting PTEs").
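
The idea behind a WRITE_ONCE()-style store can be illustrated in userspace. The sketch below is not the kernel's code and not the content of the upstream commit: the macro is a simplified stand-in for the kernel's WRITE_ONCE() (it relies on the GCC/Clang __typeof__ extension), and pte_t, set_pte_plain() and set_pte_once() are hypothetical names used only for illustration. A volatile store asks the compiler to emit the write as one access of the given width instead of splitting, merging, or repeating it, which matters when another CPU or the MMU may read the entry concurrently.

```c
#include <stdint.h>

/*
 * Simplified stand-in for the kernel's WRITE_ONCE() (assumption: the real
 * macro is more elaborate). The volatile cast forces a single store of
 * the variable's full width.
 */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical 32-bit page-table entry, for illustration only. */
typedef struct {
    uint32_t pte;
} pte_t;

/*
 * Plain assignment: the compiler is free to implement this store with
 * several instructions, so a concurrent reader could in principle
 * observe an intermediate value.
 */
void set_pte_plain(pte_t *ptep, pte_t val)
{
    *ptep = val;
}

/* Same update, but emitted as one volatile store via WRITE_ONCE(). */
void set_pte_once(pte_t *ptep, pte_t val)
{
    WRITE_ONCE(ptep->pte, val.pte);
}
```

Both helpers leave the same value in memory; the difference lies only in what the compiler is allowed to do while producing the store, so a race of this kind would show up only under concurrent access.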

[Test Case]

The bug can be easily triggered by rebooting the system a couple of times and loading this module:

https://launchpadlibrarian.net/428142172/vmalloc-stress-test.c

[Fix]

The following upstream fix seems to resolve the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bc4f28af75a91aea0ae383f50b0a430c4509303

In addition, the following upstream commits (all clean cherry-picks) are required for a cleaner backport of 9bc4f28af75a91aea0ae383f50b0a430c4509303:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=86fa949b050184ffc53688516a6a83ae5f98d08a
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=792adb90fa724ce07c0171cbc96b9215af4b1045
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=785a19f9d1dd8a4ab2d0633be4656653bd3de1fc
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f967db0b9ed44ec3057a28f3b28efc51df51b835
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f77084d96355f5fba8e2c1fb3a51a393b1570de7

[Regression Potential]

All changes are upstream fixes and have been tested on the affected platform; the backport changes are minimal.

[Original bug report]

Hi,

I'm reproducing a kernel bug in vmalloc_sync_all() with a 32-bit x86 kernel.

The problem appears in

Linux ubuntu 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:45:45 UTC 2019 i686 i686 i686 GNU/Linux

Kernels 4.15.0-49 and prior work fine.
The kernel 4.18.0-20-generic works fine.
This problem has not been experienced with upstream Linux kernels.
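
As a rough aid for checking a machine against the kernel versions named in this report, the following C sketch tests a `uname -r` string. It rests on assumptions the report does not confirm: that every 4.15.0 ABI from -50 up to, but not including, the fixed 4.15.0-55 build is affected, and that the release string has the form "4.15.0-<abi>-<flavour>". The function name release_in_reported_range() is hypothetical, and the sketch does not check that the machine is 32-bit x86.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true if `release` (as printed by `uname -r`) names a 4.15.0
 * Ubuntu kernel whose ABI number lies in [50, 55).
 * Assumption: ABIs between the first reported bad build (-50) and the
 * fixed build (-55) are all affected.
 */
bool release_in_reported_range(const char *release)
{
    int abi = 0;
    int consumed = 0;

    /* %n records how many characters were consumed up to this point. */
    if (sscanf(release, "4.15.0-%d%n", &abi, &consumed) != 1)
        return false;

    /* The ABI number must be followed by '-' or the end of the string. */
    if (release[consumed] != '-' && release[consumed] != '\0')
        return false;

    return abi >= 50 && abi < 55;
}
```

For example, passing the `release` field filled in by uname(2) would report the 4.15.0-50-generic kernel above as in range, and 4.15.0-49-generic or 4.18.0-20-generic as out of range.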

It appears that invoking vmalloc_sync_all() a few times ends up triggering this issue. This can be triggered by restarting the lttng-sessiond service with lttng-modules-dkms installed (sometimes a few restarts are needed to trigger the bug). Restarting the service unloads and reloads those modules, which issues a few vmalloc_sync_all() calls as a side effect.

I'm not reporting this issue with the "ubuntu-bug linux" command because it crashes the system on that kernel (system hangs, no console output).

My test system runs within a kvm virtual machine on a 64-bit host.

lsb release:

Description: Ubuntu 18.04.2 LTS
Release: 18.04

Information about my kernel:

linux-image-4.15.0-50-generic:
  Installed: 4.15.0-50.54
  Candidate: 4.15.0-50.54
  Version table:
 *** 4.15.0-50.54 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic-updates/main i386 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main i386 Packages
        100 /var/lib/dpkg/status

Information about lttng-modules-dkms:

lttng-modules-dkms:
  Installed: 2.10.5-1ubuntu1.2
  Candidate: 2.10.5-1ubuntu1.2
  Version table:
 *** 2.10.5-1ubuntu1.2 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic-updates/universe i386 Packages
        100 /var/lib/dpkg/status
     2.10.5-1ubuntu1 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic/universe i386 Packages

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: bionic
Michael Jeanson (mjeanson) wrote :

I built a minimal module that triggers this bug when loaded and unloaded in a loop; the crash usually happens in less than 30 seconds.

You can grab it here: https://github.com/mjeanson/boom

Build it with make and run the boom.sh script.

Andrea Righi (arighi)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → Medium
Andrea Righi (arighi) wrote :

The attached module (similar to the one posted by @mjeanson) has been used as an effective reproducer for this bug. It looks like we need to reboot the system a couple of times and then load this module to trigger the bug immediately.

Andrea Righi (arighi)
description: updated
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
NorthSec Infrastructure bot (northsec-bot) wrote :

I was not able to trigger the crash on linux-image-4.15.0-55-generic=4.15.0-55.60 from -proposed. Looks like it's fixed.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-55.60

---------------
linux (4.15.0-55.60) bionic; urgency=medium

  * linux: 4.15.0-55.60 -proposed tracker (LP: #1834954)

  * Request backport of ceph commits into bionic (LP: #1834235)
    - ceph: use atomic_t for ceph_inode_info::i_shared_gen
    - ceph: define argument structure for handle_cap_grant
    - ceph: flush pending works before shutdown super
    - ceph: send cap releases more aggressively
    - ceph: single workqueue for inode related works
    - ceph: avoid dereferencing invalid pointer during cached readdir
    - ceph: quota: add initial infrastructure to support cephfs quotas
    - ceph: quota: support for ceph.quota.max_files
    - ceph: quota: don't allow cross-quota renames
    - ceph: fix root quota realm check
    - ceph: quota: support for ceph.quota.max_bytes
    - ceph: quota: update MDS when max_bytes is approaching
    - ceph: quota: add counter for snaprealms with quota
    - ceph: avoid iput_final() while holding mutex or in dispatch thread

  * QCA9377 isn't being recognized sometimes (LP: #1757218)
    - SAUCE: USB: Disable USB2 LPM at shutdown

  * hns: fix ICMP6 neighbor solicitation messages discard problem (LP: #1833140)
    - net: hns: fix ICMP6 neighbor solicitation messages discard problem
    - net: hns: fix unsigned comparison to less than zero

  * Fix occasional boot time crash in hns driver (LP: #1833138)
    - net: hns: Fix probabilistic memory overwrite when HNS driver initialized

  * use-after-free in hns_nic_net_xmit_hw (LP: #1833136)
    - net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()

  * hns: attempt to restart autoneg when disabled should report error
    (LP: #1833147)
    - net: hns: Restart autoneg need return failed when autoneg off

  * systemd 237-3ubuntu10.14 ADT test failure on Bionic ppc64el (test-seccomp)
    (LP: #1821625)
    - powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
    - powerpc: sys_pkey_mprotect() system call

  * [UBUNTU] pkey: Indicate old mkvp only if old and curr. mkvp are different
    (LP: #1832625)
    - pkey: Indicate old mkvp only if old and current mkvp are different

  * [UBUNTU] kernel: Fix gcm-aes-s390 wrong scatter-gather list processing
    (LP: #1832623)
    - s390/crypto: fix gcm-aes-s390 selftest failures

  * System crashes on hot adding a core with drmgr command (4.15.0-48-generic)
    (LP: #1833716)
    - powerpc/numa: improve control of topology updates
    - powerpc/numa: document topology_updates_enabled, disable by default

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * [UBUNTU] kernel: Fix wrong dispatching for control domain CPRBs
    (LP: #1832624)
    - s390/zcrypt: Fix wrong dispatching for control domain CPRBs

  * CVE-2019-11815
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().

  * Sound device not detected after resume from hibernate (LP: #1826868)
    - drm/i915: Force 2*96 MHz cdclk on glk/cnl when audio power is enabled
    - drm/i915: Save the old CDCLK atomic state
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Po-Hsu Lin (cypressyew) wrote :

The patch is available in Focal; marking this as Fix Released.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released