32-bit x86 kernel 4.15.0-50 crash in vmalloc_sync_all

Bug #1830433 reported by Mathieu Desnoyers on 2019-05-24
28
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Unassigned
Bionic
Medium
Unassigned

Bug Description

[Impact]

Commit d653420532d580156c8486686899ea6a9eeb7bf0 in bionic enabled kernel page table isolation for x86_32, but also introduced a kernel bug (the BUG_ON() condition in vmalloc_sync_one()) that seems to happen when vmalloc_sync_all() is called multiple times (e.g., in a busy loop).

The real problem seems to be a race condition with page-table entries' initialization that can be fixed applying the upstream commit 9bc4f28af75a91aea0ae383f50b0a430c4509303 ("x86/mm: Use WRITE_ONCE() when setting PTEs").

[Test Case]

The bug can be easily triggered by rebooting the system a couple of times and loading this module:

https://launchpadlibrarian.net/428142172/vmalloc-stress-test.c

[Fix]

The following upstream fix seems to resolve the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bc4f28af75a91aea0ae383f50b0a430c4509303

In addition to that the following other upstream fixes are required (all clean cherry picks) to do a cleaner backport of 9bc4f28af75a91aea0ae383f50b0a430c4509303:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=86fa949b050184ffc53688516a6a83ae5f98d08a
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=792adb90fa724ce07c0171cbc96b9215af4b1045
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=785a19f9d1dd8a4ab2d0633be4656653bd3de1fc
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f967db0b9ed44ec3057a28f3b28efc51df51b835
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f77084d96355f5fba8e2c1fb3a51a393b1570de7

[Regression Potential]

All upstream fixes, tested on the affected platform, backport changes are minimal.

[Original bug report]

Hi,

I'm reproducing a kernel bug in vmalloc_sync_all() with a 32-bit x86 kernel.

The problem appears in

Linux ubuntu 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:45:45 UTC 2019 i686 i686 i686 GNU/Linux

Kernels 4.15.0-49 and prior work fine.
The kernel 4.18.0-20-generic works fine.
This problem has not been experienced with upstream Linux kernels.

It appears that invoking vmalloc_sync_all() a few times end up triggering this issue. This can be triggered by restarting the lttng-sessiond service with lttng-modules-dkms installed (sometimes a few restarts are needed to trigger the bug). This ends up unloading and reloading those modules, which issues a few vmalloc_sync_all() as side-effect.

I'm not reporting this issue with the "ubuntu-bug linux" command because it crashes the system on that kernel (system hangs, no console output).

My test system runs within a kvm virtual machine on a 64-bit host.

lsb release:

Description: Ubuntu 18.04.2 LTS
Release: 18.04

Information about my kernel:

linux-image-4.15.0-50-generic:
  Installed: 4.15.0-50.54
  Candidate: 4.15.0-50.54
  Version table:
 *** 4.15.0-50.54 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic-updates/main i386 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main i386 Packages
        100 /var/lib/dpkg/status

Information about lttng-modules-dkms:

lttng-modules-dkms:
  Installed: 2.10.5-1ubuntu1.2
  Candidate: 2.10.5-1ubuntu1.2
  Version table:
 *** 2.10.5-1ubuntu1.2 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic-updates/universe i386 Packages
        100 /var/lib/dpkg/status
     2.10.5-1ubuntu1 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic/universe i386 Packages

Mathieu Desnoyers (compudj) wrote :
Mathieu Desnoyers (compudj) wrote :
Mathieu Desnoyers (compudj) wrote :
Mathieu Desnoyers (compudj) wrote :
Mathieu Desnoyers (compudj) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: bionic
Michael Jeanson (mjeanson) wrote :

I built a minimal module that will trigger this bug when loaded/unloaded in a loop, usually happens in less than 30 seconds.

You can grab it here : https://github.com/mjeanson/boom

Build it with make and run the boom.sh script.

Andrea Righi (arighi) on 2019-06-13
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → Medium
Andrea Righi (arighi) wrote :

This module in attach (similar to the one posted by @mjeanson) has been used as an effective reproducer for this bug. It looks like we need to reboot the system a couple of times and load this module to immediately trigger the bug.

Andrea Righi (arighi) on 2019-06-13
description: updated
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

I was not able to trigger the crash on linux-image-4.15.0-55-generic=4.15.0-55.60 from -proposed. Looks like it's fixed.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :
Download full text (11.2 KiB)

This bug was fixed in the package linux - 4.15.0-55.60

---------------
linux (4.15.0-55.60) bionic; urgency=medium

  * linux: 4.15.0-55.60 -proposed tracker (LP: #1834954)

  * Request backport of ceph commits into bionic (LP: #1834235)
    - ceph: use atomic_t for ceph_inode_info::i_shared_gen
    - ceph: define argument structure for handle_cap_grant
    - ceph: flush pending works before shutdown super
    - ceph: send cap releases more aggressively
    - ceph: single workqueue for inode related works
    - ceph: avoid dereferencing invalid pointer during cached readdir
    - ceph: quota: add initial infrastructure to support cephfs quotas
    - ceph: quota: support for ceph.quota.max_files
    - ceph: quota: don't allow cross-quota renames
    - ceph: fix root quota realm check
    - ceph: quota: support for ceph.quota.max_bytes
    - ceph: quota: update MDS when max_bytes is approaching
    - ceph: quota: add counter for snaprealms with quota
    - ceph: avoid iput_final() while holding mutex or in dispatch thread

  * QCA9377 isn't being recognized sometimes (LP: #1757218)
    - SAUCE: USB: Disable USB2 LPM at shutdown

  * hns: fix ICMP6 neighbor solicitation messages discard problem (LP: #1833140)
    - net: hns: fix ICMP6 neighbor solicitation messages discard problem
    - net: hns: fix unsigned comparison to less than zero

  * Fix occasional boot time crash in hns driver (LP: #1833138)
    - net: hns: Fix probabilistic memory overwrite when HNS driver initialized

  * use-after-free in hns_nic_net_xmit_hw (LP: #1833136)
    - net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()

  * hns: attempt to restart autoneg when disabled should report error
    (LP: #1833147)
    - net: hns: Restart autoneg need return failed when autoneg off

  * systemd 237-3ubuntu10.14 ADT test failure on Bionic ppc64el (test-seccomp)
    (LP: #1821625)
    - powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
    - powerpc: sys_pkey_mprotect() system call

  * [UBUNTU] pkey: Indicate old mkvp only if old and curr. mkvp are different
    (LP: #1832625)
    - pkey: Indicate old mkvp only if old and current mkvp are different

  * [UBUNTU] kernel: Fix gcm-aes-s390 wrong scatter-gather list processing
    (LP: #1832623)
    - s390/crypto: fix gcm-aes-s390 selftest failures

  * System crashes on hot adding a core with drmgr command (4.15.0-48-generic)
    (LP: #1833716)
    - powerpc/numa: improve control of topology updates
    - powerpc/numa: document topology_updates_enabled, disable by default

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * [UBUNTU] kernel: Fix wrong dispatching for control domain CPRBs
    (LP: #1832624)
    - s390/zcrypt: Fix wrong dispatching for control domain CPRBs

  * CVE-2019-11815
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().

  * Sound device not detected after resume from hibernate (LP: #1826868)
    - drm/i915: Force 2*96 MHz cdclk on glk/cnl when audio power is enabled
    - drm/i915: Save the old CDCLK atomic state
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers