vm-segv from ubuntu_stress_smoke_test failed on B

Bug #1864063 reported by Po-Hsu Lin on 2020-02-20
Affects              Status        Importance  Assigned to                Milestone
Stress-ng            Invalid       Medium      Kleber Sacilotto de Souza
ubuntu-kernel-tests  Fix Released  Undecided   Unassigned
linux (Ubuntu)       Invalid       Undecided   Unassigned
  Bionic             Fix Released  Undecided   Unassigned

Bug Description

Issue found on node onibi, with kernels 4.15.0-89.89 and 4.15.0-89.89~16.04.1.
Reproduction rate: 2/2 on the generic kernel, 2/2 on the lowlatency kernel, 2/2 on the X-hwe generic kernel.

The test hangs at vm-segv:
05:58:36 DEBUG| [stdout] vm-addr PASSED
05:58:36 DEBUG| [stdout] vm-rw STARTING
05:58:41 DEBUG| [stdout] vm-rw RETURNED 0
05:58:41 DEBUG| [stdout] vm-rw PASSED
05:58:41 DEBUG| [stdout] vm-segv STARTING
+ ARCHIVE=/var/lib/jenkins/jobs/smoke__B_amd64-generic__using_onibi__for_kernel/builds/2/archive
+ scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@onibi:kernel-test-results /var/lib/jenkins/jobs/smoke__B_amd64-generic__using_onibi__for_kernel/builds/2/archive
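The hang shows up in the log as a test that prints STARTING but never prints RETURNED. A minimal sketch of how that could be spotted automatically, assuming the "[stdout] <name> <event>" line layout seen above; the function name find_hung_test is hypothetical and not part of the test suite:

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan autotest-style log lines and find the test that printed STARTING
 * but never printed a matching RETURNED.
 * Copies the test name into `out` and returns 1 if one is found,
 * returns 0 if every started test also returned.
 */
int find_hung_test(const char *const lines[], size_t n,
                   char *out, size_t out_len)
{
    char pending[64] = "";

    for (size_t i = 0; i < n; i++) {
        char name[64], event[16];
        const char *p = strstr(lines[i], "[stdout] ");

        if (!p)
            continue;   /* not a test status line */
        /* Skip the 9-character "[stdout] " prefix, then read name and event. */
        if (sscanf(p + 9, "%63s %15s", name, event) != 2)
            continue;

        if (strcmp(event, "STARTING") == 0)
            snprintf(pending, sizeof pending, "%s", name);
        else if (strcmp(event, "RETURNED") == 0 && strcmp(name, pending) == 0)
            pending[0] = '\0';
    }

    if (pending[0] == '\0')
        return 0;
    snprintf(out, out_len, "%s", pending);
    return 1;
}
```

Fed the five status lines from the log above, this reports vm-segv as the test that never returned.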

Po-Hsu Lin (cypressyew) on 2020-02-20
tags: added: 4.15 amd64 bionic sru-20200217 ubuntu-stress-smoke-test xenial

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1864063

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Po-Hsu Lin (cypressyew) on 2020-02-20
description: updated
Po-Hsu Lin (cypressyew) wrote :

BTW this test has passed on another node "kili" on 4.15.0-89.89~16.04.1

Colin Ian King (colin-king) wrote :

Do you have any info on the number of CPUs, memory and swap size of onibi? I can then see if I can reproduce the issue. Or better, access to onibi would be most helpful to see if I can repro this issue.

Changed in stress-ng:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

I can't reproduce this on the systems I'm using. Can I get access to onibi to try and reproduce this issue?

Changed in stress-ng:
status: In Progress → Incomplete
Colin Ian King (colin-king) wrote :

I spoke too soon. I was able to trip this on 4.15.0-89.89 but not on 4.15.0-88. So this looks like a regression.

Colin Ian King (colin-king) wrote :

To reproduce (on an 8 CPU VM):

sudo apt-get update && sudo apt-get dist-upgrade
sudo apt-get build-dep stress-ng
git clone git://kernel.ubuntu.com/cking/stress-ng
cd stress-ng
make
sudo ./stress-ng --vm-segv 0 -t 10 -v

Comment out the ptrace line, rebuild, and re-run, and the hang does not occur. So it's ptrace related.

diff --git a/stress-vm-segv.c b/stress-vm-segv.c
index 39e4cbeb..54d590cd 100644
--- a/stress-vm-segv.c
+++ b/stress-vm-segv.c
@@ -129,7 +129,7 @@ kill_child:
         stress_process_dumpable(false);
 
 #if defined(HAVE_PTRACE)
-        (void)ptrace(PTRACE_TRACEME);
+        //(void)ptrace(PTRACE_TRACEME);
         kill(getpid(), SIGSTOP);
 #endif
         (void)sigemptyset(&set);
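
For reference, the two calls touched by the diff (PTRACE_TRACEME followed by a self-directed SIGSTOP) can be exercised in isolation. The sketch below is not stress-ng code: the function name trace_stop_and_reap and the parent-side handling are assumptions made for illustration, since the report does not show what the stressor's parent does.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that asks to be traced and then stops itself, the same
 * two calls that appear in the diff. The parent waits for the stop,
 * kills the child and reaps it.
 * Returns 0 if the stop was observed and the child was reaped, -1 otherwise.
 */
int trace_stop_and_reap(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: request tracing by the parent, then stop. */
        (void)ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        kill(getpid(), SIGSTOP);
        _exit(0);   /* only reached if the child is resumed */
    }

    /* Parent: WUNTRACED also reports the stop if PTRACE_TRACEME failed. */
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;  /* child exited or was killed before stopping */

    /* SIGKILL terminates the child even while it is stopped. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}
```

On an unaffected kernel this should return 0 almost immediately. It is only a way to poke at the ptrace path by hand and is not claimed to reproduce the lockup on its own.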

Changed in stress-ng:
assignee: Colin Ian King (colin-king) → Kleber Sacilotto de Souza (kleber-souza)
Khaled El Mously (kmously) wrote :

The offending commit appears to be:

5b9276f0312a apparmor: don't try to replace stale label in ptrace access check

Which isn't surprising (it's both ptrace and apparmor related). Note however that I did not encounter the log-flooding that was seen in the ADT testing. The only symptom I encountered was an almost-immediate hard kernel lockup. Nothing seen in the logs.

Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Colin Ian King (colin-king) wrote :

I confirm in my testing I get a hard kernel lockup with no log output.

Changed in linux (Ubuntu):
status: Incomplete → Invalid

On an arm64 node we had some more useful console logs:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/arm64/l/linux/20200223_153549_99119@/log.gz

[ 7400.666431] rcu_sched kthread starved for 240027 jiffies! g460241 c460240 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
[ 7426.939201] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [stress-ng-vm-se:29260]
[ 7427.941599] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [stress-ng-vm-se:28674]
[ 7580.664232] INFO: rcu_sched self-detected stall on CPU
[ 7580.666689] 3-...!: (284881 ticks this GP) idle=cfa/140000000000001/0 softirq=1720701/1720703 fqs=0
[ 7580.671124] (t=285031 jiffies g=460241 c=460240 q=979)
[ 7580.672223] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 7580.673660] rcu_sched kthread starved for 285032 jiffies! g460241 c460240 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
[ 7580.675893] 3-...!: (284881 ticks this GP) idle=cfa/140000000000001/0 softirq=1720701/1720703 fqs=0
[ 7580.685152]
[ 7580.686074] rcu_sched kthread starved for 285035 jiffies! g460241 c460240 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
[ 7606.937596] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [stress-ng-vm-se:29260]
[ 7607.947991] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [stress-ng-vm-se:29259]
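
The "starved for N jiffies" counts can be turned into wall-clock time by estimating the tick rate from two of these lines. A rough sketch, assuming both lines describe the same starvation period (they share g460241) and that the printk timestamps are in seconds; parse_starved and estimate_hz are hypothetical helper names:

```c
#include <stdio.h>

/*
 * Parse a line of the form
 *   "[ <timestamp>] rcu_sched kthread starved for <n> jiffies! ..."
 * Returns 0 and fills *ts and *jiffies on success, -1 on any other line.
 */
int parse_starved(const char *line, double *ts, long *jiffies)
{
    int n = sscanf(line, "[ %lf] rcu_sched kthread starved for %ld jiffies",
                   ts, jiffies);
    return n == 2 ? 0 : -1;
}

/*
 * Estimate ticks per second from two samples taken during the same
 * starvation period. Returns -1.0 if the timestamps are not increasing.
 */
double estimate_hz(double ts0, long j0, double ts1, long j1)
{
    if (ts1 <= ts0)
        return -1.0;
    return (double)(j1 - j0) / (ts1 - ts0);
}
```

Using the lines at 7400.666431 (240027 jiffies) and 7580.673660 (285032 jiffies), the estimate comes out at roughly 250 ticks per second, which would put the 285032-jiffy starvation at about 19 minutes.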

Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

Confirmed this is now fixed with bionic/linux 4.15.0-90.91:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/amd64/l/linux/20200227_101305_5debc@/log.gz

-------------------------------
10:00:50 DEBUG| [stdout] vm-segv STARTING
10:00:55 DEBUG| [stdout] vm-segv RETURNED 0
10:00:55 DEBUG| [stdout] vm-segv PASSED
-------------------------------

tags: added: verification-done-bionic
removed: verification-needed-bionic
Changed in stress-ng:
status: Incomplete → Invalid
Sean Feole (sfeole) on 2020-02-27
Changed in ubuntu-kernel-tests:
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-91.92

---------------
linux (4.15.0-91.92) bionic; urgency=medium

  * bionic/linux: 4.15.0-91.92 -proposed tracker (LP: #1865109)

  * CVE-2020-2732
    - KVM: x86: emulate RDPID
    - KVM: nVMX: Don't emulate instructions in guest mode
    - KVM: nVMX: Refactor IO bitmap checks into helper function
    - KVM: nVMX: Check IO instruction VM-exit conditions

linux (4.15.0-90.91) bionic; urgency=medium

  * bionic/linux: 4.15.0-90.91 -proposed tracker (LP: #1864753)

  * dkms artifacts may expire from the pool (LP: #1850958)
    - [Packaging] autoreconstruct -- manage executable debian files
    - [packaging] handle downloads from the librarian better

linux (4.15.0-90.90) bionic; urgency=medium

  * bionic/linux: 4.15.0-90.90 -proposed tracker (LP: #1864753)

  * vm-segv from ubuntu_stress_smoke_test failed on B (LP: #1864063)
    - Revert "apparmor: don't try to replace stale label in ptrace access check"

linux (4.15.0-89.89) bionic; urgency=medium

  * bionic/linux: 4.15.0-89.89 -proposed tracker (LP: #1863350)

  * [SRU][B/OEM-B] Fix multitouch support on some devices (LP: #1862567)
    - HID: core: move the dynamic quirks handling in core
    - HID: quirks: move the list of special devices into a quirk
    - HID: core: move the list of ignored devices in hid-quirks.c
    - HID: core: remove the absolute need of hid_have_special_driver[]

  * [linux] Patch to prevent possible data corruption (LP: #1848739)
    - blk-mq: silence false positive warnings in hctx_unlock()

  * Add bpftool to linux-tools-common (LP: #1774815)
    - tools/bpftool: fix bpftool build with bintutils >= 2.9
    - bpftool: make libbfd optional
    - [Debian] Remove binutils-dev build dependency
    - [Debian] package bpftool in linux-tools-common

  * Root can lift kernel lockdown via USB/IP (LP: #1861238)
    - Revert "UBUNTU: SAUCE: (efi-lockdown) Add a SysRq option to lift kernel
      lockdown"

  * [Bionic] i915 incomplete fix for CVE-2019-14615 (LP: #1862840) //
    CVE-2020-8832
    - drm/i915: Use same test for eviction and submitting kernel context
    - drm/i915: Define an engine class enum for the uABI
    - drm/i915: Force the switch to the i915->kernel_context
    - drm/i915: Move GT powersaving init to i915_gem_init()
    - drm/i915: Move intel_init_clock_gating() to i915_gem_init()
    - drm/i915: Inline intel_modeset_gem_init()
    - drm/i915: Mark the context state as dirty/written
    - drm/i915: Record the default hw state after reset upon load

  * Bionic update: upstream stable patchset 2020-02-12 (LP: #1863019)
    - xfs: Sanity check flags of Q_XQUOTARM call
    - mfd: intel-lpss: Add default I2C device properties for Gemini Lake
    - powerpc/archrandom: fix arch_get_random_seed_int()
    - tipc: fix wrong timeout input for tipc_wait_for_cond()
    - mt7601u: fix bbp version check in mt7601u_wait_bbp_ready
    - crypto: sun4i-ss - fix big endian issues
    - drm/sti: do not remove the drm_bridge that was never added
    - drm/virtio: fix bounds check in virtio_gpu_cmd_get_capset()
    - ALSA: hda: fix unused variable warning
    - apparmor: don't try to replace stale label in ptrace access chec...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) on 2020-03-27
Changed in ubuntu-kernel-tests:
status: Triaged → Fix Released