ftrace in ubuntu_kernel_selftests hang with Cosmic kernel

Bug #1826385 reported by Po-Hsu Lin on 2019-04-25
Affects               Status         Importance   Assigned to
ubuntu-kernel-tests   Fix Released   Undecided    Po-Hsu Lin
linux (Ubuntu)        Fix Released   Undecided    Unassigned
  Bionic              Fix Released   Undecided    Po-Hsu Lin
  Cosmic              Fix Released   Undecided    Po-Hsu Lin

Bug Description

== Justification ==
Running the ftrace test in ubuntu_kernel_selftests repeatedly against an x86 Cosmic kernel causes the system to hang.

When this happens, you cannot ssh into the system, and nothing relevant is logged to syslog.

The hang is caused by one of the sub-tests: kprobe/multiple_kprobes

Masami's comment from upstream discussion (https://lkml.org/lkml/2018/12/3/1219):
In arch/x86/kernel/kprobes/opt.c, copy_optimized_instructions() runs a copy loop that updates only the src and dest cursors; it does not update the real address, which is used for adjusting RIP-relative instructions.
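
To make the failure mode concrete, here is a toy Python model of such a copy loop. It is not the kernel code: the instruction tuples, the addresses, and the function names are made up for illustration. It only shows that if the displacement fix-up always uses the start of the final location instead of that start plus the bytes already copied, every RIP-relative instruction after the first ends up pointing at the wrong target.

```python
# Toy model of relocating RIP-relative instructions; not kernel code.
# Each instruction is (length_in_bytes, displacement or None).
# A RIP-relative operand resolves to: address_of_next_instruction + displacement.

def targets(base, insns):
    """Absolute targets referenced by the RIP-relative instructions at base."""
    out, addr = [], base
    for length, disp in insns:
        if disp is not None:
            out.append(addr + length + disp)
        addr += length
    return out

def relocate(src, real, insns, advance_real):
    """Re-encode insns located at src so that they can run at real.

    advance_real=False mimics the bug described above: the fix-up always
    uses the start of the destination instead of real + bytes_copied.
    """
    out, copied = [], 0
    for length, disp in insns:
        if disp is not None:
            insn_src = src + copied
            insn_real = real + copied if advance_real else real
            disp = insn_src + disp - insn_real
        out.append((length, disp))
        copied += length
    return out

SRC, REAL = 0x1000, 0x9000          # hypothetical addresses
INSNS = [(7, 0x100), (7, 0x200)]    # two hypothetical RIP-relative instructions

original = targets(SRC, INSNS)
fixed = targets(REAL, relocate(SRC, REAL, INSNS, advance_real=True))
buggy = targets(REAL, relocate(SRC, REAL, INSNS, advance_real=False))
print("fixed keeps targets:", original == fixed)
print("buggy keeps targets:", original == buggy)
```

In this model the first instruction is relocated correctly either way, which matches the title of the upstream fix: the corruption only shows up when more than one RIP-relative instruction is copied.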

== Fix ==
43a1b0cb4 (kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction)

This patch is already in D.
B and C both carry the offending commit 63fef14, and this patch can be cherry-picked onto them. Note that on the Bionic kernel the hang can only be triggered this way when the kernel is built with GCC-8.

Although it is harder to trigger on Bionic, I think the fix is worth applying there because it is quite straightforward.

For X, the offending commit 63fef14 does not exist.

== Test ==
Test kernel for Cosmic and Bionic built with GCC-8:
http://people.canonical.com/~phlin/kernel/lp-1826385-ftrace-hang/

(To verify this on Bionic, you will need to build a kernel with GCC-8.)

The patch was tested on a bare-metal node and a KVM node; both pass the test when it is run repeatedly.

== Regression Potential ==
Low. This is an upstream fix specific to kprobes and limited to the x86 architecture.

== Original Bug Report ==
This issue is a bit strange.

The test passed with the Cosmic 4.18.0-18.19 generic kernel on an AMD64 node during our SRU testing process.

https://pastebin.ubuntu.com/p/HN2vN6fCXn/

However, Tyler found that this test will hang after:
[30] Kretprobe dynamic event with maxactive [PASS]
[31] Register/unregister many kprobe events [PASS]

And this is 100% reproducible.

No relevant output in syslog.

This will need further investigation.

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: linux-image-4.18.0-18-generic 4.18.0-18.19
ProcVersionSignature: Ubuntu 4.18.0-18.19-generic 4.18.20
Uname: Linux 4.18.0-18-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 25 07:16 seq
 crw-rw---- 1 root audio 116, 33 Apr 25 07:16 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.10-0ubuntu13.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Thu Apr 25 07:18:51 2019
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-18-generic root=UUID=2f68c627-8ab4-40d5-8c06-6563436d0f96 ro
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-18-generic N/A
 linux-backports-modules-4.18.0-18-generic N/A
 linux-firmware 1.175.3
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-xenial
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-xenial:cvnQEMU:ct1:cvrpc-i440fx-xenial:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-xenial
dmi.sys.vendor: QEMU

Po-Hsu Lin (cypressyew) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Tyler Hicks (tyhicks) wrote :

I was able to verify that deleting ftrace/test.d/kprobe/multiple_kprobes.tc allows the remaining ftrace selftests (from the cosmic kernel tree) to pass. That suggests that the "Register/unregister many kprobe events" (multiple_kprobes.tc) test is the test that puts the system into a bad state.

Po-Hsu Lin (cypressyew) wrote :

Checked again on the KVM that I used yesterday for this:
  1. ftrace test finished as expected with upstream tree https://pastebin.ubuntu.com/p/QmQfspFFR5/
  2. ubuntu-cosmic master branch with tip at 2e8c30c5 (Ubuntu-4.18.0-18.19) will hang https://pastebin.ubuntu.com/p/69gHfNpMRF/
  3. ubuntu-cosmic master branch with tip at fc64292e (Ubuntu-4.18.0-17.18) will pass https://pastebin.ubuntu.com/p/pgjNvCVK5F/

However, I didn't see any changes to the ftrace tests between these two tags.

Po-Hsu Lin (cypressyew) wrote :

I think this patch can fix this issue on Cosmic:
https://github.com/torvalds/linux/commit/43a1b0cb4cd6dbfd3cd9c10da663368394d299d8
(found in this thread https://lkml.org/lkml/2018/12/3/1219)

Test kernel:
http://people.canonical.com/~phlin/kernel/lp-1826385-ftrace-hang/

Cosmic KVM and bare-metal passed the ftrace test with this test kernel.

The commit that this patch fixes has been in Cosmic for a while. For now I can't tell why this only started to trigger recently and why it did not fail in the last cycle. (The gcc version in the last cycle was already 8: 4:8.3.0-1ubuntu1.1 from -proposed.)

Po-Hsu Lin (cypressyew) wrote :

More tests on a bare-metal node and a KVM node with 4.18.0-18 / -17 / -16 / -11 show that:
* This issue can be reproduced on all of these kernels
* Bare-metal systems tend to pass the ftrace test on the first run, but hang on the second or third attempt if you run it again
* KVM nodes tend to fail on the first attempt (3 out of 4 kernels), and when the first attempt passes, the second attempt reproduces the hang as well.

Because the test tends to pass on the first attempt on bare metal, and we may simply retry after a failure, I think these factors combined explain why the issue went unnoticed.

With the patched kernel, I can re-run the ftrace test 5 times without any interruption on both the bare-metal and the KVM node.

I will test Bionic next and prepare an SRU for Cosmic.
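
Since a single passing run proves little here, re-running the test in a loop is the useful check. The following is a rough sketch of such a loop, not the harness actually used for this bug; the function name and defaults are assumptions. Each attempt number is printed and flushed before the run starts, so that when the whole machine hangs the last line on a serial console or remote log shows how far it got.

```python
import subprocess
import sys

def run_repeatedly(cmd, attempts=5, timeout=None):
    """Run cmd up to `attempts` times; return the number of consecutive passes."""
    passes = 0
    for i in range(1, attempts + 1):
        print(f"attempt {i}/{attempts}", flush=True)
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            print(f"attempt {i} timed out", flush=True)
            break
        if result.returncode != 0:
            print(f"attempt {i} exited with {result.returncode}", flush=True)
            break
        passes += 1
    return passes

# Harmless stand-in command; replace it with the real test invocation.
print("consecutive passes:", run_repeatedly([sys.executable, "-c", "pass"]))
```

The timeout only helps when the test process stalls while the system stays alive. A full system hang like the one reported here has to be observed from outside the machine.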

Po-Hsu Lin (cypressyew) on 2019-04-30
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Po-Hsu Lin (cypressyew) wrote :

Tried to reproduce this on Bionic with both the ubuntu-bionic and the ubuntu-cosmic source code, but no luck.

I assume this has something to do with the toolchain: as mentioned in the mail thread, this gets triggered after gcc is upgraded from 7 to 8, and Bionic comes with GCC-7.

Po-Hsu Lin (cypressyew) on 2019-04-30
description: updated
Po-Hsu Lin (cypressyew) on 2019-04-30
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Po-Hsu Lin (cypressyew) wrote :

This can be triggered on Cosmic with an OEM kernel built with gcc-8:
$ cat /proc/version
Linux version 4.15.0-1036-oem (root@rumford) (gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)) #41 SMP Tue Apr 30 10:57:32 UTC 2019

I will see if I can verify this from this point.
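
Because the reproduction depends on the compiler used to build the kernel, it can help to check that mechanically before testing. This small sketch extracts the GCC major version from a `/proc/version` string. It assumes the `gcc version X.Y.Z` wording shown in the output above; other kernels may format this line differently, and the function name is made up.

```python
import re

def gcc_major(version_text):
    """Return the GCC major version named in a /proc/version string, or None."""
    match = re.search(r"gcc version (\d+)\.", version_text)
    return int(match.group(1)) if match else None

# The string quoted in the comment above.
sample = ("Linux version 4.15.0-1036-oem (root@rumford) "
          "(gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)) "
          "#41 SMP Tue Apr 30 10:57:32 UTC 2019")
print(gcc_major(sample))
```

On a live system the input would come from reading `/proc/version` instead of the hard-coded sample.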

Po-Hsu Lin (cypressyew) on 2019-05-07
description: updated
Po-Hsu Lin (cypressyew) on 2019-05-07
description: updated
Po-Hsu Lin (cypressyew) on 2019-05-07
Changed in linux (Ubuntu Bionic):
status: New → In Progress
description: updated
description: updated
Po-Hsu Lin (cypressyew) wrote :
Changed in linux (Ubuntu Bionic):
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Po-Hsu Lin (cypressyew) on 2019-05-20
tags: added: verification-done-bionic
removed: verification-needed-bionic

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-51.55

---------------
linux (4.15.0-51.55) bionic; urgency=medium

  * linux: 4.15.0-51.55 -proposed tracker (LP: #1829219)

  * disable a.out support (LP: #1818552)
    - [Config] Disable a.out support

  * [UBUNTU] qdio: clear intparm during shutdown (LP: #1828394)
    - s390/qdio: clear intparm during shutdown

  * ftrace in ubuntu_kernel_selftests hang with Cosmic kernel (LP: #1826385)
    - kprobes/x86: Fix instruction patching corruption when copying more than one
      RIP-relative instruction

  * touchpad not working on lenovo yoga 530 (LP: #1787775)
    - Revert "UBUNTU: SAUCE: i2c:amd Depends on ACPI"
    - Revert "UBUNTU: SAUCE: i2c:amd move out pointer in union i2c_event_base"
    - Revert "UBUNTU: SAUCE: i2c:amd I2C Driver based on PCI Interface for
      upcoming platform"
    - i2c: add helpers to ease DMA handling
    - i2c: add a message flag for DMA safe buffers
    - i2c: add extra check to safe DMA buffer helper
    - i2c: Add drivers for the AMD PCIe MP2 I2C controller
    - [Config] Update config for AMD MP2 I2C driver
    - [Config] Update I2C_AMD_MP2 annotations

  * tm-unavailable in powerpc/tm failed on Bionic Power9 (LP: #1813129)
    - selftests/powerpc: Check for pthread errors in tm-unavailable
    - selftests/powerpc: Skip tm-unavailable if TM is not enabled

  * cp_abort in powerpc/context_switch from ubunut_kernel_selftests failed on
    Bionic P9 (LP: #1813134)
    - selftests/powerpc: Remove redundant cp_abort test

  * bionic/linux: completely remove snapdragon files from sources (LP: #1827880)
    - [Packaging] remove snapdragon dead files
    - [Config] update configs after snapdragon removal

  * The noise keeps occurring when Headset is plugged in on a Dell machine
    (LP: #1827972)
    - ALSA: hda/realtek - Fixed Dell AIO speaker noise

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

  * There are 4 HDMI/Displayport audio output listed in sound setting without
    attach any HDMI/DP monitor (LP: #1827967)
    - ALSA: hda/hdmi - Read the pin sense from register when repolling
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event

  * Headphone jack switch sense is inverted: plugging in headphones disables
    headphone output (LP: #1824259)
    - ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board

  * CTAUTO:DevOps:860.50:devops4fp1:Error occurred during LINUX Dmesg error
    Checking for all LINUX clients for devops4p10 (LP: #1766201)
    - SAUCE: integrity: downgrade error to warning

  * Screen freeze after resume from S3 when HDMI monitor plugged on Dell
    Precision 7740 (LP: #1825958)
    - PCI: Restore resized BAR state on resume

  * potential memory corruption on arm64 on dev release (LP: #1827437)
    - driver core: Postpone DMA tear-down until after devres release

  * powerpc/pmu/ebb test in ubuntu_kernel_selftest failed with "error while
    loading shared libraries" on Bionic/Cosmic PowerPC (LP: #1812805)
    - selftests/powerpc/pmu: Link ebb tests with -no-pie

  * unnecessary request_queue freeze (LP: #1815733)
    - block: av...


Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-21.22

---------------
linux (4.18.0-21.22) cosmic; urgency=medium

  * linux: 4.18.0-21.22 -proposed tracker (LP: #1829186)

  * disable a.out support (LP: #1818552)
    - [Config] Turn off a.out support

  * ftrace in ubuntu_kernel_selftests hang with Cosmic kernel (LP: #1826385)
    - kprobes/x86: Fix instruction patching corruption when copying more than one
      RIP-relative instruction

  * touchpad not working on lenovo yoga 530 (LP: #1787775)
    - Revert "UBUNTU: SAUCE: i2c:amd Depends on ACPI"
    - Revert "UBUNTU: SAUCE: i2c:amd move out pointer in union i2c_event_base"
    - Revert "UBUNTU: SAUCE: i2c:amd I2C Driver based on PCI Interface for
      upcoming platform"
    - i2c: add extra check to safe DMA buffer helper
    - i2c: Add drivers for the AMD PCIe MP2 I2C controller
    - [Config] Update config for AMD MP2 I2C driver
    - [Config] Update I2C_AMD_MP2 annotations

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

  * There are 4 HDMI/Displayport audio output listed in sound setting without
    attach any HDMI/DP monitor (LP: #1827967)
    - ALSA: hda/hdmi - Read the pin sense from register when repolling
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event

  * Headphone jack switch sense is inverted: plugging in headphones disables
    headphone output (LP: #1824259)
    - ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board

  * CTAUTO:DevOps:860.50:devops4fp1:Error occurred during LINUX Dmesg error
    Checking for all LINUX clients for devops4p10 (LP: #1766201)
    - SAUCE: integrity: downgrade error to warning

  * potential memory corruption on arm64 on dev release (LP: #1827437)
    - driver core: Postpone DMA tear-down until after devres release

  * powerpc/pmu/ebb test in ubuntu_kernel_selftest failed with "error while
    loading shared libraries" on Bionic/Cosmic PowerPC (LP: #1812805)
    - selftests/powerpc/pmu: Link ebb tests with -no-pie

  * unnecessary request_queue freeze (LP: #1815733)
    - block: avoid setting nr_requests to current value
    - block: avoid setting none scheduler if it's already none

  * Kprobe event string type argument failed in ftrace from
    ubuntu_kernel_selftests on B/C i386 (LP: #1825780)
    - selftests/ftrace: Fix kprobe string testcase to not probe notrace function

  * False positive test result in run_netsocktests from net in
    ubuntu_kernel_selftest (LP: #1825777)
    - selftests/net: correct the return value for run_netsocktests

 -- Stefan Bader <email address hidden> Wed, 15 May 2019 13:18:36 +0200

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) on 2019-06-05
tags: added: verification-done-cosmic
removed: verification-needed-cosmic
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released