risc-v 5.8 kernel oops on ftrace tests

Bug #1894613 reported by Colin Ian King
Affects               Status         Importance   Assigned to      Milestone
ubuntu-kernel-tests   Fix Released   High         Unassigned
linux (Ubuntu)        Fix Released   High         Colin Ian King
Groovy                Fix Released   High         Colin Ian King

Bug Description

== SRU Groovy ==

Running the ftrace self tests results in a NULL pointer dereference oops on RISC-V and also on ARM64.

== Fix ==

Upstream commit https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=4230e2deaa484b385aa01d598b2aea8e7f2660a6

== Testcase ==

Run the kernel ftrace selftest. Without the fix, ftrace oopses on RISC-V and ARM64 and can also hang on ARM64. With the fix, the tests run without oopsing or hanging.

== Regression Potential ==

This fix marks two functions as notrace and does not otherwise alter what they do, so the risk is negligible. If their behaviour had changed, RCU and stop machine operations would break and cause machine hangs. We do not observe this, and since RCU is used heavily in the kernel, the fix does not appear to change behaviour, as expected.

The only change is that the functions are no longer traceable via ftrace, which is the desired outcome.

-----------------

5.8.0-1-generic (buildd@riscv64-qemu-lcy01-015) (gcc (Ubuntu 10.2.0-5ubuntu2) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35) #1-Ubuntu SMP Thu Aug 27 19:51:38 UTC 2020 (Ubuntu 5.8.0-1.1-generic 5.8.4

18:30:06 DEBUG| [stdout] # selftests: ftrace: ftracetest
18:30:07 DEBUG| [stdout] # === Ftrace unit tests ===
18:30:10 DEBUG| [stdout] # [1] Basic trace file check [PASS]
[17433.113458] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.113533] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.113552] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.113573] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.113591] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.114290] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.114306] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.114315] Oops [#1]
[17433.114630] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[17433.114690] Modules linked in: virtio_rng binfmt_misc uio_pdrv_genirq uio drm sch_fq_codel drm_panel_orientation_quirks backlight ip_tables x_tables autofs4 virtio_net net_failover virtio_blk failover [last unloaded: signpost]
[17433.115296] CPU: 1 PID: 15 Comm: migration/1 Tainted: G W OE 5.8.0-1-generic #1-Ubuntu
[17433.115419] epc: 0000000000000000 ra : 0000000000000000 sp : ffffffe1f5c67d90
[17433.115442] gp : ffffffe001722298 tp : ffffffe1f5c3ae00 t0 : 0000000000000000
[17433.115459] t1 : 0000000000006000 t2 : 00000000000bbc00 s0 : 0000000000000022
[17433.115475] s1 : ffffffe0002b7c12 a0 : ffffffe000963a64 a1 : 0000000000000022
[17433.115491] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[17433.115507] a5 : ffffffe1fec95580 a6 : 00000000000000ff a7 : 0000000000000001
[17433.115523] s2 : 0000000000000001 s3 : ffffffe00009d580 s4 : ffffffe001724210
[17433.115540] s5 : ffffffe1fec9a3b8 s6 : ffffffffffffffff s7 : 0000000000000001
[17433.115556] s8 : ffffffe0016f07cb s9 : ffffffe1e909bb80 s10: ffffffe0002b7ba6
[17433.115573] s11: ffffffe1e909bba8 t3 : 000000000000006c t4 : 00000000002c73ba
[17433.115586] t5 : 00000000001f7fa8 t6 : ffffffe000c02d1c
[17433.115603] status: 0000000000000120 badaddr: 0000000000000000 cause: 000000000000000c
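
A note on reading the register dump: epc and ra are both zero and cause is 0xc, which the RISC-V privileged specification lists as an instruction page fault. That suggests the hart tried to fetch an instruction from address 0, for example by calling through a NULL function pointer. Below is a minimal Python sketch for decoding the final "status:" line. The names decode_cause_line and RISCV_EXCEPTIONS are hypothetical helpers for illustration, not part of any kernel tooling, and the table covers synchronous exceptions only.

```python
import re

# Synchronous exception codes from the RISC-V privileged specification.
# Interrupt causes (top bit of the cause register set) are not handled here.
RISCV_EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_cause_line(line):
    """Return (badaddr, description) parsed from an oops 'status:' line."""
    m = re.search(r"badaddr:\s*([0-9a-f]+)\s+cause:\s*([0-9a-f]+)", line)
    if m is None:
        raise ValueError("no badaddr/cause fields found")
    badaddr = int(m.group(1), 16)
    cause = int(m.group(2), 16)
    return badaddr, RISCV_EXCEPTIONS.get(cause, "unknown (%d)" % cause)


# The last line of the register dump above, without the timestamp.
oops_line = ("status: 0000000000000120 badaddr: 0000000000000000 "
             "cause: 000000000000000c")
print(decode_cause_line(oops_line))
```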


Changed in linux (Ubuntu):
importance: Undecided → High
Changed in ubuntu-kernel-tests:
importance: Undecided → High
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1894613

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: groovy
Colin Ian King (colin-king) wrote :

This has been painful to track down on the emulator. To reproduce, run "sudo ./ftracetest -vvv" in linux/tools/testing/selftests/ftrace.

The function_graph ftrace test basic2.tc trips the oops on 5.8. 5.4.0-30-generic is OK, so this is a regression.

Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
status: Incomplete → In Progress
Colin Ian King (colin-king) wrote :

The regression was introduced between 5.6 (OK) and 5.7 (crashes).

Colin Ian King (colin-king) wrote :

git bisect good
cfafe260137418d0265d0df3bb18dc494af2b43e is the first bad commit
commit cfafe260137418d0265d0df3bb18dc494af2b43e
Author: Atish Patra <email address hidden>
Date: Tue Mar 17 18:11:43 2020 -0700

    RISC-V: Add supported for ordered booting method using HSM

    Currently, all harts have to jump Linux in RISC-V. This complicates the
    multi-stage boot process as every transient stage also has to ensure all
    harts enter to that stage and jump to Linux afterwards. It also obstructs
    a clean Kexec implementation.

    SBI HSM extension provides alternate solutions where only a single hart
    need to boot and enter Linux. The booting hart can bring up secondary
    harts one by one afterwards.

    Add SBI HSM based cpu_ops that implements an ordered booting method in
    RISC-V. This change is also backward compatible with older firmware not
    implementing HSM extension. If a latest kernel is used with older
    firmware, it will continue to use the default spinning booting method.

    Signed-off-by: Atish Patra <email address hidden>
    Reviewed-by: Anup Patel <email address hidden>
    Signed-off-by: Palmer Dabbelt <email address hidden>

 arch/riscv/kernel/Makefile | 3 ++
 arch/riscv/kernel/cpu_ops.c | 10 ++++-
 arch/riscv/kernel/cpu_ops_sbi.c | 81 +++++++++++++++++++++++++++++++++++++++++
 arch/riscv/kernel/head.S | 26 +++++++++++++
 arch/riscv/kernel/smpboot.c | 2 +-
 arch/riscv/kernel/traps.c | 2 +-
 6 files changed, 121 insertions(+), 3 deletions(-)
 create mode 100644 arch/riscv/kernel/cpu_ops_sbi.c
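
The bisect above narrows a known-good (5.6) and known-bad (5.7) range to a single commit by repeatedly testing the midpoint. A minimal Python sketch of that idea follows, assuming a simplified linear history. The function first_bad, the commit labels and the is_bad predicate are all hypothetical; real git bisect works on the commit graph and the "test" here would be booting the kernel and running the ftrace selftest.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a linear history.

    commits is ordered oldest to newest; commits[0] must be known good
    and commits[-1] known bad. is_bad(commit) runs the test case.
    """
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid   # regression is at mid or earlier
        else:
            lo = mid   # regression is after mid
    return commits[hi]


# Hypothetical history: "c3" introduces the crash, so it and everything
# after it fails the test.
history = ["v5.6", "c1", "c2", "c3", "c4", "v5.7"]
broken = {"c3", "c4", "v5.7"}
print(first_bad(history, lambda c: c in broken))
```

Each step halves the remaining range, which is why a bisect over a full kernel release cycle needs only a modest number of build-and-test rounds.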

description: updated
Colin Ian King (colin-king) wrote :

Fix: commit 4230e2deaa484b385aa01d598b2aea8e7f2660a6 from https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

Stefan Bader (smb)
Changed in linux (Ubuntu Groovy):
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → High
status: New → In Progress
Ian May (ian-may)
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Colin Ian King (colin-king) wrote :

Tested against the 5.8 ftrace tests: no crashes now, fixed.

cking@riscv64:~/linux/tools/testing/selftests/ftrace$ sudo ./ftracetest
=== Ftrace unit tests ===
[1] Basic trace file check [PASS]
[2] Basic test for tracers [PASS]
[3] Basic trace clock test [PASS]
[4] Basic event tracing check [PASS]
[5] Change the ringbuffer size [PASS]
[6] Snapshot and tracing setting [PASS]
[7] trace_pipe and trace_marker [PASS]
[8] Test ftrace direct functions against tracers [UNRESOLVED]
[9] Test ftrace direct functions against kprobes [UNSUPPORTED]
[10] Generic dynamic event - add/remove kprobe events [FAIL]
[11] Generic dynamic event - add/remove synthetic events [UNSUPPORTED]
[12] Generic dynamic event - selective clear (compatibility) [UNSUPPORTED]
[13] Generic dynamic event - generic clear event [UNSUPPORTED]
[14] event tracing - enable/disable with event level files [PASS]
[15] event tracing - restricts events based on pid notrace filtering [PASS]
[16] event tracing - restricts events based on pid [PASS]
[17] event tracing - enable/disable with subsystem level files [PASS]
[18] event tracing - enable/disable with top level files [PASS]
[19] Test trace_printk from module [UNRESOLVED]
[20] ftrace - function graph filters with stack tracer [PASS]
[21] ftrace - function graph filters [PASS]
[22] ftrace - function glob filters [PASS]
[23] ftrace - function pid notrace filters [PASS]
[24] ftrace - function pid filters [PASS]
[25] ftrace - stacktrace filter command [PASS]
[26] ftrace - function trace with cpumask [PASS]
[27] ftrace - test for function event triggers [PASS]
[28] ftrace - function trace on module [UNRESOLVED]
[29] ftrace - function profiling [PASS]
[30] ftrace - function profiler with function tracing [PASS]
[31] ftrace - test reading of set_ftrace_filter [PASS]
[32] ftrace - Max stack tracer [PASS]
[33] ftrace - test for function traceon/off triggers [PASS]
[34] ftrace - test tracing error log support [PASS]
[35] Test creation and deletion of trace instances while setting an event [PASS]
[36] Test creation and deletion of trace instances [PASS]
[37] Kprobe dynamic event - adding and removing [UNSUPPORTED]
[38] Kprobe dynamic event - busy event check [UNSUPPORTED]
[39] Kprobe dynamic event with arguments [UNSUPPORTED]
[40] Kprobe event with comm arguments [UNSUPPORTED]
[41] Kprobe event string type argument [UNSUPPORTED]
[42] Kprobe event symbol argument [UNSUPPORTED]
[43] Kprobe event argument syntax [UNSUPPORTED]
[44] Kprobes event arguments with types [UNSUPPORTED]
[45] Kprobe event user-memory access [UNSUPPORTED]
[46] Kprobe event auto/manual naming [UNSUPPORTED]
[47] Kprobe dynamic event with function tracer [UNSUPPORTED]
[48] Kprobe dynamic event - probing module [UNSUPPORTED]
[49] Create/delete multiprobe on kprobe event [UNSUPPORTED]
[50] Kprobe event parser error log check [UNSUPPORTED]
[51] Kretprobe dynamic event with arguments [UNSUPPORTED]
[52] Kretprobe dynamic event with maxactive [UNSUPPORTED]
[53] Register/unregister many kprobe events [UNSUPPORTED]
[54] Kprobe events - probe points [UNSUPPORTED]
[55] Kprobe dynamic event - adding and removing [UNSUPPORTED]
[56] Uprobe event parser error log check [UNSUPPORTED]
[57] test for the...
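
A long result list like the one above is easier to compare between kernels when tallied by outcome. Below is a minimal Python sketch that counts the results and lists the failures. The helper name tally and the regular expression are assumptions based only on the line format shown in this report, and lines that do not match that format (such as the truncated last entry) are skipped.

```python
import re
from collections import Counter

# Matches lines such as "[10] Generic dynamic event - ... [FAIL]".
RESULT_RE = re.compile(r"^\[(\d+)\]\s+(.*?)\s+\[([A-Z]+)\]\s*$")


def tally(output):
    """Return (Counter of outcomes, list of (number, name) for failures)."""
    counts = Counter()
    failed = []
    for line in output.splitlines():
        m = RESULT_RE.match(line.strip())
        if m is None:
            continue  # banner, truncated or unrelated line
        number, name, result = m.groups()
        counts[result] += 1
        if result == "FAIL":
            failed.append((int(number), name))
    return counts, failed


# A few lines copied from the run above.
sample = """\
=== Ftrace unit tests ===
[1] Basic trace file check [PASS]
[8] Test ftrace direct functions against tracers [UNRESOLVED]
[9] Test ftrace direct functions against kprobes [UNSUPPORTED]
[10] Generic dynamic event - add/remove kprobe events [FAIL]
[57] test for the...
"""
counts, failed = tally(sample)
print(dict(counts))
print(failed)
```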


tags: added: verification-done-groovy
removed: verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-31.33

---------------
linux (5.8.0-31.33) groovy; urgency=medium

  * groovy/linux: 5.8.0-31.33 -proposed tracker (LP: #1905299)

  * Groovy 5.8 kernel hangs on boot on CPUs with eLLC (LP: #1903397)
    - drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup
      during fbdev init

  * CVE-2020-4788
    - selftests/powerpc: rfi_flush: disable entry flush if present
    - powerpc/64s: flush L1D on kernel entry
    - powerpc/64s: flush L1D after user accesses
    - selftests/powerpc: entry flush test

linux (5.8.0-30.32) groovy; urgency=medium

  * groovy/linux: 5.8.0-30.32 -proposed tracker (LP: #1903194)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #1899993)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-
    eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Tiger Lake PMC core driver fixes (LP: #1899883)
    - platform/x86: intel_pmc_core: update TGL's LPM0 reg bit map name
    - platform/x86: intel_pmc_core: fix bound check in pmc_core_mphy_pg_show()
    - platform/x86: pmc_core: Use descriptive names for LPM registers
    - platform/x86: intel_pmc_core: Fix TigerLake power gating status map
    - platform/x86: intel_pmc_core: Fix the slp_s0 counter displayed value

  * drm/i915/dp_mst - System would hang during the boot up. (LP: #1902469)
    - Revert "UBUNTU: SAUCE: drm/i915/display: Fix null deref in
      intel_psr_atomic_check()"
    - drm/i915: Fix encoder lookup during PSR atomic check

  * Undetected Data corruption in MPI workloads that use VSX for reductions on
    POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load
      workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
    - media: uvcvideo: Add mapping for HEVC payloads

  * risc-v 5.8 ...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: New → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released