ftrace:test.d--ftrace--func_traceonoff_triggers.tc in ubuntu_kselftests_ftrace triggers kernel NULL pointer dereference on node blanka

Bug #2072972 reported by Po-Hsu Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
New
Undecided
Unassigned

Bug Description

Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10

After some manual tests I noticed that this issue is not 100% reproducible:
  * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts
  * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts
  * Focal 5.4.0-192.212 + -192 source failed with 3 out of 5 attempts

Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed?

$ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc
=== Ftrace unit tests ===
[1] ftrace - test for function traceon/off triggers

dmesg output:
[ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1
[ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050
[ 7128.574031] #PF: supervisor read access in kernel mode
[ 7128.579763] #PF: error_code(0x0000) - not-present page
[ 7128.585495] PGD 0 P4D 0
[ 7128.588320] Oops: 0000 [#1] SMP NOPTI
[ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu
[ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023
[ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
[ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12
[ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282
[ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100
[ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4
[ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100
[ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8
[ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000
[ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000
[ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0
[ 7128.708809] Call Trace:
[ 7128.711535] <IRQ>
[ 7128.713782] ? show_regs.cold+0x1a/0x1f
[ 7128.718058] ? __die+0x90/0xd9
[ 7128.721467] ? no_context+0x196/0x380
[ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0
[ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130
[ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0
[ 7128.741585] ? bad_area_nosemaphore+0x16/0x20
[ 7128.746445] ? do_user_addr_fault+0x267/0x450
[ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0
[ 7128.757718] ? __do_page_fault+0x58/0x90
[ 7128.762092] ? do_page_fault+0x2c/0xe0
[ 7128.766276] ? page_fault+0x34/0x40
[ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100
[ 7128.776190] wb_timer_fn+0x1d6/0x3c0
[ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100
[ 7128.785522] blk_stat_timer_fn+0x13a/0x140
[ 7128.790094] call_timer_fn+0x32/0x130
[ 7128.794179] __run_timers.part.0+0x180/0x280
[ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0
[ 7128.804677] run_timer_softirq+0x2a/0x50
[ 7128.809053] __do_softirq+0xd1/0x2c1
[ 7128.813041] irq_exit+0xae/0xb0
[ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140
[ 7128.821696] apic_timer_interrupt+0xf/0x20
[ 7128.826264] </IRQ>
[ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10
[ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65
[ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a
[ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120
[ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000
[ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081
[ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 7128.902858] ? default_idle+0x20/0x140
[ 7128.907040] arch_cpu_idle+0x15/0x20
[ 7128.911027] default_idle_call+0x23/0x30
[ 7128.915403] do_idle+0x1fb/0x270
[ 7128.919004] ? complete+0x49/0x50
[ 7128.922699] cpu_startup_entry+0x20/0x30
[ 7128.927074] start_secondary+0x178/0x1d0
[ 7128.931452] secondary_startup_64+0xa4/0xb0
[ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4
[ 7129.023659] CR2: 0000000000000050
[ 7129.027471] ---[ end trace 5d27e00102fa9701 ]---
[ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
[ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12
[ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282
[ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100
[ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4
[ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100
[ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8
[ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000
[ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000
[ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0
[ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt
[ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Po-Hsu Lin (cypressyew)
description: updated
description: updated
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew)
description: updated
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

A quick bisect shows this failure occurs between 5.4.0-181 and 5.4.0-182. And very likely a test case issue:

* 5.4.0-181 + 181 source code - OK
* 5.4.0-182 + 182 source code - NOT OK
* 5.4.0-181 + 182 source code - NOT OK
* 5.4.0-182 + 181 source code - OK

Next is to check the test case change between these two versions.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

On 5.4.0-182, it seems we don't have any test case change:
$ git log -1
commit ae6643c51e8eab08016ce89a7e0c1595d8572668 (HEAD -> Ubuntu-5.4.0-182.202, tag: Ubuntu-5.4.0-182.202)
Author: Stefan Bader <email address hidden>
Date: Fri Apr 26 13:36:15 2024 +0200

    UBUNTU: Ubuntu-5.4.0-182.202

    Signed-off-by: Stefan Bader <email address hidden>
$ git log tools/testing/selftests/ftrace
commit 025c9da92a6fd4d7118aef40da453da5f2f5d32c
Author: Francis Laniel <email address hidden>
Date: Fri Oct 20 13:42:50 2023 +0300

    selftests/ftrace: Add new test case which checks non unique symbol

    BugLink: https://bugs.launchpad.net/bugs/2049024

    [ Upstream commit 03b80ff8023adae6780e491f66e932df8165e3a0 ]

    If name_show() is non unique, this test will try to install a kprobe on this
    function which should fail returning EADDRNOTAVAIL.
    On kernel where name_show() is not unique, this test is skipped.

    Link: https://<email address hidden>/

    Cc: <email address hidden>
    Signed-off-by: Francis Laniel <email address hidden>
    Acked-by: Masami Hiramatsu (Google) <email address hidden>
    Signed-off-by: Masami Hiramatsu (Google) <email address hidden>
    Signed-off-by: Sasha Levin <email address hidden>
    Signed-off-by: Manuel Diewald <email address hidden>
    Signed-off-by: Roxana Nicolescu <email address hidden>
$ git tag --contains 025c9da92a6fd4d7118aef40da453da5f2f5d32c
Ubuntu-5.4.0-173.191
Ubuntu-5.4.0-174.192
Ubuntu-5.4.0-174.193
Ubuntu-5.4.0-175.193
Ubuntu-5.4.0-175.194
Ubuntu-5.4.0-175.195
Ubuntu-5.4.0-176.196
Ubuntu-5.4.0-177.197
Ubuntu-5.4.0-181.201
Ubuntu-5.4.0-182.202
Ubuntu-5.4.0-186.206
Ubuntu-5.4.0-187.207
Ubuntu-5.4.0-189.209
Ubuntu-5.4.0-190.210
Ubuntu-5.4.0-192.212

We will probably need to bisect the kernel between these two tags.

Po-Hsu Lin (cypressyew)
summary: ftrace:test.d--ftrace--func_traceonoff_triggers.tc in
- ubuntu_kselftests_ftrace triggers kernel NULL pointer dereference
+ ubuntu_kselftests_ftrace triggers kernel NULL pointer dereference on
+ node blanka
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :
Download full text (5.5 KiB)

There is not much delta between -181 and -182, and I successfully triggered this on 5.4.0-181 + 181 source code. So the bisect result in #1 is invalid.

As this test fail rate is not 100% reproducible, I am going to run it for multiple times.

I can reproduce this all the way back to 5.4.0-170-generic.

And even 5.4.0-150 is failing too, the trace is a bit different though:
[ 3706.368639] BUG: kernel NULL pointer dereference, address: 0000000000000050
[ 3706.376413] #PF: supervisor read access in kernel mode
[ 3706.382146] #PF: error_code(0x0000) - not-present page
[ 3706.387878] PGD 0 P4D 0
[ 3706.390704] Oops: 0000 [#1] SMP NOPTI
[ 3706.394789] CPU: 255 PID: 0 Comm: swapper/255 Tainted: G OE 5.4.0-150-generic #167-Ubuntu
[ 3706.405274] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023
[ 3706.414504] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
[ 3706.421206] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 50 cb c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 99 4b
[ 3706.442162] RSP: 0018:ffffa29e5cad8da0 EFLAGS: 00010286
[ 3706.447993] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100
[ 3706.455956] RDX: ffff8eaf36ab00ac RSI: 0000000000000100 RDI: ffff8eaf36ab00a8
[ 3706.463918] RBP: ffffa29e5cad8e08 R08: ffff8eaf36ab00a8 R09: 0000000000000100
[ 3706.471878] R10: 0000000000000004 R11: 0000000000000004 R12: ffff8eaf36ab00ac
[ 3706.479840] R13: 0000000000000000 R14: 0000000000000003 R15: ffff8e0fc0048800
[ 3706.487804] FS: 0000000000000000(0000) GS:ffff8eaf7fbc0000(0000) knlGS:0000000000000000
[ 3706.496834] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3706.503243] CR2: 0000000000000050 CR3: 000000bd7e40a000 CR4: 0000000000340ee0
[ 3706.511204] Call Trace:
[ 3706.513932] <IRQ>
[ 3706.516177] wb_timer_fn+0x1d6/0x3c0
[ 3706.520169] ? blk_stat_free_callback_rcu+0x30/0x30
[ 3706.525610] blk_stat_timer_fn+0x134/0x140
[ 3706.530183] call_timer_fn+0x32/0x130
[ 3706.534261] __run_timers.part.0+0x180/0x280
[ 3706.539026] ? trace_event_raw_event_softirq+0x5d/0xa0
[ 3706.544759] run_timer_softirq+0x2a/0x50
[ 3706.549137] __do_softirq+0xd1/0x2c1
[ 3706.553125] irq_exit+0xae/0xb0
[ 3706.556629] smp_apic_timer_interrupt+0x7b/0x140
[ 3706.561780] apic_timer_interrupt+0xf/0x20
[ 3706.566348] </IRQ>
[ 3706.568689] RIP: 0010:native_safe_halt+0xe/0x10
[ 3706.573741] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d f6 14 51 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d e6 14 51 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 1d 3b 62 ff 65
[ 3706.594697] RSP: 0018:ffffa29e596ebe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 3706.603145] RAX: 0000000000023800 RBX: ffff8e2f5d0a3768 RCX: 00000000000416f2
[ 3706.611108] RDX: 00000000000416f2 RSI: 0000000000000000 RDI: ffffffffb1ec5f60
[ 3706.619070] RBP: ffffa29e596ebe90 R08: 0000000000000000 R09: 0000000000000000
[ 3706.627031] R10: 0000000000000000 R11: 0000000000000004 R12: 00000000000000ff
[ 3706.634993] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3706.642957] ? default_idle+0x20/0x1...

Read more...

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Steps to run this:
# Boot the the kernel you want to test
git clone https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal
# cd focal, and switch to the tag that matches the current kernel
cd focal/tools/testing/selftests
make TARGETS=ftrace
cd ftrace
sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.