Activity log for bug #2072972

Date Who What changed Old value New value Message
2024-07-12 13:11:45 Po-Hsu Lin bug added bug
2024-07-12 13:17:50 Po-Hsu Lin description Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible: * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts * Focal 5.4.0-192.212 + -192 source failed with 1 out of 1 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 1 out of 2 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
2024-07-12 13:19:03 Po-Hsu Lin description Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 1 out of 2 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 1 out of 3 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
2024-07-12 14:50:31 Po-Hsu Lin description Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 1 out of 3 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 2 out of 4 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
2024-07-12 15:07:50 Po-Hsu Lin description Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 2 out of 4 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Issue found on node "blanka" with Focal 5.4.0-190.210 in s2024.06.10 After some manual tests I noticed that this issue is not 100% reproducible:   * Focal 5.4.0-189.209 + -189 source failed with 5 out of 5 attempts   * Focal 5.4.0-190.210 + -190 source failed with 2 out of 5 attempts   * Focal 5.4.0-192.212 + -192 source failed with 3 out of 5 attempts Despite with this high fail rate, I can't see this failure in our test history all the way back to 2024.01.08 (except with 5.4.0-190.210), perhaps we retest it and it has passed? $ sudo ./ftracetest -v test.d/ftrace/func_traceonoff_triggers.tc === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers dmesg output: [ 7112.186092] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1 [ 7128.566260] BUG: kernel NULL pointer dereference, address: 0000000000000050 [ 7128.574031] #PF: supervisor read access in kernel mode [ 7128.579763] #PF: error_code(0x0000) - not-present page [ 7128.585495] PGD 0 P4D 0 [ 7128.588320] Oops: 0000 [#1] SMP NOPTI [ 7128.592405] CPU: 129 PID: 0 Comm: swapper/129 Tainted: G OE 5.4.0-190-generic #210-Ubuntu [ 7128.602887] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023 [ 7128.612119] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.618820] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7128.639774] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7128.645604] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7128.653565] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7128.661525] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7128.669486] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7128.677445] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7128.685409] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7128.694437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7128.700847] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7128.708809] Call Trace: [ 7128.711535] <IRQ> [ 7128.713782] ? show_regs.cold+0x1a/0x1f [ 7128.718058] ? __die+0x90/0xd9 [ 7128.721467] ? no_context+0x196/0x380 [ 7128.725555] ? ring_buffer_lock_reserve+0x165/0x3d0 [ 7128.730996] ? ring_buffer_unlock_commit+0x27/0x130 [ 7128.736436] ? __bad_area_nosemaphore+0x50/0x1a0 [ 7128.741585] ? bad_area_nosemaphore+0x16/0x20 [ 7128.746445] ? do_user_addr_fault+0x267/0x450 [ 7128.751306] ? trace_event_raw_event_x86_exceptions+0x80/0xd0 [ 7128.757718] ? __do_page_fault+0x58/0x90 [ 7128.762092] ? do_page_fault+0x2c/0xe0 [ 7128.766276] ? page_fault+0x34/0x40 [ 7128.770167] ? trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7128.776190] wb_timer_fn+0x1d6/0x3c0 [ 7128.780179] ? blk_mq_tag_update_depth+0x100/0x100 [ 7128.785522] blk_stat_timer_fn+0x13a/0x140 [ 7128.790094] call_timer_fn+0x32/0x130 [ 7128.794179] __run_timers.part.0+0x180/0x280 [ 7128.798945] ? trace_event_raw_event_softirq+0x5d/0xa0 [ 7128.804677] run_timer_softirq+0x2a/0x50 [ 7128.809053] __do_softirq+0xd1/0x2c1 [ 7128.813041] irq_exit+0xae/0xb0 [ 7128.816545] smp_apic_timer_interrupt+0x7b/0x140 [ 7128.821696] apic_timer_interrupt+0xf/0x20 [ 7128.826264] </IRQ> [ 7128.828603] RIP: 0010:native_safe_halt+0xe/0x10 [ 7128.833655] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d a6 14 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 96 14 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 ed 46 61 ff 65 [ 7128.854607] RSP: 0018:ffffb177992fbe70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 7128.863053] RAX: 0000000000023800 RBX: ffff90676cadbf28 RCX: 000000000003514a [ 7128.871013] RDX: 000000000003514a RSI: 0000000000000000 RDI: ffffffffb06c6120 [ 7128.878974] RBP: ffffb177992fbe90 R08: 00000000000001ac R09: 0000000000000000 [ 7128.886936] R10: 0000000000020000 R11: 0000000000000002 R12: 0000000000000081 [ 7128.894895] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 7128.902858] ? default_idle+0x20/0x140 [ 7128.907040] arch_cpu_idle+0x15/0x20 [ 7128.911027] default_idle_call+0x23/0x30 [ 7128.915403] do_idle+0x1fb/0x270 [ 7128.919004] ? complete+0x49/0x50 [ 7128.922699] cpu_startup_entry+0x20/0x30 [ 7128.927074] start_secondary+0x178/0x1d0 [ 7128.931452] secondary_startup_64+0xa4/0xb0 [ 7128.936116] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds binfmt_misc mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast crct10dif_pclmul crc32_pclmul drm_vram_helper ghash_clmulni_intel mlx5_core(OE) ttm aesni_intel crypto_simd drm_kms_helper pci_hyperv_intf mlxdevm(OE) cryptd syscopyarea sysfillrect auxiliary(OE) glue_helper igb tls sysimgblt uas hid_generic mpt3sas mlxfw(OE) psample dca raid_class usbhid fb_sys_fops scsi_transport_sas nvme i2c_algo_bit usb_storage hid drm mlx_compat(OE) nvme_core i2c_piix4 [ 7129.023659] CR2: 0000000000000050 [ 7129.027471] ---[ end trace 5d27e00102fa9701 ]--- [ 7129.052391] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100 [ 7129.059093] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 b0 ab c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 12 [ 7129.080046] RSP: 0018:ffffb1779b140da0 EFLAGS: 00010282 [ 7129.085875] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000080000100 [ 7129.093835] RDX: ffff8fe7ae7052f8 RSI: 0000000000000100 RDI: ffff8fe7ae7052f4 [ 7129.101795] RBP: ffffb1779b140e08 R08: ffff8fe7ae7052f4 R09: 0000000000000100 [ 7129.109757] R10: ffffd1777fdcb960 R11: 0000000000000000 R12: ffff8fe7ae7052f8 [ 7129.117718] R13: 00000000ffffffff R14: 0000000000000002 R15: ffff9047fe8f5000 [ 7129.125680] FS: 0000000000000000(0000) GS:ffff8fe80f840000(0000) knlGS:0000000000000000 [ 7129.134706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7129.141115] CR2: 0000000000000050 CR3: 000000cb2760a000 CR4: 0000000000340ee0 [ 7129.149075] Kernel panic - not syncing: Fatal exception in interrupt [ 7129.158266] Kernel Offset: 0x2dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 7129.193426] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
2024-07-18 02:17:44 Po-Hsu Lin summary ftrace:test.d--ftrace--func_traceonoff_triggers.tc in ubuntu_kselftests_ftrace triggers kernel NULL pointer dereference ftrace:test.d--ftrace--func_traceonoff_triggers.tc in ubuntu_kselftests_ftrace triggers kernel NULL pointer dereference on node blanka