After splitting ubuntu_kselftests_ftrace out and running the test cases one by one, we can see it fails on the second test case, ftrace:test.d--00basic--basic2.tc, on J-intel-iotg-5.15.0-1048.54 with node rizzo.
However, I was unable to reproduce this manually on rizzo:
* Passed when running just ftrace:test.d--00basic--basic2.tc, with "./ftracetest -vvv test.d/00basic/basic2.tc"
* Passed when running basic2.tc multiple times.
* Passed when running the 1st test case followed by the offending basic2.tc test case.
* Passed when running the whole test suite.
But if you run this remotely from our build server:
SRU_CYCLE="2024.01.08-1" INSTANCE_TYPE="rizzo" timeout 180m $KT/sut-test --nc --region kernel $DEBUG metal $SUT jammy ubuntu_kselftests_ftrace $HOME
It panics right away when it hits the second test case. It looks related to CPU hotplug: the oops occurs while mmiotrace is re-enabling the non-boot CPUs, right after CPU 4 starts booting:
[ 5990.967618] mmiotrace: Disabling non-boot CPUs...
[ 5991.032796] smpboot: CPU 1 is now offline
[ 5991.052877] mmiotrace: CPU1 is down.
[ 5991.124833] smpboot: CPU 2 is now offline
[ 5991.140709] mmiotrace: CPU2 is down.
[ 5991.196717] smpboot: CPU 3 is now offline
[ 5991.216486] mmiotrace: CPU3 is down.
[ 5991.233400] smpboot: CPU 4 is now offline
[ 5991.272507] mmiotrace: CPU4 is down.
[ 5991.313356] smpboot: CPU 5 is now offline
[ 5991.328204] mmiotrace: CPU5 is down.
[ 5991.353591] smpboot: CPU 6 is now offline
[ 5991.376155] mmiotrace: CPU6 is down.
[ 5991.393484] smpboot: CPU 7 is now offline
[ 5991.394580] mmiotrace: CPU7 is down.
[ 5991.394586] mmiotrace: enabled.
[ 5991.394693] mmiotrace: Re-enabling CPUs...
[ 5991.394761] x86: Booting SMP configuration:
[ 5991.394763] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 5991.432595] mmiotrace: enabled CPU1.
[ 5991.479537] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 5991.508524] mmiotrace: enabled CPU2.
[ 5991.547586] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 5991.576690] mmiotrace: enabled CPU3.
[ 5991.619582] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 5991.639516] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5991.646618] #PF: supervisor instruction fetch in kernel mode
[ 5991.652336] #PF: error_code(0x0010) - not-present page
[ 5991.657530] PGD 0 P4D 0
[ 5991.660096] Oops: 0010 [#1] SMP PTI
[ 5991.663626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-1048-intel-iotg #54-Ubuntu
[ 5991.671709] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.12.0 09/06/2013
[ 5991.679350] RIP: 0010:0x0
[ 5991.682010] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 5991.688955] RSP: 0018:ffffb92a40003e90 EFLAGS: 00010097
[ 5991.694233] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
[ 5991.701435] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff9c20c007b990
[ 5991.708639] RBP: ffffb92a40003eb8 R08: ffff9c20c007b990 R09: 0000000000000001
[ 5991.715842] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff9c20c007b990
[ 5991.723050] R13: 00000572e77e8500 R14: 0000000000000004 R15: 0000000000000000
[ 5991.730252] FS: 0000000000000000(0000) GS:ffff9c21f7600000(0000) knlGS:0000000000000000
[ 5991.738417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5991.744218] CR2: ffffffffffffffd6 CR3: 0000000010c10000 CR4: 00000000000006f0
[ 5991.751425] Call Trace:
[ 5991.753903] <IRQ>
[ 5991.755945] ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.760362] ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.764773] ? tick_do_broadcast+0xa1/0xd0
[ 5991.768922] ? show_regs.part.0+0x23/0x29
[ 5991.773026] ? __die_body.cold+0x8/0xd
[ 5991.776821] ? __die+0x2b/0x37
[ 5991.779918] ? page_fault_oops+0x13b/0x170
[ 5991.784063] ? do_user_addr_fault+0x321/0x670
[ 5991.788476] ? obj_cgroup_uncharge_pages+0x68/0xf0
[ 5991.793324] ? exc_page_fault+0x77/0x170
[ 5991.797293] ? asm_exc_page_fault+0x27/0x30
[ 5991.801529] tick_do_broadcast+0xa1/0xd0
[ 5991.805501] tick_handle_oneshot_broadcast+0x14d/0x200
[ 5991.810694] timer_interrupt+0x18/0x30
[ 5991.814495] __handle_irq_event_percpu+0x42/0x170
[ 5991.819255] handle_irq_event+0x59/0xb0
[ 5991.823136] handle_edge_irq+0x8c/0x230
[ 5991.827019] __common_interrupt+0x52/0xe0
[ 5991.831078] common_interrupt+0x89/0xa0
[ 5991.834966] </IRQ>
[ 5991.837098] <TASK>
[ 5991.839247] asm_common_interrupt+0x27/0x40
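When comparing console logs from several runs, it may help to check whether the oops always follows the same CPU bring-up. The following is only a rough sketch, not part of the test tooling: the function name cpu_booting_at_oops and the SAMPLE excerpt are illustrative, and it assumes the log uses the same "smpboot: Booting Node ... Processor ..." and "BUG: kernel NULL pointer dereference" lines seen above.

```python
import re

# Patterns taken from the console log in this report.
BOOT_RE = re.compile(r"smpboot: Booting Node (\d+) Processor (\d+) APIC (0x[0-9a-fA-F]+)")
OOPS_RE = re.compile(r"BUG: kernel NULL pointer dereference")


def cpu_booting_at_oops(log_text):
    """Return the processor number from the last 'Booting Node' line
    seen before the first NULL-pointer oops, or None if there is no oops."""
    last_cpu = None
    for line in log_text.splitlines():
        match = BOOT_RE.search(line)
        if match:
            last_cpu = int(match.group(2))
        elif OOPS_RE.search(line):
            return last_cpu
    return None


# Short excerpt of the log above, used only as sample input.
SAMPLE = """\
[ 5991.547586] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 5991.576690] mmiotrace: enabled CPU3.
[ 5991.619582] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 5991.639516] BUG: kernel NULL pointer dereference, address: 0000000000000000
"""

if __name__ == "__main__":
    print(cpu_booting_at_oops(SAMPLE))
```

For the excerpt above this reports CPU 4, matching the last "Booting Node 0 Processor 4" line before the BUG message.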