Comment 3 for bug 2020607

Po-Hsu Lin (cypressyew) wrote :

After splitting ubuntu_kselftests_ftrace out and running its test cases one by one, we can see that it fails on the second test case, ftrace:test.d--00basic--basic2.tc, on J-intel-iotg-5.15.0-1048.54 with node rizzo.

However, I was unable to reproduce this manually on rizzo:
  * Passed when running only ftrace:test.d--00basic--basic2.tc, with "./ftracetest -vvv test.d/00basic/basic2.tc"
  * Passed when running basic2.tc multiple times.
  * Passed when running the 1st test case followed by the offending basic2.tc test case.
  * Passed when running the whole test suite.
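For anyone repeating these manual attempts, the "multiple times" case can be sketched as a small loop. The helper name repeat_case and the iteration count are hypothetical and not from the original runs. The sketch assumes it is run as root from the ftrace selftest directory and that ftracetest returns non-zero when a case fails.

```shell
# Hypothetical helper: run a command N times and stop at the first failure.
# Usage: repeat_case COUNT COMMAND [ARGS...]
repeat_case() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Run the command exactly as given; report which iteration failed.
        "$@" || { echo "failed on iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    return 0
}

# Example (assumed to be run from the ftrace selftest directory):
#   repeat_case 20 ./ftracetest -vvv test.d/00basic/basic2.tc
```

A panic would of course end the loop on its own; the helper only catches ordinary test failures.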

But if you try to run this remotely from our build server:
  SRU_CYCLE="2024.01.08-1" INSTANCE_TYPE="rizzo" timeout 180m $KT/sut-test --nc --region kernel $DEBUG metal $SUT jammy ubuntu_kselftests_ftrace $HOME

It will panic right away when hitting the second test case. It looks like it has something to do with CPU hotplug:
[ 5990.967618] mmiotrace: Disabling non-boot CPUs...
[ 5991.032796] smpboot: CPU 1 is now offline
[ 5991.052877] mmiotrace: CPU1 is down.
[ 5991.124833] smpboot: CPU 2 is now offline
[ 5991.140709] mmiotrace: CPU2 is down.
[ 5991.196717] smpboot: CPU 3 is now offline
[ 5991.216486] mmiotrace: CPU3 is down.
[ 5991.233400] smpboot: CPU 4 is now offline
[ 5991.272507] mmiotrace: CPU4 is down.
[ 5991.313356] smpboot: CPU 5 is now offline
[ 5991.328204] mmiotrace: CPU5 is down.
[ 5991.353591] smpboot: CPU 6 is now offline
[ 5991.376155] mmiotrace: CPU6 is down.
[ 5991.393484] smpboot: CPU 7 is now offline
[ 5991.394580] mmiotrace: CPU7 is down.
[ 5991.394586] mmiotrace: enabled.
[ 5991.394693] mmiotrace: Re-enabling CPUs...
[ 5991.394761] x86: Booting SMP configuration:
[ 5991.394763] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 5991.432595] mmiotrace: enabled CPU1.
[ 5991.479537] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 5991.508524] mmiotrace: enabled CPU2.
[ 5991.547586] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 5991.576690] mmiotrace: enabled CPU3.
[ 5991.619582] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 5991.639516] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5991.646618] #PF: supervisor instruction fetch in kernel mode
[ 5991.652336] #PF: error_code(0x0010) - not-present page
[ 5991.657530] PGD 0 P4D 0
[ 5991.660096] Oops: 0010 [#1] SMP PTI
[ 5991.663626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-1048-intel-iotg #54-Ubuntu
[ 5991.671709] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.12.0 09/06/2013
[ 5991.679350] RIP: 0010:0x0
[ 5991.682010] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 5991.688955] RSP: 0018:ffffb92a40003e90 EFLAGS: 00010097
[ 5991.694233] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
[ 5991.701435] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff9c20c007b990
[ 5991.708639] RBP: ffffb92a40003eb8 R08: ffff9c20c007b990 R09: 0000000000000001
[ 5991.715842] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff9c20c007b990
[ 5991.723050] R13: 00000572e77e8500 R14: 0000000000000004 R15: 0000000000000000
[ 5991.730252] FS: 0000000000000000(0000) GS:ffff9c21f7600000(0000) knlGS:0000000000000000
[ 5991.738417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5991.744218] CR2: ffffffffffffffd6 CR3: 0000000010c10000 CR4: 00000000000006f0
[ 5991.751425] Call Trace:
[ 5991.753903] <IRQ>
[ 5991.755945] ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.760362] ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.764773] ? tick_do_broadcast+0xa1/0xd0
[ 5991.768922] ? show_regs.part.0+0x23/0x29
[ 5991.773026] ? __die_body.cold+0x8/0xd
[ 5991.776821] ? __die+0x2b/0x37
[ 5991.779918] ? page_fault_oops+0x13b/0x170
[ 5991.784063] ? do_user_addr_fault+0x321/0x670
[ 5991.788476] ? obj_cgroup_uncharge_pages+0x68/0xf0
[ 5991.793324] ? exc_page_fault+0x77/0x170
[ 5991.797293] ? asm_exc_page_fault+0x27/0x30
[ 5991.801529] tick_do_broadcast+0xa1/0xd0
[ 5991.805501] tick_handle_oneshot_broadcast+0x14d/0x200
[ 5991.810694] timer_interrupt+0x18/0x30
[ 5991.814495] __handle_irq_event_percpu+0x42/0x170
[ 5991.819255] handle_irq_event+0x59/0xb0
[ 5991.823136] handle_edge_irq+0x8c/0x230
[ 5991.827019] __common_interrupt+0x52/0xe0
[ 5991.831078] common_interrupt+0x89/0xa0
[ 5991.834966] </IRQ>
[ 5991.837098] <TASK>
[ 5991.839247] asm_common_interrupt+0x27/0x40
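
To compare several captures of this panic, it can help to pull out the last CPU that smpboot started before the oops. The following is a minimal sketch; the function name last_cpu_before_oops is hypothetical, and it assumes the console log is saved as plain text in the format shown above.

```shell
# Hypothetical helper: read a dmesg capture on stdin and print the last
# "smpboot: Booting Node ..." line seen before the first "BUG:" line.
last_cpu_before_oops() {
    awk '
        /BUG:/ { exit }                          # stop scanning at the oops
        /smpboot: Booting Node/ { last = $0 }    # remember the latest CPU bring-up
        END { if (last != "") print last }       # END still runs after exit
    '
}

# Example (console.log is an assumed file name):
#   last_cpu_before_oops < console.log
```

For the log above, this picks out the "Booting Node 0 Processor 4 APIC 0x1" line, which is the last bring-up logged before the NULL pointer dereference.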