dev test from ubuntu_stress_smoke_tests causes kernel oops on F-5.4 xilinx ZCU106

Bug #1998738 reported by Po-Hsu Lin
This bug affects 1 person
Affects                         Status  Importance  Assigned to  Milestone
ubuntu-kernel-tests             New     Undecided   Unassigned
linux-xilinx-zynqmp (Ubuntu)    New     Undecided   Unassigned
  Focal                         New     Undecided   Unassigned

Bug Description

This issue can only be reproduced on ZCU106. It leaves some processes running, which eventually causes the Jenkins job to hang.
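To see which stress-ng processes are left behind after such a run, and what state they are in, a small helper can scan /proc. This is a hypothetical sketch: the function names are made up, it is not part of stress-ng or ubuntu_stress_smoke_tests, and it assumes a Linux /proc filesystem. It reads only the comm, state and ppid fields of /proc/<pid>/stat.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (comm, state, ppid).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    return comm, fields[0], int(fields[1])


def leftover_processes(prefix="stress-ng"):
    """Return sorted (pid, comm, state, ppid) for processes whose comm starts with prefix."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, state, ppid = parse_stat(f.read())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        if comm.startswith(prefix):
            found.append((int(entry), comm, state, ppid))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state, ppid in leftover_processes():
        # State letters: R running, S sleeping, D uninterruptible sleep, Z zombie, ...
        print(f"{pid:>7} {ppid:>7} {state} {comm}")
```

A leftover child reported in state D (uninterruptible sleep) rather than Z (zombie) would be consistent with a task that can no longer act on SIGKILL; treat that as a diagnostic hint, not something this report confirms.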

stress-ng at commit 91ec6bccd7 (V0.15.00):

 stress-ng: invoked with './stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
 stress-ng: system: '202008-28164-ZCU106' Linux 5.4.0-1019-xilinx-zynqmp #22-Ubuntu SMP Thu Nov 17 05:04:22 UTC 2022 aarch64
 stress-ng: memory (MB): total 3929.76, free 2479.07, shared 4.30, buffer 59.98, swap 0.00, free swap 0.00
 stress-ng: info: [3037] setting to a 5 second run per stressor
 stress-ng: info: [3037] dispatching hogs: 4 dev
 kernel: [ 981.702313] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
 kernel: [ 981.702829] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
 kernel: [ 981.708039] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
 kernel: [ 981.708569] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
 kernel: [ 981.709027] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
 kernel: [ 981.709501] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
 kernel: [ 981.734320] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
 kernel: [ 981.734859] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
 kernel: [ 981.768878] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
 kernel: [ 981.768958] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.768961] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000087000000f48
 kernel: [ 981.768966] Mem abort info:
 kernel: [ 981.779704] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.782475] ESR = 0x96000004
 kernel: [ 981.782478] EC = 0x25: DABT (current EL), IL = 32 bits
 kernel: [ 981.782480] SET = 0, FnV = 0
 kernel: [ 981.782484] EA = 0, S1PTW = 0
 kernel: [ 981.785524] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.790822] Data abort info:
 kernel: [ 981.790824] ISV = 0, ISS = 0x00000004
 kernel: [ 981.790826] CM = 0, WnR = 0
 kernel: [ 981.790830] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000838768000
 kernel: [ 981.790833] [0000087000000f48] pgd=0000000000000000
 kernel: [ 981.793875] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.797006] Internal error: Oops: 96000004 [#1] SMP
 kernel: [ 981.797010] Modules linked in: xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_nat xt_CHECKSUM iptable_nat xt_MASQUERADE nf_nat iptable_filter fuse dm_multipath dm_mod al5e al5d allegro xlnx_vcu_clk xlnx_vcu xilinx_hdmi_tx xilinx_hdmi_rx xlnx_vcu_core dp159 xilinx_vphy lm63 ina2xx_adc mali dmaproxy nfsd zocl
 kernel: [ 981.805628] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.808485] CPU: 1 PID: 3044 Comm: stress-ng-dev Not tainted 5.4.0-1019-xilinx-zynqmp #22-Ubuntu
 kernel: [ 981.808487] Hardware name: ZynqMP ZCU106 RevA (DT)
 kernel: [ 981.808491] pstate: 00400005 (nzcv daif +PAN -UAO)
 kernel: [ 981.812321] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.815269] pc : __mutex_lock.isra.0+0x170/0x510
 kernel: [ 981.815273] lr : __mutex_lock_slowpath+0x28/0x38
 kernel: [ 981.815276] sp : ffff800017c3bb30
 kernel: [ 981.821772] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.826563] x29: ffff800017c3bb30 x28: ffff00083460ec00
 kernel: [ 981.826567] x27: 0000ffffb3f2f000 x26: ffff000855fda500
 kernel: [ 981.826571] x25: 0000000000000000 x24: ffff0008498fd400
 kernel: [ 981.826574] x23: 0000000000000031 x22: ffff000875878750
 kernel: [ 981.826578] x21: 0000000000000002 x20: ffff0008385d4e40
 kernel: [ 981.835222] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.840035] x19: ffff0008758787f0 x18: 0000000000000000
 kernel: [ 981.840039] x17: 0000000000000000 x16: 0000000000000000
 kernel: [ 981.840042] x15: 0000000000000000 x14: 0000000000000000
 kernel: [ 981.840046] x13: 0000000000000000 x12: 0000000000000000
 kernel: [ 981.868428] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.875905] x11: 0000000000000000 x10: 0000000000100000
 kernel: [ 981.875909] x9 : 00000000000000fb x8 : 0000000010044400
 kernel: [ 981.875912] x7 : 0000000000000000 x6 : ffff00083460e0c0
 kernel: [ 981.875915] x5 : 0000000000000015 x4 : 0000000000000014
 kernel: [ 981.875919] x3 : 0000087000000f00 x2 : ffff0008385d4e40
 kernel: [ 981.875922] x1 : 0000087000000f00 x0 : 0000087000000f00
 kernel: [ 981.875926] Call trace:
 kernel: [ 981.875933] __mutex_lock.isra.0+0x170/0x510
 kernel: [ 981.875939] __mutex_lock_slowpath+0x28/0x38
 kernel: [ 981.885784] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.889485] mutex_lock+0x48/0x58
 kernel: [ 981.889491] xm2msc_mmap+0x38/0x68
 kernel: [ 981.889497] v4l2_mmap+0x7c/0xb8
 kernel: [ 981.889504] mmap_region+0x364/0x5b0
 kernel: [ 981.889511] do_mmap+0x294/0x478
 kernel: [ 981.894358] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.902880] vm_mmap_pgoff+0xf4/0x120
 kernel: [ 981.902885] ksys_mmap_pgoff+0x1ac/0x240
 kernel: [ 981.902891] __arm64_sys_mmap+0x38/0x50
 kernel: [ 981.902897] el0_svc_common.constprop.0+0x78/0x180
 kernel: [ 981.902903] el0_svc_handler+0x84/0xa0
 kernel: [ 981.907665] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
 kernel: [ 981.912107] el0_svc+0x8/0x1c0
 kernel: [ 981.912115] Code: a94153f3 a9425bf5 a8c97bfd d65f03c0 (b9404801)
 kernel: [ 981.912121] ---[ end trace bab66edb32cbb4db ]---
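The fault details in the oops can be cross-checked mechanically. The sketch below uses hypothetical helper names; it decodes ESR = 0x96000004 using the Armv8-A ESR_ELx data-abort layout, which reproduces the EC/IL/SET/FnV/EA/S1PTW/ISV/CM/WnR values printed above, and decodes the faulting instruction in parentheses on the "Code:" line, 0xb9404801, as `ldr w1, [x0, #0x48]`. Adding 0x48 to x0 = 0x0000087000000f00 from the register dump gives exactly the reported fault address 0x0000087000000f48. In other words, __mutex_lock() followed a pointer that is not a valid kernel address; it lies in the user half of the address space, which is why the oops says "kernel access to user memory outside uaccess routines".

```python
def decode_esr_dabt(esr):
    """Split an ESR_ELx value for a data abort into its main fields."""
    iss = esr & 0x1FFFFFF            # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,    # exception class; 0x25 = data abort, current EL
        "IL": (esr >> 25) & 0x1,     # 1 = 32-bit instruction
        "ISV": (iss >> 24) & 0x1,    # instruction syndrome valid
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,     # 0 = the faulting access was a read
        "DFSC": iss & 0x3F,          # 0x04 = translation fault, level 0
    }


def decode_ldr_imm(insn):
    """Decode A64 LDR (immediate, unsigned offset) into a 32/64-bit GPR.

    Returns (dest_reg, base_reg_number, byte_offset), or None for any
    other instruction (including LDRB/LDRH, which share the opcode bits).
    """
    # Bits [29:22] must be 111 0 01 01: load/store, V=0, unsigned offset, opc=LDR.
    if (insn & 0x3FC00000) != 0x39400000:
        return None
    size = insn >> 30                # 2 = 32-bit (Wt), 3 = 64-bit (Xt)
    if size < 2:
        return None
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    reg = ("w" if size == 2 else "x") + str(rt)
    return reg, rn, imm12 << size    # offset is scaled by the access size


if __name__ == "__main__":
    print({k: hex(v) for k, v in decode_esr_dabt(0x96000004).items()})
    reg, rn, off = decode_ldr_imm(0xB9404801)
    x0 = 0x0000087000000F00          # from the register dump above
    print(f"ldr {reg}, [x{rn}, #{off:#x}] -> {x0 + off:#018x}")
```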

Here is the output when running this test:
$ time sudo ./stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
stress-ng: debug: [3037] invoked with './stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
stress-ng: debug: [3037] stress-ng 0.15.00 g91ec6bccd7e9
stress-ng: debug: [3037] system: Linux 202008-28164-ZCU106 5.4.0-1019-xilinx-zynqmp #22-Ubuntu SMP Thu Nov 17 05:04:22 UTC 2022 aarch64
stress-ng: debug: [3037] RAM total: 3.8G, RAM free: 2.4G, swap free: 0.0
stress-ng: debug: [3037] temporary file path: '.', filesystem type: ext2
stress-ng: debug: [3037] 4 processors online, 4 processors configured
stress-ng: info: [3037] setting to a 5 second run per stressor
stress-ng: info: [3037] dispatching hogs: 4 dev
stress-ng: debug: [3037] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [3037] cache allocate: shared cache buffer size: 2048K
stress-ng: debug: [3037] starting stressors
stress-ng: debug: [3039] dev: started [3039] (instance 0)
stress-ng: debug: [3040] dev: started [3040] (instance 1)
stress-ng: debug: [3037] 4 stressors started
stress-ng: debug: [3041] dev: started [3041] (instance 2)
stress-ng: debug: [3042] dev: started [3042] (instance 3)

Message from syslogd@202008-28164-ZCU106 at Dec 5 05:11:01 ...
 kernel:[ 981.797006] Internal error: Oops: 96000004 [#1] SMP

Message from syslogd@202008-28164-ZCU106 at Dec 5 05:11:01 ...
 kernel:[ 981.912115] Code: a94153f3 a9425bf5 a8c97bfd d65f03c0 (b9404801)
stress-ng: debug: [3042] dev: exited [3042] (instance 3)
stress-ng: debug: [3041] dev: exited [3041] (instance 2)
stress-ng: info: [3039] dev: 19 of 383 devices opened and exercised
stress-ng: debug: [3039] dev: exited [3039] (instance 0)
stress-ng: debug: [3037] process [3039] terminated
(hung here)

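Process 3040 (instance 1) never exits: it keeps waiting for its child, PID 3044, which is the stress-ng-dev process that triggered the oops above ("CPU: 1 PID: 3044 Comm: stress-ng-dev").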

strace output for process 3040:
$ sudo strace -p 3040
strace: Process 3040 attached
wait4(3044, 0xffffda2c3214, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
getpid() = 3040
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}) = 0
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
kill(3044, SIGALRM) = 0
kill(3044, SIGKILL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=989179}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
getpid() = 3040
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}) = 0
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
wait4(3044, 0xffffda2c3214, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
getpid() = 3040
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}) = 0
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
kill(3044, SIGALRM) = 0
kill(3044, SIGKILL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=505466}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
(repeats)
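The strace shows the parent (3040) stuck in a loop: wait4() on 3044 is interrupted by a 1-second SIGALRM timer, the parent then sends SIGALRM and SIGKILL to the child, sleeps for a second and waits again. Because the child never becomes reapable, the loop never ends and the job hangs. The following is a hypothetical Python approximation of that pattern, not stress-ng's own (C) code: it polls with WNOHANG instead of an interval timer, and adds an optional max_attempts bound that a test harness could use to give up and report the stuck PID instead of hanging.

```python
import os
import signal
import time


def reap_child(pid, timeout=1.0, max_attempts=None, poll=0.01):
    """Wait for child pid; after each timeout send SIGALRM then SIGKILL and wait again.

    Returns (wait_status, kill_rounds). If max_attempts is given and the
    child is still not reapable after that many kill rounds, raise
    TimeoutError instead of looping forever.
    """
    attempts = 0
    while True:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            reaped, status = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                return status, attempts
            time.sleep(poll)
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"child {pid} not reaped after {attempts} kill rounds")
        attempts += 1
        # Same order as in the strace: SIGALRM first, then SIGKILL.
        os.kill(pid, signal.SIGALRM)
        os.kill(pid, signal.SIGKILL)
```

With max_attempts=None this behaves like the loop in the strace and never returns for a child that cannot be reaped, for example one stuck in uninterruptible sleep inside the kernel.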

Po-Hsu Lin (cypressyew)
tags: added: 5.4 focal ubuntu-stress-smoke-test
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote:

Tested with an older version of stress-ng (cacea49) [1]; this issue still exists.

[1] https://lists.ubuntu.com/archives/kernel-team/2022-November/134872.html
