dev test from ubuntu_stress_smoke_tests causes kernel oops on F-5.4 xilinx ZCU106
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | New | Undecided | Unassigned |
linux-xilinx-zynqmp (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | New | Undecided | Unassigned |
Bug Description
This issue can only be reproduced on the ZCU106. It leaves some stress-ng processes running, which eventually causes the Jenkins job to hang.
Tested with stress-ng at commit 91ec6bccd7 (V0.15.00):
stress-ng: invoked with './stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
stress-ng: system: '202008-
stress-ng: memory (MB): total 3929.76, free 2479.07, shared 4.30, buffer 59.98, swap 0.00, free swap 0.00
stress-ng: info: [3037] setting to a 5 second run per stressor
stress-ng: info: [3037] dispatching hogs: 4 dev
kernel: [ 981.702313] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
kernel: [ 981.702829] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
kernel: [ 981.708039] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
kernel: [ 981.708569] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
kernel: [ 981.709027] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
kernel: [ 981.709501] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
kernel: [ 981.734320] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
kernel: [ 981.734859] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance released
Message from syslogd@
kernel:[ 981.797006] Internal error: Oops: 96000004 [#1] SMP
kernel: [ 981.768878] xilinx-multiscaler a00e0000.v_multi: Channel 0 instance created
kernel: [ 981.768958] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.768961] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000087000000f48
kernel: [ 981.768966] Mem abort info:
kernel: [ 981.779704] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.782475] ESR = 0x96000004
kernel: [ 981.782478] EC = 0x25: DABT (current EL), IL = 32 bits
kernel: [ 981.782480] SET = 0, FnV = 0
kernel: [ 981.782484] EA = 0, S1PTW = 0
kernel: [ 981.785524] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.790822] Data abort info:
kernel: [ 981.790824] ISV = 0, ISS = 0x00000004
kernel: [ 981.790826] CM = 0, WnR = 0
kernel: [ 981.790830] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000838
kernel: [ 981.790833] [0000087000000f48] pgd=00000000000
kernel: [ 981.793875] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.797006] Internal error: Oops: 96000004 [#1] SMP
kernel: [ 981.797010] Modules linked in: xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_nat xt_CHECKSUM iptable_nat xt_MASQUERADE nf_nat iptable_filter fuse dm_multipath dm_mod al5e al5d allegro xlnx_vcu_clk xlnx_vcu xilinx_hdmi_tx xilinx_hdmi_rx xlnx_vcu_core dp159 xilinx_vphy lm63 ina2xx_adc mali dmaproxy nfsd zocl
kernel: [ 981.805628] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.808485] CPU: 1 PID: 3044 Comm: stress-ng-dev Not tainted 5.4.0-1019-
kernel: [ 981.808487] Hardware name: ZynqMP ZCU106 RevA (DT)
kernel: [ 981.808491] pstate: 00400005 (nzcv daif +PAN -UAO)
kernel: [ 981.812321] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.815269] pc : __mutex_
kernel: [ 981.815273] lr : __mutex_
kernel: [ 981.815276] sp : ffff800017c3bb30
kernel: [ 981.821772] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.826563] x29: ffff800017c3bb30 x28: ffff00083460ec00
kernel: [ 981.826567] x27: 0000ffffb3f2f000 x26: ffff000855fda500
kernel: [ 981.826571] x25: 0000000000000000 x24: ffff0008498fd400
kernel: [ 981.826574] x23: 0000000000000031 x22: ffff000875878750
kernel: [ 981.826578] x21: 0000000000000002 x20: ffff0008385d4e40
kernel: [ 981.835222] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.840035] x19: ffff0008758787f0 x18: 0000000000000000
kernel: [ 981.840039] x17: 0000000000000000 x16: 0000000000000000
kernel: [ 981.840042] x15: 0000000000000000 x14: 0000000000000000
kernel: [ 981.840046] x13: 0000000000000000 x12: 0000000000000000
kernel: [ 981.868428] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.875905] x11: 0000000000000000 x10: 0000000000100000
kernel: [ 981.875909] x9 : 00000000000000fb x8 : 0000000010044400
kernel: [ 981.875912] x7 : 0000000000000000 x6 : ffff00083460e0c0
kernel: [ 981.875915] x5 : 0000000000000015 x4 : 0000000000000014
kernel: [ 981.875919] x3 : 0000087000000f00 x2 : ffff0008385d4e40
kernel: [ 981.875922] x1 : 0000087000000f00 x0 : 0000087000000f00
kernel: [ 981.875926] Call trace:
kernel: [ 981.875933] __mutex_
kernel: [ 981.875939] __mutex_
kernel: [ 981.885784] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.889485] mutex_lock+
kernel: [ 981.889491] xm2msc_
kernel: [ 981.889497] v4l2_mmap+0x7c/0xb8
kernel: [ 981.889504] mmap_region+
kernel: [ 981.889511] do_mmap+0x294/0x478
kernel: [ 981.894358] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.902880] vm_mmap_
kernel: [ 981.902885] ksys_mmap_
kernel: [ 981.902891] __arm64_
kernel: [ 981.902897] el0_svc_
kernel: [ 981.902903] el0_svc_
Message from syslogd@
kernel:[ 981.912115] Code: a94153f3 a9425bf5 a8c97bfd d65f03c0 (b9404801)
kernel: [ 981.907665] xilinx-multiscaler a00e0000.v_multi: xm2msc_open Chan already opened for minor = 1
kernel: [ 981.912107] el0_svc+0x8/0x1c0
kernel: [ 981.912115] Code: a94153f3 a9425bf5 a8c97bfd d65f03c0 (b9404801)
kernel: [ 981.912121] ---[ end trace bab66edb32cbb4db ]---
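The numbers in the oops above can be cross-checked by hand. The following is a minimal sketch, not part of the original report, that splits the ESR value into the fields the kernel printed and decodes the faulting instruction word shown in parentheses on the Code: line. The helper names `decode_esr` and `decode_ldr_w_uimm` are hypothetical, and the instruction decoder only handles the one encoding needed here (32-bit LDR with unsigned immediate offset).

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into the fields printed in the oops."""
    iss = esr & 0x1FFFFFF              # ISS is bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,       # 0 means the access was a read
        "DFSC": iss & 0x3F,            # 0b000100 is a level 0 translation fault
    }


def decode_ldr_w_uimm(insn: int) -> tuple:
    """Decode a 32-bit LDR (immediate, unsigned offset).

    Returns (rt, rn, byte_offset). Any other encoding is rejected.
    """
    if insn & 0xFFC00000 != 0xB9400000:
        raise ValueError("not a 32-bit LDR (unsigned offset) encoding")
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 4           # imm12 is scaled by the 4-byte access size


# Values copied from the log above.
fields = decode_esr(0x96000004)
print(hex(fields["EC"]), fields["IL"], hex(fields["ISS"]), fields["WnR"])

rt, rn, offset = decode_ldr_w_uimm(0xB9404801)
x0 = 0x0000087000000F00                # x0 from the register dump
print(f"ldr w{rt}, [x{rn}, #{offset}] -> {x0 + offset:#018x}")
```

Under these assumptions the faulting instruction is a 4-byte read at x0 + 0x48, which equals the reported fault address 0000087000000f48. Since x0 is not a kernel address, this is consistent with the mutex code following a bad pointer, though the log alone does not show where that value came from.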
Here is the output when running this test:
$ time sudo ./stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
stress-ng: debug: [3037] invoked with './stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
stress-ng: debug: [3037] stress-ng 0.15.00 g91ec6bccd7e9
stress-ng: debug: [3037] system: Linux 202008-28164-ZCU106 5.4.0-1019-
stress-ng: debug: [3037] RAM total: 3.8G, RAM free: 2.4G, swap free: 0.0
stress-ng: debug: [3037] temporary file path: '.', filesystem type: ext2
stress-ng: debug: [3037] 4 processors online, 4 processors configured
stress-ng: info: [3037] setting to a 5 second run per stressor
stress-ng: info: [3037] dispatching hogs: 4 dev
stress-ng: debug: [3037] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [3037] cache allocate: shared cache buffer size: 2048K
stress-ng: debug: [3037] starting stressors
stress-ng: debug: [3039] dev: started [3039] (instance 0)
stress-ng: debug: [3040] dev: started [3040] (instance 1)
stress-ng: debug: [3037] 4 stressors started
stress-ng: debug: [3041] dev: started [3041] (instance 2)
stress-ng: debug: [3042] dev: started [3042] (instance 3)
Message from syslogd@
kernel:[ 981.797006] Internal error: Oops: 96000004 [#1] SMP
Message from syslogd@
kernel:[ 981.912115] Code: a94153f3 a9425bf5 a8c97bfd d65f03c0 (b9404801)
stress-ng: debug: [3042] dev: exited [3042] (instance 3)
stress-ng: debug: [3041] dev: exited [3041] (instance 2)
stress-ng: info: [3039] dev: 19 of 383 devices opened and exercised
stress-ng: debug: [3039] dev: exited [3039] (instance 0)
stress-ng: debug: [3037] process [3039] terminated
(hung here)
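You can see that process 3040 did not exit here. As the strace output below shows, it keeps waiting on PID 3044, the stress-ng-dev process named in the oops.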
strace output:
$ sudo strace -p 3040
strace: Process 3040 attached
wait4(3044, 0xffffda2c3214, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
getpid() = 3040
setitimer(
rt_sigreturn(
kill(3044, SIGALRM) = 0
kill(3044, SIGKILL) = 0
clock_nanosleep
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
getpid() = 3040
setitimer(
rt_sigreturn(
wait4(3044, 0xffffda2c3214, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
getpid() = 3040
setitimer(
rt_sigreturn(
kill(3044, SIGALRM) = 0
kill(3044, SIGKILL) = 0
clock_nanosleep
(repeats)
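Since the parent keeps sending SIGKILL to 3044 without the wait ever completing, it can help to record what state the leftover processes are in before the Jenkins job times out. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names `parse_proc_state` and `leftover_states` are hypothetical, and the report does not say which state 3044 was actually in.

```python
from pathlib import Path


def parse_proc_state(status_text: str) -> str:
    """Return the value of the 'State:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no State: line found")


def leftover_states(pids, proc_root="/proc"):
    """Map each PID to its process state, or 'gone' if it no longer exists."""
    states = {}
    for pid in pids:
        status_file = Path(proc_root) / str(pid) / "status"
        try:
            text = status_file.read_text()
        except (FileNotFoundError, ProcessLookupError):
            # The process exited between listing and reading.
            states[pid] = "gone"
            continue
        states[pid] = parse_proc_state(text)
    return states


if __name__ == "__main__":
    # PIDs taken from the output above; adjust for a new run.
    for pid, state in leftover_states([3040, 3044]).items():
        print(pid, state)
```

A process that stays in the same state across repeated SIGKILLs is worth capturing in the bug report alongside the strace output.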
tags: added: 5.4 focal ubuntu-stress-smoke-test
description: updated
Changed in linux-xilinx-zynqmp (Ubuntu):
status: New → Invalid
Tested with an older version of stress-ng (cacea49) [1].
The issue still exists.
[1] https://lists.ubuntu.com/archives/kernel-team/2022-November/134872.html