ubuntu_stress_smoke_tests hangs with swapoff command on F-s390x
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | New | Undecided | Unassigned |
Bug Description
Issue found on kernel04 with kernel 5.4.0-110-generic
The test passes, but the job is eventually killed by the autotest timeout setting (2100 seconds).
Summary:
Stressors run: 229
Skipped: 3, binderfs pci smi
Failed: 0,
Oopsed: 0,
Oomed: 0,
Passed: 226, access af-alg affinity aio aiol alarm bad-altstack bad-ioctl bigheap branch brk cache cap chattr chdir chmod chown chroot clock close context cpu crypt cyclic daemon dccp dentry dev dev-shm dir dirdeep dirmany dnotify dup dynlib enosys env epoll eventfd exit-group fallocate fanotify fault fcntl fiemap fifo file-ioctl filename flock fork fp-error fpunch fstat full funcret futex get getdent getrandom goto handle hash hdd hrtimers icache icmp-flood inode-flags inotify io iomix ioprio io-uring ipsec-mb itimer judy key kill klog kvm landlock lease link list loadavg locka lockbus lockf lockofd loop madvise malloc mcontend membarrier memfd memhotplug memrate memthrash mergesort mincore misaligned mknod mlock mmap mmapaddr mmapfixed mmapfork mmaphuge mmapmany mq mremap msg msync munmap nanosleep netdev netlink-proc netlink-task nice null open pageswap personality physpage pidfd ping-sock pipe pipeherd pkey poll prctl prefetch procfs pthread ptrace pty radixsort randlist ramfs rawdev rawpkt rawsock rawudp readahead reboot rename resched revio rlimit rmap rseq rtc schedpolicy sctp seal seccomp secretmem seek sem sem-sysv sendfile session set shellsort shm shm-sysv sigabrt sigchld sigfd sigfpe sigio signal signest sigpending sigpipe sigq sigrt sigsegv sigsuspend sigtrap skiplist sleep sock sockabuse sockdiag sockmany softlockup sparsematrix splice stackmmap stream swap switch symlink sync-file syncload sysbadaddr sysfs tee timer timerfd tlb-shootdown tmpfs tree tsearch tun udp udp-flood unshare urandom userfaultfd usersyscall utime vdso vecwide verity vfork vm vm-addr vm-rw vm-segv vm-splice wait x86syscall yield zero zombie
Badret: 0,
Tests took 471 seconds to run
Several stress-ng related processes are stuck in D state (uninterruptible sleep):
$ ps aux | grep stress
root 2247 0.0 0.0 27280 984 pts/1 S 02:28 0:00 /usr/bin/python2 -u autotest/
root 2248 0.0 0.0 27280 984 pts/1 S 02:28 0:00 /usr/bin/python2 -u autotest/
root 152197 0.0 0.5 46080 21204 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 152201 0.0 0.7 46080 27428 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 152377 0.0 0.7 46080 27384 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 152407 0.0 0.6 46080 27020 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 152431 0.0 0.7 46080 28092 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 152505 0.0 0.5 36364 21268 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 191513 0.0 0.0 9480 964 pts/1 D 02:37 0:00 swapoff -a /home/ubuntu/
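To pick out the blocked processes from a larger `ps aux` listing, the STAT column can be filtered for entries starting with `D`. The following is a minimal sketch, assuming the standard 11-column `ps aux` layout; the function name `uninterruptible_processes` is hypothetical, and the sample lines are copied from the listing above with their truncated command paths left as they are.

```python
def uninterruptible_processes(ps_aux_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    blocked = []
    for line in ps_aux_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        if fields[7].startswith("D"):
            blocked.append((int(fields[1]), fields[10].rstrip()))
    return blocked


# Sample lines taken from the report (command paths are truncated there too).
SAMPLE = """\
root 2247 0.0 0.0 27280 984 pts/1 S 02:28 0:00 /usr/bin/python2 -u autotest/
root 152197 0.0 0.5 46080 21204 ? D 02:34 0:00 /usr/bin/python3 /usr/share/
root 191513 0.0 0.0 9480 964 pts/1 D 02:37 0:00 swapoff -a /home/ubuntu/
"""

if __name__ == "__main__":
    for pid, command in uninterruptible_processes(SAMPLE):
        print(pid, command)
```

On a live system the same function could be fed the output of `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.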
This appears to be caused by the swapoff command going wrong; the following trace shows up in dmesg:
[ 664.537025] ------------[ cut here ]------------
[ 664.537029] kernel BUG at mm/zswap.c:896!
[ 664.537079] illegal operation: 0001 ilc:1 [#1] SMP
[ 664.537083] Modules linked in: sctp vhost_net tap vhost_vsock vmw_vsock_
[ 664.537139] scsi_dh_emc scsi_dh_alua vmur vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel nfsd drm drm_panel_
[ 664.537198] CPU: 2 PID: 152087 Comm: apport Tainted: P O 5.4.0-110-generic #124-Ubuntu
[ 664.537200] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 664.537202] Krnl PSW : 0704c00180000000 0000000034e6beba (zswap_
[ 664.537209] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 664.537212] Krnl GPRS: 0000000000000003 00000000e597a200 ffffffffffffffea 0000000000000316
[ 664.537213] 0000000000000cea 0000000000000000 000003e0ffffffea 00000000f1a60328
[ 664.537215] 000003d080a46940 00000000f1a60320 000000000000c410 000000008d7c8b98
[ 664.537216] 00000000e597a200 000000009a353b00 0000000034e6bd78 000003e00e4630a8
[ 664.537225] Krnl Code: 0000000034e6beac: c0e50032c4aa brasl %r14,0000000035
[ 664.537240] Call Trace:
[ 664.537242] ([<0000000034e6
[ 664.537246] [<0000000034eab
[ 664.537248] [<0000000034eab
[ 664.537249] [<0000000034e6c
[ 664.537252] [<0000000034e6a
[ 664.537255] [<0000000034e61
[ 664.537259] [<0000000034e07
[ 664.537261] [<0000000034e0b
[ 664.537264] [<0000000034e0c
[ 664.537266] [<0000000034e0d
[ 664.537268] [<0000000034e0d
[ 664.537270] [<0000000034e0d
[ 664.537272] [<0000000034e0e
[ 664.537274] [<0000000034e5a
[ 664.537276] [<0000000034e5b
[ 664.537279] [<0000000034e77
[ 664.537281] [<0000000034e62
[ 664.537283] [<0000000034e62
[ 664.537285] [<0000000034e62
[ 664.537287] [<0000000034e63
[ 664.537292] [<0000000034e37
[ 664.537294] [<0000000034e39
[ 664.537296] [<0000000034e39
[ 664.537299] [<0000000034c11
[ 664.537305] [<00000000354d7
[ 664.537306] Last Breaking-
[ 664.537307] [<0000000034e6b
[ 664.537310] ---[ end trace 2f637439fb06e842 ]---
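A test harness could detect this condition by scanning the kernel log for `kernel BUG at` lines instead of waiting for the timeout. The following is a rough sketch, not part of autotest; the function name `find_kernel_bugs` and the regular expression are assumptions based only on the line format seen in the trace above.

```python
import re

# Matches lines such as: "[ 664.537029] kernel BUG at mm/zswap.c:896!"
KERNEL_BUG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+kernel BUG at (?P<file>\S+):(?P<line>\d+)!"
)


def find_kernel_bugs(dmesg_text):
    """Return (timestamp, source_file, line_number) for each kernel BUG line."""
    hits = []
    for match in KERNEL_BUG_RE.finditer(dmesg_text):
        hits.append(
            (float(match.group("ts")), match.group("file"), int(match.group("line")))
        )
    return hits


# Sample lines copied from the dmesg output in the report.
SAMPLE_DMESG = """\
[ 664.537025] ------------[ cut here ]------------
[ 664.537029] kernel BUG at mm/zswap.c:896!
[ 664.537079] illegal operation: 0001 ilc:1 [#1] SMP
"""

if __name__ == "__main__":
    for ts, source_file, line_number in find_kernel_bugs(SAMPLE_DMESG):
        print(ts, source_file, line_number)
```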
When this happens, the reboot process is blocked for a while by the following message:
A stop job is running for /swapfile
tags: | added: sru-20220418 |
tags: | added: 5.4 focal s390x ubuntu-stress-smoke-test |
description: | updated |
Looking back at the history, kernel04 was not properly tested in sru-20220321:
* 5.4.0-109.123 - test in I state, probably because of this issue
* 5.4.0-108.122 - test NA
* 5.4.0-106.120 - test NA
In cycle sru-20220221, the test was fine with an older version of stress-ng:
* 5.4.0-105.119 - OK with 48be8ff in stress-ng