ubuntu_stress_smoke_tests hangs with swapoff command on F-s390x

Bug #1969873 reported by Po-Hsu Lin
This bug affects 1 person

Affects: ubuntu-kernel-tests
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Issue found on kernel04 with kernel 5.4.0-110-generic

The test passed, but it eventually gets killed by the autotest timeout (2100 seconds).

 Summary:
   Stressors run: 229
   Skipped: 3, binderfs pci smi
   Failed: 0,
   Oopsed: 0,
   Oomed: 0,
   Passed: 226, access af-alg affinity aio aiol alarm bad-altstack bad-ioctl bigheap branch brk cache cap chattr chdir chmod chown chroot clock close context cpu crypt cyclic daemon dccp dentry dev dev-shm dir dirdeep dirmany dnotify dup dynlib enosys env epoll eventfd exit-group fallocate fanotify fault fcntl fiemap fifo file-ioctl filename flock fork fp-error fpunch fstat full funcret futex get getdent getrandom goto handle hash hdd hrtimers icache icmp-flood inode-flags inotify io iomix ioprio io-uring ipsec-mb itimer judy key kill klog kvm landlock lease link list loadavg locka lockbus lockf lockofd loop madvise malloc mcontend membarrier memfd memhotplug memrate memthrash mergesort mincore misaligned mknod mlock mmap mmapaddr mmapfixed mmapfork mmaphuge mmapmany mq mremap msg msync munmap nanosleep netdev netlink-proc netlink-task nice null open pageswap personality physpage pidfd ping-sock pipe pipeherd pkey poll prctl prefetch procfs pthread ptrace pty radixsort randlist ramfs rawdev rawpkt rawsock rawudp readahead reboot rename resched revio rlimit rmap rseq rtc schedpolicy sctp seal seccomp secretmem seek sem sem-sysv sendfile session set shellsort shm shm-sysv sigabrt sigchld sigfd sigfpe sigio signal signest sigpending sigpipe sigq sigrt sigsegv sigsuspend sigtrap skiplist sleep sock sockabuse sockdiag sockmany softlockup sparsematrix splice stackmmap stream swap switch symlink sync-file syncload sysbadaddr sysfs tee timer timerfd tlb-shootdown tmpfs tree tsearch tun udp udp-flood unshare urandom userfaultfd usersyscall utime vdso vecwide verity vfork vm vm-addr vm-rw vm-segv vm-splice wait x86syscall yield zero zombie
   Badret: 0,

 Tests took 471 seconds to run

There are some stress-ng related processes stuck in D (uninterruptible sleep) state:
$ ps aux | grep stress
root 2247 0.0 0.0 27280 984 pts/1 S 02:28 0:00 /usr/bin/python2 -u autotest/client/autotest-local --verbose autotest/client/tests/ubuntu_stress_smoke_test/control
root 2248 0.0 0.0 27280 984 pts/1 S 02:28 0:00 /usr/bin/python2 -u autotest/client/autotest-local --verbose autotest/client/tests/ubuntu_stress_smoke_test/control
root 152197 0.0 0.5 46080 21204 ? D 02:34 0:00 /usr/bin/python3 /usr/share/apport/apport 152196 6 0 1 152196 !home!ubuntu!autotest!client!tmp!ubuntu_stress_smoke_test!src!stress-ng!stress-ng
root 152201 0.0 0.7 46080 27428 ? D 02:34 0:00 /usr/bin/python3 /usr/share/apport/apport 152200 6 0 1 152200 !home!ubuntu!autotest!client!tmp!ubuntu_stress_smoke_test!src!stress-ng!stress-ng
root 152377 0.0 0.7 46080 27384 ? D 02:34 0:00 /usr/bin/python3 /usr/share/apport/apport 152376 6 0 1 152376 !home!ubuntu!autotest!client!tmp!ubuntu_stress_smoke_test!src!stress-ng!stress-ng
root 152407 0.0 0.6 46080 27020 ? D 02:34 0:00 /usr/bin/python3 /usr/share/apport/apport 152406 6 0 1 152406 !home!ubuntu!autotest!client!tmp!ubuntu_stress_smoke_test!src!stress-ng!stress-ng
root 152431 0.0 0.7 46080 28092 ? D 02:34 0:00 /usr/bin/python3 /usr/share/apport/apport 152430 6 0 1 152430 !home!ubuntu!autotest!client!tmp!ubuntu_stress_smoke_test!src!stress-ng!stress-ng
root 152505 0.0 0.5 36364 21268 ? D 02:34 0:00 /usr/bin/python3 /usr/share/apport/apport 152504 6 0 1 152504 !home!ubuntu!autotest!client!tmp!ubuntu_stress_smoke_test!src!stress-ng!stress-ng
root 191513 0.0 0.0 9480 964 pts/1 D 02:37 0:00 swapoff -a /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img
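The same check can be scripted without grepping `ps` output. This is only an illustrative sketch, assuming a Linux `/proc` filesystem where the field after the parenthesised command name in `/proc/<pid>/stat` is the task state; `parse_stat` and `find_d_state` are hypothetical helper names, not part of autotest or stress-ng.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def find_d_state(proc_root="/proc"):
    """Yield (pid, comm) for every task in uninterruptible sleep ('D')."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/sys
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the task exited while we were scanning
        if state == "D":
            yield pid, comm
```

For example, `for pid, comm in find_d_state(): print(pid, comm)` would be expected to list the stuck apport and swapoff tasks shown above. Splitting on the last `)` matters because a command name may itself contain spaces or parentheses.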

This looks like it is caused by the swapoff command going wrong; the following trace shows up in dmesg:
[ 664.537025] ------------[ cut here ]------------
[ 664.537029] kernel BUG at mm/zswap.c:896!
[ 664.537079] illegal operation: 0001 ilc:1 [#1] SMP
[ 664.537083] Modules linked in: sctp vhost_net tap vhost_vsock vmw_vsock_virtio_transport_common vhost vsock zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) dccp_ipv4 dccp tgr192 streebog_generic sm3_generic sha3_generic rmd320 rmd256 rmd160 rmd128 nhpoly1305 poly1305_generic michael_mic md4 cmac ccm algif_rng twofish_generic twofish_common tea sm4_generic serpent_generic seed khazad fcrypt des_generic cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common anubis algif_skcipher algif_hash aegis128 algif_aead af_alg nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter overlay xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat nf_tables nfnetlink ip6table_filter ip6_tables aufs iptable_mangle xt_CHECKSUM iptable_nat xt_MASQUERADE xt_tcpudp bridge stp llc iptable_filter bpfilter openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 binfmt_misc dm_multipath scsi_dh_rdac
[ 664.537139] scsi_dh_emc scsi_dh_alua vmur vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel nfsd drm drm_panel_orientation_quirks auth_rpcgss nfs_acl lockd i2c_core grace sunrpc ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes qeth_l2 sha512_s390 sha256_s390 sha1_s390 qeth sha_common dasd_eckd_mod qdio dasd_mod ccwgroup
[ 664.537198] CPU: 2 PID: 152087 Comm: apport Tainted: P O 5.4.0-110-generic #124-Ubuntu
[ 664.537200] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 664.537202] Krnl PSW : 0704c00180000000 0000000034e6beba (zswap_writeback_entry+0x32a/0x340)
[ 664.537209] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 664.537212] Krnl GPRS: 0000000000000003 00000000e597a200 ffffffffffffffea 0000000000000316
[ 664.537213] 0000000000000cea 0000000000000000 000003e0ffffffea 00000000f1a60328
[ 664.537215] 000003d080a46940 00000000f1a60320 000000000000c410 000000008d7c8b98
[ 664.537216] 00000000e597a200 000000009a353b00 0000000034e6bd78 000003e00e4630a8
[ 664.537225] Krnl Code: 0000000034e6beac: c0e50032c4aa brasl %r14,00000000354c4800
                          0000000034e6beb2: a7f4ffaa brc 15,0000000034e6be06
                         #0000000034e6beb6: a7f40001 brc 15,0000000034e6beb8
                         >0000000034e6beba: ec81ffff00d9 aghik %r8,%r1,-1
                          0000000034e6bec0: a7f4fefe brc 15,0000000034e6bcbc
                          0000000034e6bec4: 0707 bcr 0,%r7
                          0000000034e6bec6: 0707 bcr 0,%r7
                          0000000034e6bec8: 0707 bcr 0,%r7
[ 664.537240] Call Trace:
[ 664.537242] ([<0000000034e6bd78>] zswap_writeback_entry+0x1e8/0x340)
[ 664.537246] [<0000000034eab55c>] zbud_reclaim_page+0x14c/0x2b0
[ 664.537248] [<0000000034eab700>] zbud_zpool_shrink+0x40/0x80
[ 664.537249] [<0000000034e6c162>] zswap_frontswap_store+0x292/0x5c0
[ 664.537252] [<0000000034e6a494>] __frontswap_store+0xb4/0x1a0
[ 664.537255] [<0000000034e618aa>] swap_writepage+0x5a/0xc0
[ 664.537259] [<0000000034e07bf8>] pageout.isra.0+0x118/0x3e0
[ 664.537261] [<0000000034e0bcec>] shrink_page_list+0xa3c/0xe90
[ 664.537264] [<0000000034e0cd40>] shrink_inactive_list+0x1f0/0x4e0
[ 664.537266] [<0000000034e0d934>] shrink_node_memcg+0x2f4/0x3f0
[ 664.537268] [<0000000034e0daf4>] shrink_node+0xc4/0x4d0
[ 664.537270] [<0000000034e0dfde>] do_try_to_free_pages+0xde/0x400
[ 664.537272] [<0000000034e0e3ee>] try_to_free_pages+0xee/0x230
[ 664.537274] [<0000000034e5af7c>] __alloc_pages_slowpath+0x31c/0xe90
[ 664.537276] [<0000000034e5bda4>] __alloc_pages_nodemask+0x2b4/0x330
[ 664.537279] [<0000000034e77e2a>] alloc_pages_vma+0x9a/0x280
[ 664.537281] [<0000000034e62ad4>] __read_swap_cache_async+0x194/0x270
[ 664.537283] [<0000000034e62bda>] read_swap_cache_async+0x2a/0x60
[ 664.537285] [<0000000034e62e3a>] swap_cluster_readahead+0x22a/0x2e0
[ 664.537287] [<0000000034e6331e>] swapin_readahead+0x2ce/0x410
[ 664.537292] [<0000000034e37594>] do_swap_page+0x1f4/0x880
[ 664.537294] [<0000000034e3938e>] __handle_mm_fault+0x7ee/0x910
[ 664.537296] [<0000000034e39576>] handle_mm_fault+0xc6/0x180
[ 664.537299] [<0000000034c11910>] do_dat_exception+0x120/0x3d0
[ 664.537305] [<00000000354d75de>] pgm_check_handler+0x1da/0x230
[ 664.537306] Last Breaking-Event-Address:
[ 664.537307] [<0000000034e6beb6>] zswap_writeback_entry+0x326/0x340
[ 664.537310] ---[ end trace 2f637439fb06e842 ]---
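When triaging many runs, it can help to pull the BUG location and call chain out of dmesg automatically, so that hits of this same zswap BUG can be grouped together. A minimal sketch, assuming the s390x trace format shown above (`[<addr>] func+off/size` frames between `Call Trace:` and `Last Breaking-Event-Address:`); `summarize_oops` is a hypothetical helper, not part of autotest or stress-ng.

```python
import re

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# Matches "[<0000000034eab55c>] zbud_reclaim_page+0x14c/0x2b0", with or
# without the surrounding "( ... )" used for the first s390x frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(dmesg_text):
    """Return (file, line, [functions in the call trace]) or None."""
    bug = BUG_RE.search(dmesg_text)
    if bug is None:
        return None
    # Only look at frames after "Call Trace:" and before the s390x-specific
    # "Last Breaking-Event-Address:" trailer.
    start = dmesg_text.find("Call Trace:", bug.end())
    if start == -1:
        return bug.group(1), int(bug.group(2)), []
    end = dmesg_text.find("Last Breaking-Event-Address:", start)
    trace = dmesg_text[start:end if end != -1 else len(dmesg_text)]
    return bug.group(1), int(bug.group(2)), FRAME_RE.findall(trace)
```

Applied to the trace above, this would report `mm/zswap.c` line 896 with `zswap_writeback_entry` at the top of a chain that runs down through `do_swap_page`, i.e. reclaim triggered while faulting a page back in from swap.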

When this happens, the reboot process is blocked for a while with the following message:
  A stop job is running for /swapfile

Po-Hsu Lin (cypressyew)
tags: added: sru-20220418
tags: added: 5.4 focal s390x ubuntu-stress-smoke-test
Po-Hsu Lin (cypressyew) wrote :

Looking back at the history, kernel04 was not properly tested in cycle sru-20220321:
* 5.4.0-109.123 - test in I state, probably because of this issue
* 5.4.0-108.122 - test NA
* 5.4.0-106.120 - test NA

In cycle sru-20220221, it passed with an older version of stress-ng:
* 5.4.0-105.119 - OK with 48be8ff in stress-ng

Po-Hsu Lin (cypressyew) wrote :

I can reproduce this hang with 5.4.0-110 + stress-ng 48be8ff (V0.13.11) as well.

Please find the dmesg output in the attachment.

Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote (last edit ):

I can reproduce this with 5.4.0-105.119 + stress-ng 48be8ff (V0.13.11) too, even though that combination passed in sru-20220221.
Not sure what's going on here.

It looks like this failed right after the sigabrt test:
Apr 22 09:04:48 kernel04 stress-ng: info: [154111] dispatching hogs: 4 shm-sysv
Apr 22 09:04:51 kernel04 stress-ng: info: [154111] successful run completed in 2.71s
Apr 22 09:04:51 kernel04 stress-ng: invoked with './stress-ng -v -t 5 --sigabrt 4 --sigabrt-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
Apr 22 09:04:51 kernel04 stress-ng: system: 'kernel04' Linux 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:48:48 UTC 2022 s390x
Apr 22 09:04:51 kernel04 stress-ng: memory (MB): total 3803.07, free 3044.49, shared 0.52, buffer 70.57, swap 2241.13, free swap 2174.96
Apr 22 09:04:51 kernel04 stress-ng: info: [155627] setting to a 5 second run per stressor
Apr 22 09:04:51 kernel04 stress-ng: info: [155627] dispatching hogs: 4 sigabrt
Apr 22 09:04:58 kernel04 stress-ng: info: [155627] successful run completed in 7.73s

Apr 22 09:05:31 kernel04 kernel: [ 2447.055320] ------------[ cut here ]------------
Apr 22 09:05:31 kernel04 kernel: [ 2447.055324] kernel BUG at mm/zswap.c:896!
Apr 22 09:05:31 kernel04 kernel: [ 2447.055375] illegal operation: 0001 ilc:1 [#1] SMP
Apr 22 09:05:31 kernel04 kernel: [ 2447.055379] Modules linked in: sctp vhost_net tap vhost_vsock vmw_vsock_virtio_transport_common vhost vsock zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) dccp_ipv4 dccp tgr192 streebog_generic sm3_generic sha3_generic rmd320 rmd256 rmd160 rmd128 nhpoly1305 poly1305_generic michael_mic md4 cmac ccm algif_rng twofish_generic twofish_common tea sm4_generic serpent_generic seed khazad fcrypt des_generic cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common anubis algif_skcipher algif_hash aegis128 algif_aead af_alg nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter overlay xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat nf_tables nfnetlink aufs ip6table_filter ip6_tables iptable_mangle xt_CHECKSUM iptable_nat xt_MASQUERADE xt_tcpudp bridge stp llc iptable_filter bpfilter openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 binfmt_misc dm_multipath scsi_dh_rdac
Apr 22 09:05:31 kernel04 kernel: [ 2447.055441] scsi_dh_emc scsi_dh_alua vmur vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel nfsd auth_rpcgss drm nfs_acl lockd drm_panel_orientation_quirks i2c_core grace sunrpc ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 dasd_eckd_mod qeth_l2 sha1_s390 sha_common qeth qdio dasd_mod ccwgroup
Apr 22 09:05:31 kernel04 kernel: [ 2447.055499] CPU: 4 PID: 155935 Comm: apport Tainted: P O 5.4.0-105-generic #119-Ubuntu
Apr 22 09:05:31 kernel04 kernel: [ 2447.055501] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
Apr 22 09:05:31 kernel04 kernel: [ 2447.055503] Krnl PSW : 0704c00180000000 00000000423aabca (zswap_writeback...
