bad-ioctl test in ubuntu_stress_smoke_test triggers kernel bug on Focal s390x

Bug #1931983 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status   Importance  Assigned to  Milestone
Stress-ng            Invalid  Undecided   Unassigned
ubuntu-kernel-tests  Invalid  Undecided   Unassigned

Bug Description

Issue found on 5.4.0-75-generic #84-Ubuntu with s390x zVM kernel04
Test suite HEAD SHA1: 95d7a60

From Jenkins, it looks like the test hangs in the dev test:
09:11:18 DEBUG| [stdout] dentry STARTING
09:11:18 DEBUG| [stdout] dentry RETURNED 0
09:11:18 DEBUG| [stdout] dentry PASSED
09:11:18 DEBUG| [stdout] dev STARTING

However, the console on kernel04 shows that a kernel BUG was tripped with bad-ioctl instead:

 stress-ng: invoked with './stress-ng -v -t 5 --bad-altstack 4 --bad-altstack-ops 5000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
 stress-ng: system: 'kernel04' Linux 5.4.0-75-generic #84-Ubuntu SMP Fri May 28 16:27:37 UTC 2021 s390x
 stress-ng: memory (MB): total 3806.19, free 1497.89, shared 0.55, buffer 47.88, swap 2241.13, free swap 2241.13
 stress-ng: info: [15292] dispatching hogs: 4 bad-altstack
 stress-ng: info: [15292] successful run completed in 6.94s
 stress-ng: invoked with './stress-ng -v -t 5 --bad-ioctl 4 --bad-ioctl-ops 5000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
[ 251.035269] ------------[ cut here ]------------
[ 251.035274] kernel BUG at mm/zswap.c:896!
[ 251.035331] illegal operation: 0001 ilc:1 [#1] SMP
[ 251.035335] Modules linked in: des_generic algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea ccm cmac md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic algif_hash blowfish_generic blowfish_common cast5_generic salsa20_generic chacha_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common algif_skcipher af_alg nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter overlay xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat nf_tables nfnetlink ip6table_filter ip6_tables aufs iptable_mangle xt_CHECKSUM iptable_nat xt_MASQUERADE xt_tcpudp bridge stp llc iptable_filter bpfilter openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 binfmt_misc zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) dm_multipath zavl(PO) scsi_dh_rdac icp(PO) scsi_dh_emc spl(O) scsi_dh_alua vmur vfio_ccw
[ 251.035396] vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel nfsd drm auth_rpcgss drm_panel_orientation_quirks i2c_core nfs_acl lockd grace sunrpc ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes qeth_l2 dasd_eckd_mod sha512_s390 qeth qdio ccwgroup sha256_s390 sha1_s390 sha_common dasd_mod
[ 251.035536] CPU: 0 PID: 15344 Comm: apport Tainted: P O 5.4.0-75-generic #84-Ubuntu
[ 251.035538] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 251.035540] Krnl PSW : 0704c00180000000 00000000c0bfb87a (zswap_writeback_entry+0x32a/0x340)
[ 251.035570] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 251.035575] Krnl GPRS: 0000000000000000 000000005e636600 ffffffffffffffea 0000000000001000
[ 251.035579] 0000000040767000 0000000040768000 000003e0ffffffea 00000000e80c81b8
[ 251.035582] 000003d08101d9c0 00000000e80c81b0 000000000004a111 000000000a48c4d0
[ 251.035622] 000000005e636600 000000002ff97a00 00000000c0bfb738 000003e0017df0a8
[ 251.035631] Krnl Code: 00000000c0bfb86c: c0e50032935a brasl %r14,00000000c124df20
[ 251.035631] 00000000c0bfb872: a7f4ffaa brc 15,00000000c0bfb7c6
[ 251.035631] #00000000c0bfb876: a7f40001 brc 15,00000000c0bfb878
[ 251.035631] >00000000c0bfb87a: ec81ffff00d9 aghik %r8,%r1,-1
[ 251.035631] 00000000c0bfb880: a7f4fefe brc 15,00000000c0bfb67c
[ 251.035631] 00000000c0bfb884: 0707 bcr 0,%r7
[ 251.035631] 00000000c0bfb886: 0707 bcr 0,%r7
[ 251.035631] 00000000c0bfb888: 0707 bcr 0,%r7
[ 251.035645] Call Trace:
[ 251.035648] ([<00000000c0bfb738>] zswap_writeback_entry+0x1e8/0x340)
[ 251.035670] [<00000000c0c3aeec>] zbud_reclaim_page+0x14c/0x2b0
[ 251.035672] [<00000000c0c3b090>] zbud_zpool_shrink+0x40/0x80
[ 251.035674] [<00000000c0bfbb22>] zswap_frontswap_store+0x292/0x5c0
[ 251.035678] [<00000000c0bf9e54>] __frontswap_store+0xb4/0x1a0
[ 251.035680] [<00000000c0bf126a>] swap_writepage+0x5a/0xc0
[ 251.035685] [<00000000c0b97c28>] pageout.isra.0+0x118/0x3e0
[ 251.035688] [<00000000c0b9bcec>] shrink_page_list+0xa3c/0xe90
[ 251.035690] [<00000000c0b9cd40>] shrink_inactive_list+0x1f0/0x4e0
[ 251.035691] [<00000000c0b9d934>] shrink_node_memcg+0x2f4/0x3f0
[ 251.035693] [<00000000c0b9daf4>] shrink_node+0xc4/0x4d0
[ 251.035695] [<00000000c0b9dfde>] do_try_to_free_pages+0xde/0x400
[ 251.035697] [<00000000c0b9e3ee>] try_to_free_pages+0xee/0x230
[ 251.035699] [<00000000c0be899c>] __alloc_pages_slowpath+0x31c/0xe90
[ 251.035702] [<00000000c0be97c4>] __alloc_pages_nodemask+0x2b4/0x330
[ 251.035705] [<00000000c0c077ba>] alloc_pages_vma+0x9a/0x280
[ 251.035707] [<00000000c0bf2494>] __read_swap_cache_async+0x194/0x270
[ 251.035709] [<00000000c0bf259a>] read_swap_cache_async+0x2a/0x60
[ 251.035711] [<00000000c0bf27fa>] swap_cluster_readahead+0x22a/0x2e0
[ 251.035713] [<00000000c0bf2cde>] swapin_readahead+0x2ce/0x410
[ 251.035718] [<00000000c0bc7314>] do_swap_page+0x1f4/0x880
[ 251.035720] [<00000000c0bc910e>] __handle_mm_fault+0x7ee/0x910
[ 251.035722] [<00000000c0bc92f6>] handle_mm_fault+0xc6/0x180
[ 251.035726] [<00000000c09a56b0>] do_dat_exception+0x120/0x3d0
[ 251.035732] [<00000000c1260e5e>] pgm_check_handler+0x1da/0x230
[ 251.035733] Last Breaking-Event-Address:
[ 251.035735] [<00000000c0bfb876>] zswap_writeback_entry+0x326/0x340
[ 251.035738] ---[ end trace 88ea2865acaf58aa ]---
[ 251.035743] ------------[ cut here ]------------
[ 251.035747] WARNING: CPU: 0 PID: 15344 at kernel/exit.c:726 do_exit+0x40/0xb60
[ 251.035748] Modules linked in: des_generic algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea ccm cmac md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic algif_hash blowfish_generic blowfish_common cast5_generic salsa20_generic chacha_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common algif_skcipher af_alg nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter overlay xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat nf_tables nfnetlink ip6table_filter ip6_tables aufs iptable_mangle xt_CHECKSUM iptable_nat xt_MASQUERADE xt_tcpudp bridge stp llc iptable_filter bpfilter openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 binfmt_misc zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) dm_multipath zavl(PO) scsi_dh_rdac icp(PO) scsi_dh_emc spl(O) scsi_dh_alua vmur vfio_ccw
[ 251.035778] vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel nfsd drm auth_rpcgss drm_panel_orientation_quirks i2c_core nfs_acl lockd grace sunrpc ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes qeth_l2 dasd_eckd_mod sha512_s390 qeth qdio ccwgroup sha256_s390 sha1_s390 sha_common dasd_mod
[ 251.035798] CPU: 0 PID: 15344 Comm: apport Tainted: P D O 5.4.0-75-generic #84-Ubuntu
[ 251.035800] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 251.035801] Krnl PSW : 0704e00180000000 00000000c09ff8f0 (do_exit+0x40/0xb60)
[ 251.035804] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 251.035806] Krnl GPRS: 000000000000000a 000003e0017df698 00000000e64f8f48 0000000000000007
[ 251.035807] 0000000000000007 00000000c1512098 0000000000000001 000000000000000b
[ 251.035809] 0704c00180000000 00000000c0981200 00000000c13f0060 000000005e636600
[ 251.035810] 000000005e636600 0000000000000000 000003e0017dee78 000003e0017dee08
[ 251.035815] Krnl Code: 00000000c09ff8e0: e32010000004 lg %r2,0(%r1)
[ 251.035815] 00000000c09ff8e6: ec1203598064 cgrj %r1,%r2,8,00000000c09fff98
[ 251.035815] #00000000c09ff8ec: a7f40001 brc 15,00000000c09ff8ee
[ 251.035815] >00000000c09ff8f0: 588003a8 l %r8,936
[ 251.035815] 00000000c09ff8f4: ecd82bb70055 risbg %r13,%r8,43,183,0
[ 251.035815] 00000000c09ff8fa: a774055b brc 7,00000000c0a003b0
[ 251.035815] 00000000c09ff8fe: e310b6b00012 lt %r1,1712(%r11)
[ 251.035815] 00000000c09ff904: a7840550 brc 8,00000000c0a003a4
[ 251.035831] Call Trace:
[ 251.035834] ([<88ea2865acaf58aa>] 0x88ea2865acaf58aa)
[ 251.035839] [<00000000c09943da>] die+0x14a/0x170
[ 251.035841] [<00000000c0981340>] illegal_op+0x140/0x150
[ 251.035843] [<00000000c1260e5e>] pgm_check_handler+0x1da/0x230
[ 251.035845] [<00000000c0bfb87a>] zswap_writeback_entry+0x32a/0x340
[ 251.035846] ([<00000000c0bfb738>] zswap_writeback_entry+0x1e8/0x340)
[ 251.035848] [<00000000c0c3aeec>] zbud_reclaim_page+0x14c/0x2b0
[ 251.035850] [<00000000c0c3b090>] zbud_zpool_shrink+0x40/0x80
[ 251.035852] [<00000000c0bfbb22>] zswap_frontswap_store+0x292/0x5c0
[ 251.035854] [<00000000c0bf9e54>] __frontswap_store+0xb4/0x1a0
[ 251.035856] [<00000000c0bf126a>] swap_writepage+0x5a/0xc0
[ 251.035858] [<00000000c0b97c28>] pageout.isra.0+0x118/0x3e0
[ 251.035859] [<00000000c0b9bcec>] shrink_page_list+0xa3c/0xe90
[ 251.035861] [<00000000c0b9cd40>] shrink_inactive_list+0x1f0/0x4e0
[ 251.035863] [<00000000c0b9d934>] shrink_node_memcg+0x2f4/0x3f0
[ 251.035865] [<00000000c0b9daf4>] shrink_node+0xc4/0x4d0
[ 251.035867] [<00000000c0b9dfde>] do_try_to_free_pages+0xde/0x400
[ 251.035868] [<00000000c0b9e3ee>] try_to_free_pages+0xee/0x230
[ 251.035870] [<00000000c0be899c>] __alloc_pages_slowpath+0x31c/0xe90
[ 251.035872] [<00000000c0be97c4>] __alloc_pages_nodemask+0x2b4/0x330
[ 251.035874] [<00000000c0c077ba>] alloc_pages_vma+0x9a/0x280
[ 251.035876] [<00000000c0bf2494>] __read_swap_cache_async+0x194/0x270
[ 251.035877] [<00000000c0bf259a>] read_swap_cache_async+0x2a/0x60
[ 251.035880] [<00000000c0bf27fa>] swap_cluster_readahead+0x22a/0x2e0
[ 251.035882] [<00000000c0bf2cde>] swapin_readahead+0x2ce/0x410
[ 251.035883] [<00000000c0bc7314>] do_swap_page+0x1f4/0x880
[ 251.035885] [<00000000c0bc910e>] __handle_mm_fault+0x7ee/0x910
[ 251.035887] [<00000000c0bc92f6>] handle_mm_fault+0xc6/0x180
[ 251.035889] [<00000000c09a56b0>] do_dat_exception+0x120/0x3d0
[ 251.035891] [<00000000c1260e5e>] pgm_check_handler+0x1da/0x230
[ 251.035892] Last Breaking-Event-Address:
[ 251.035893] [<00000000c09ff8ec>] do_exit+0x3c/0xb60
[ 251.035894] ---[ end trace 88ea2865acaf58ab ]---

Po-Hsu Lin (cypressyew)
description: updated
summary: - dev test in ubuntu_stress_smoke_test crash Focal s390x
+ bad-ioctl test in ubuntu_stress_smoke_test triggers kernel bug on Focal
+ s390x
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Output with the dev test:
 stress-ng: invoked with './stress-ng -v -t 5 --dev 4 --dev-ops 5000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
 stress-ng: system: 'kernel04' Linux 5.4.0-75-generic #84-Ubuntu SMP Fri May 28 16:27:37 UTC 2021 s390x
 stress-ng: memory (MB): total 3806.19, free 2890.46, shared 0.13, buffer 155.32, swap 2241.13, free swap 2073.50
 stress-ng: info: [26412] dispatching hogs: 4 dev
 systemd-udevd[26474]: dasda: Failed to process device, ignoring: Resource temporarily unavailable
 kernel: [ 333.581289] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581298] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581306] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581314] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581499] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581562] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581571] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581586] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581641] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 kernel: [ 333.581653] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0

Colin Ian King (colin-king) wrote :

The dev stressor should be run for a long duration (say, 5 minutes) to re-trigger issues reliably, since it uses /dev operations that try to trigger race conditions. The order in which /dev is scanned may also vary from run to run due to timing, so long runs are a good bet for hitting these bugs.
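
A minimal sketch of such a longer run, assuming the same ./stress-ng binary, worker count, and logging options as the smoke test invocation above. The helper name dev_stress_cmd is hypothetical; it only prints the command line so it can be reviewed before being run as root:

```shell
#!/bin/sh
# Hypothetical helper: print a stress-ng command line for a long dev run.
# $1 = number of dev workers, $2 = timeout (stress-ng accepts suffixes such as s, m, h).
dev_stress_cmd() {
    printf './stress-ng --dev %s --timeout %s --syslog --verbose --verify\n' "$1" "$2"
}

# 4 workers for 5 minutes, matching the duration suggested above.
dev_stress_cmd 4 5m
```

Dropping the --dev-ops limit used by the smoke test is a deliberate choice in this sketch, so that the run is bounded by time rather than by operation count.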

Po-Hsu Lin (cypressyew) wrote :

Please find the complete syslog output since the beginning of the test in the attachment.

Po-Hsu Lin (cypressyew) wrote :

The dev test hangs for about 50 minutes on that node. It's a bit too long.

Colin Ian King (colin-king) wrote :

A long hang like that indicates a broken kernel.

Po-Hsu Lin (cypressyew) wrote :

I am trying to run this again manually with the -74 and -75 kernels.

Changed in ubuntu-kernel-tests:
status: New → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
Po-Hsu Lin (cypressyew) wrote :

Tried to run the stress smoke test manually twice:
* On 5.4.0-74, I didn't see this issue; the test finished without this bug in syslog.
* On 5.4.0-75, it passed without any issue as well.

However, when I ran this through our Jenkins server (with 5.4.0-75), the first run passed without any problem, but on the second run this error popped up at the end of the test (after the test summary, with no test failed).

This is pretty odd. Might need some more tests.
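
Since the error only shows up on some runs, one way to check each run is to scan the kernel log for the BUG marker afterwards. This is a rough sketch, not part of the test suite; the helper name count_kernel_bugs is hypothetical and it simply counts lines containing "kernel BUG at", as seen in the console output in the description:

```shell
#!/bin/sh
# Hypothetical helper: count "kernel BUG at" lines in a log read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the exit status clean for callers using "set -e".
count_kernel_bugs() {
    grep -c 'kernel BUG at' || true
}

# Usage after a test run (dmesg may require root):
#   dmesg | count_kernel_bugs
#   count_kernel_bugs < /var/log/syslog
```

A non-zero count after a run that reported no failed tests would match the behaviour described above.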

Changed in ubuntu-kernel-tests:
assignee: Po-Hsu Lin (cypressyew) → nobody
status: In Progress → New
Colin Ian King (colin-king) wrote :

When a stress-ng dev test hangs, use:

ps -augx | grep stress-ng

This may show the name of the device that stress-ng was hung on in the stress-ng command line (I add that info to help debug such issues).
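
A possible way to narrow that output down, as a sketch only: it assumes a hung worker sits in uninterruptible sleep (state D), which this thread does not confirm. The helper name hung_stress_ng is hypothetical; it filters "pid stat args" lines and keeps stress-ng processes whose state starts with D:

```shell
#!/bin/sh
# Hypothetical helper: from "pid stat args" lines on stdin, keep stress-ng
# processes whose state column starts with D (uninterruptible sleep).
hung_stress_ng() {
    awk '$2 ~ /^D/ && /stress-ng/ { print }'
}

# Usage on a live system (procps column selection):
#   ps -eo pid,stat,args | hung_stress_ng
```

The full command line is kept in the output, so any device name that stress-ng puts there remains visible.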

Po-Hsu Lin (cypressyew) wrote :

ubuntu_stress_smoke_test passed with Focal s390x in recent cycles, closing this bug.

Changed in stress-ng:
status: New → Invalid
Changed in ubuntu-kernel-tests:
status: New → Invalid