stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12

Bug #1856900 reported by Seth Forshee
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Colin Ian King
Milestone: (none)

Bug Description

During autopkgtest testing the stress-ng sysinfo stressor was marked as failed because the kernel oopsed while it was running.

16:20:34 DEBUG| [stdout] sysinfo STARTING
16:20:39 DEBUG| [stdout] sysinfo RETURNED 0
16:20:39 DEBUG| [stdout] sysinfo FAILED (kernel oopsed)
16:20:39 DEBUG| [stdout] [ 6521.203448] kernel tried to execute exec-protected page (c0000000c25ffce0) - exploit attempt? (uid: 0)
16:20:39 DEBUG| [stdout] [ 6521.207260] BUG: Unable to handle kernel instruction fetch
16:20:39 DEBUG| [stdout] [ 6521.207307] Faulting instruction address: 0xc0000000c25ffce0
16:20:39 DEBUG| [stdout] [ 6521.207367] Oops: Kernel access of bad area, sig: 11 [#1]
16:20:39 DEBUG| [stdout] [ 6521.207416] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
16:20:39 DEBUG| [stdout] [ 6521.207481] Modules linked in: unix_diag sctp vhost_vsock vmw_vsock_virtio_transport_common vsock zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) userio zcommon(PO) znvpair(PO) cuse spl(O) kvm_pr kvm snd_seq snd_seq_device snd_timer snd soundcore hci_vhci bluetooth ecdh_generic ecc uhid hid vhost_net vhost tap atm algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea crc32_generic md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic blowfish_generic blowfish_common cast5_generic des_generic libdes salsa20_generic chacha_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common algif_skcipher aufs sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp ip6table_nat ip6_tables iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 algif_hash af_alg ip_vti ip6_vti fou6 sit ipip tunnel4 fou geneve act_mirred cls_basic esp6 authenc echainiv
16:20:39 DEBUG| [stdout] [ 6521.208045] iptable_filter xt_policy veth esp4_offload esp4 xfrm_user xfrm_algo macsec vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb dummy tls binfmt_misc af_packet_diag tcp_diag udp_diag raw_diag inet_diag iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_blk virtio_net net_failover failover [last unloaded: trace_printk]
16:20:39 DEBUG| [stdout] [ 6521.209360] CPU: 1 PID: 2647099 Comm: fuse_mnt Tainted: P OE 5.4.0-9-generic #12-Ubuntu
16:20:39 DEBUG| [stdout] [ 6521.209457] NIP: c0000000c25ffce0 LR: c00000000063f058 CTR: c0000000c25ffce0
16:20:39 DEBUG| [stdout] [ 6521.209528] REGS: c000000109703810 TRAP: 0400 Tainted: P OE (5.4.0-9-generic)
16:20:39 DEBUG| [stdout] [ 6521.209608] MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88002440 XER: 20000000
16:20:39 DEBUG| [stdout] [ 6521.209681] CFAR: c00000000063f054 IRQMASK: 0
16:20:39 DEBUG| [stdout] GPR00: c00000000063f034 c000000109703aa0 c000000001a4bb00 c00000007cef3000
16:20:39 DEBUG| [stdout] GPR04: c0000000c25ffc18 0000000000000000 0000000000000000 0000000000000000
16:20:39 DEBUG| [stdout] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16:20:39 DEBUG| [stdout] GPR12: c0000000c25ffce0 c00000003fffee00 000079b6987b4410 0000000000000000
16:20:39 DEBUG| [stdout] GPR16: 000079b698b30000 000079b6987b0320 000079b69771f240 000079b6987b4420
16:20:39 DEBUG| [stdout] GPR20: 0000000000000000 0000000000000000 000079b6880010a0 000079b698a4d3a0
16:20:39 DEBUG| [stdout] GPR24: c000000109d56cc0 c0000001fde0cd8c c0000000c25ffce0 c000000109d56ca0
16:20:39 DEBUG| [stdout] GPR28: c000000109d56cc0 0000000000000000 c00000007cef3000 c000000109d56c90
16:20:39 DEBUG| [stdout] [ 6521.210276] NIP [c0000000c25ffce0] 0xc0000000c25ffce0
16:20:39 DEBUG| [stdout] [ 6521.210355] LR [c00000000063f058] fuse_request_end+0x128/0x2f0
16:20:39 DEBUG| [stdout] [ 6521.210423] Call Trace:
16:20:39 DEBUG| [stdout] [ 6521.210448] [c000000109703aa0] [c00000000063f034] fuse_request_end+0x104/0x2f0 (unreliable)
16:20:39 DEBUG| [stdout] [ 6521.210520] [c000000109703af0] [c000000000642ebc] fuse_dev_do_write+0x2cc/0x5c0
16:20:39 DEBUG| [stdout] [ 6521.210591] [c000000109703b70] [c000000000643654] fuse_dev_write+0x74/0xd0
16:20:39 DEBUG| [stdout] [ 6521.210660] [c000000109703c00] [c0000000004707c0] do_iter_readv_writev+0x240/0x290
16:20:39 DEBUG| [stdout] [ 6521.210735] [c000000109703c70] [c0000000004730d8] do_iter_write+0xc8/0x280
16:20:39 DEBUG| [stdout] [ 6521.210794] [c000000109703cc0] [c0000000004733a0] vfs_writev+0xe0/0x180
16:20:39 DEBUG| [stdout] [ 6521.210854] [c000000109703dc0] [c0000000004734dc] do_writev+0x9c/0x1a0
16:20:39 DEBUG| [stdout] [ 6521.210915] [c000000109703e20] [c00000000000b278] system_call+0x5c/0x68
16:20:39 DEBUG| [stdout] [ 6521.210973] Instruction dump:
16:20:39 DEBUG| [stdout] [ 6521.211018] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
16:20:39 DEBUG| [stdout] [ 6521.211089] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
16:20:39 DEBUG| [stdout] [ 6521.211168] ---[ end trace 141e6d1cc5d48ea2 ]---
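
One quick check on a trace like this is whether the faulting address matches CTR. The following is a minimal Python sketch, assuming the oops text is available as a string; the helper name `oops_registers` is made up for illustration. On ppc64el indirect calls branch via CTR, so NIP == CTR with LR inside fuse_request_end would be consistent with a call through a bad function pointer, but that is an interpretation and not something the log states outright.

```python
import re

# Hypothetical helper: extract named registers from a ppc64 oops header line
# such as "NIP: c0000000c25ffce0 LR: c00000000063f058 CTR: c0000000c25ffce0".
def oops_registers(text):
    regs = {}
    for name, value in re.findall(r"\b(NIP|LR|CTR):\s+([0-9a-f]{16})\b", text):
        # Keep the first occurrence of each register name.
        regs.setdefault(name, int(value, 16))
    return regs

# Two lines copied from the oops above.
sample = (
    "[ 6521.209457] NIP: c0000000c25ffce0 LR: c00000000063f058 CTR: c0000000c25ffce0\n"
    "[ 6521.210355] LR [c00000000063f058] fuse_request_end+0x128/0x2f0\n"
)

regs = oops_registers(sample)
# Assumption: NIP == CTR hints that the CPU branched to the address held in CTR.
jumped_via_ctr = regs["NIP"] == regs["CTR"]
print(hex(regs["NIP"]), hex(regs["LR"]), jumped_via_ctr)
```

The same address also shows up in GPR12 and GPR26 in the register dump, which can be checked by eye.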

Full testing log:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/ppc64el/l/linux/20191218_163756_0e1f6@/log.gz

Tags: focal
Colin Ian King (colin-king) wrote :

I've seen something very similar to this on this platform and I believe it's a combination of previous regression tests and the stress-ng sysinfo test that triggers this. Running the stress-ng stressor after a clean boot won't trigger this issue.

Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

I believe this is because a FUSE-based file system is being used in the prior ADT testing and sysinfo is breaking on the FUSE filesystem, so it may be a problem with the fuse fs itself or with the fuse file system that is using the kernel fuse core.
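
To illustrate why a leftover FUSE mount matters here: as far as I understand the stress-ng documentation, the sysinfo stressor reads file system statistics for every mounted file system, and on a FUSE mount such a request is answered by the userspace daemon. Below is a rough Python sketch of that access pattern, not the stressor's actual C code. The helper names (`parse_mounts`, `is_fuse`, `stat_all_mounts`) are invented for this example.

```python
import os

# Hypothetical helper: pull (mount point, fs type) pairs out of /proc/mounts text.
# Note: octal escapes such as \040 in mount points are not decoded here.
def parse_mounts(text):
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts

def is_fuse(fstype):
    # FUSE mounts appear as "fuse", "fuseblk" or "fuse.<name>";
    # "fusectl" is the control file system, not a FUSE mount.
    return fstype in ("fuse", "fuseblk") or fstype.startswith("fuse.")

def stat_all_mounts(mounts_path="/proc/mounts"):
    with open(mounts_path) as f:
        mounts = parse_mounts(f.read())
    for mount_point, fstype in mounts:
        try:
            # On a FUSE mount this query is forwarded to the userspace daemon.
            os.statvfs(mount_point)
        except OSError as err:
            print(f"{mount_point} ({fstype}): {err}")
    # Report which mount points are FUSE-backed.
    return [m for m, t in mounts if is_fuse(t)]
```

Calling `stat_all_mounts()` before the stressor starts would show whether an earlier test left a FUSE mount behind.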

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1856900

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: focal
Colin Ian King (colin-king) wrote :

I've manually re-run the tests on 5.4.0-18-generic on modoc several times and can't trip this issue. Is it possible to re-run all tests again to see if it fails when run from the test infrastructure?

Seth Forshee (sforshee) wrote :
Colin Ian King (colin-king) wrote :

@Seth, looking at the log, the tests did run:

23:52:22 INFO | START ---- ---- timestamp=1585612342 localtime=Mar 30 23:52:22
23:52:22 DEBUG| Persistent state client._record_indent now set to 1
23:52:22 INFO | START ubuntu_stress_smoke_test.setup ubuntu_stress_smoke_test.setup timestamp=1585612342 localtime=Mar 30 23:52:22
23:52:22 DEBUG| Persistent state client._record_indent now set to 2
..
..
..
00:08:23 DEBUG| [stdout]
00:08:23 DEBUG| [stdout] Summary:
00:08:23 DEBUG| [stdout] Stressors run: 191
00:08:23 DEBUG| [stdout] Skipped: 1, binderfs
00:08:23 DEBUG| [stdout] Failed: 0,
00:08:23 DEBUG| [stdout] Oopsed: 0,
00:08:23 DEBUG| [stdout] Oomed: 1, dev-shm
00:08:23 DEBUG| [stdout] Passed: 189, access af-alg affinity aio aiol bad-altstack bigheap branch brk cache cap chattr chdir chmod chown chroot clock close context cpu crypt cyclic daemon dccp dentry dev dir dirdeep dnotify dup dynlib enosys env epoll eventfd fallocate fanotify fault fcntl fiemap fifo file-ioctl filename flock fork fp-error fstat full funcret futex get getdent getrandom handle hdd hrtimers icache icmp-flood inode-flags inotify io iomix ioprio ipsec-mb itimer judy key kill klog lease link locka lockbus lockf lockofd loop madvise malloc mcontend membarrier memfd memhotplug memrate memthrash mergesort mincore mknod mlock mmap mmapaddr mmapfixed mmapfork mmapmany mq mremap msg msync netdev netlink-proc netlink-task nice null open personality physpage pidfd pipe pipeherd pkey poll prctl procfs pthread ptrace pty radixsort ramfs rawdev rawsock readahead reboot rename revio rlimit rmap rtc schedpolicy sctp seal seccomp seek sem sem-sysv sendfile set shellsort shm shm-sysv sigfd sigfpe sigio sigpending sigpipe sigq sigrt sigsegv sigsuspend skiplist sleep sock sockabuse sockdiag sockmany softlockup splice stackmmap stream swap switch symlink sync-file sysbadaddr sysfs tee timer timerfd tlb-shootdown tmpfs tree tsearch tun udp udp-flood unshare urandom userfaultfd utime vdso vfork vm vm-addr vm-rw vm-segv vm-splice wait x86syscall yield zero zombie
00:08:23 DEBUG| [stdout] Badret: 0,
00:08:23 DEBUG| [stdout]
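
For anyone wanting to check a summary like this mechanically, here is a small Python sketch that reads the counts and confirms the categories add up to the number of stressors run. The helper name `summary_counts` is made up, and the sample is an abbreviated copy of the summary above with the timestamp prefix stripped.

```python
import re

LABELS = "Stressors run|Skipped|Failed|Oopsed|Oomed|Passed|Badret"

# Hypothetical helper: read the "<Label>: <count>" lines of the smoke-test summary.
def summary_counts(text):
    counts = {}
    for label, n in re.findall(r"\b(" + LABELS + r"):\s+(\d+)", text):
        counts[label] = int(n)
    return counts

# Abbreviated copy of the summary; the Passed list is cut short on purpose.
sample = """\
Stressors run: 191
Skipped: 1, binderfs
Failed: 0,
Oopsed: 0,
Oomed: 1, dev-shm
Passed: 189, access af-alg
Badret: 0,
"""

counts = summary_counts(sample)
# Every category except the grand total should sum to "Stressors run".
total = sum(v for k, v in counts.items() if k != "Stressors run")
print(counts["Oopsed"], total == counts["Stressors run"])
```

Here 1 skipped + 1 oomed + 189 passed accounts for all 191 stressors, with none oopsed.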

Colin Ian King (colin-king) wrote :

So I think we're in good condition now for 5.4.0-21 on ppc64el.

Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released