accessing /dev/hvc1 with stress-ng on Ubuntu xenial causes crash

Bug #1711401 reported by Colin Ian King
This bug affects 1 person
Affects           Status         Importance   Assigned to       Milestone
linux (Ubuntu)    Fix Released   High         Colin Ian King
  Xenial          Fix Released   Undecided    Unassigned

Bug Description

[SRU REQUEST][XENIAL]

When running the stress-ng --dev stressor, accessing /dev/hvc1 causes the stressor to lock up and never terminate.

[FIX]
Upstream commit:

From bbc3dfe8805de86874b1a1b1429a002e8670043e Mon Sep 17 00:00:00 2001
From: Sam Mendoza-Jonas <email address hidden>
Date: Mon, 11 Jul 2016 13:38:57 +1000
Subject: [PATCH] tty/hvc: Use IRQF_SHARED for OPAL hvc consoles

[TESTING]
Without the fix, stress-ng --dev run as root hangs, and the kernel logs a request_irq failure followed by an oops:

[ 2096.476610] hvc_open: request_irq failed with rc -16.
[ 2096.476634] Unable to handle kernel paging request for data at address 0x000000a8
[ 2096.476641] Faulting instruction address: 0xc000000000b0d8a4
[ 2096.476647] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2096.476650] SMP NR_CPUS=2048 NUMA PowerNV

With the fix, the stressor runs without any issues.

[REGRESSION POTENTIAL]
The change is specific to one driver used on ppc64el, so the regression potential is limited. The fix only sets a flag that allows the interrupt to be shared, which is a very minor change to the driver's interrupt handling.

-------------------------------

stress-ng run as root, --dev stressor, accessing /dev/hvc1:

[ 2096.476610] hvc_open: request_irq failed with rc -16.
[ 2096.476634] Unable to handle kernel paging request for data at address 0x000000a8
[ 2096.476641] Faulting instruction address: 0xc000000000b0d8a4
[ 2096.476647] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2096.476650] SMP NR_CPUS=2048 NUMA PowerNV
[ 2096.476657] Modules linked in: kvm_hv kvm_pr kvm cuse userio hci_vhci bluetooth uhid hid vhost_net vhost macvtap macvlan snd_seq snd_seq_device snd_timer snd soundcore ipmi_powernv ipmi_msghandler vmx_crypto leds_powernv uio_pdrv_genirq uio powernv_rng ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses enclosure ipr scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 2096.476718] CPU: 74 PID: 23074 Comm: stress-ng-dev Not tainted 4.4.0-92-generic #115-Ubuntu
[ 2096.476724] task: c000000fe8499030 ti: c000000f0fa0c000 task.ti: c000000f0fa0c000
[ 2096.476728] NIP: c000000000b0d8a4 LR: c00000000069648c CTR: c000000000696450
[ 2096.476732] REGS: c000000f0fa0f6b0 TRAP: 0300 Not tainted (4.4.0-92-generic)
[ 2096.476736] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24444248 XER: 00000000
[ 2096.476749] CFAR: c000000000008468 DAR: 00000000000000a8 DSISR: 40000000 SOFTE: 0
               GPR00: c00000000069648c c000000f0fa0f930 c0000000015f8000 00000000000000a8
               GPR04: c000000ff0082400 0000000000000002 0000001ffe7d0000 0000000000000036
               GPR08: c000000003236b90 0000000000000000 000000008000004a c000001e4e568880
               GPR12: c000000000696450 c00000000fb6bf00 000000000000000f 0000010023d22660
               GPR16: 00003fff9ca50000 0000000000800000 c000001e43dde100 c000000fe459c800
               GPR20: 0000000000000000 0000000000000001 000000000e500001 0000000000010004
               GPR24: 0000000000010800 ffffffffffffffff fffffffffffff000 c000000001892448
               GPR28: c000000fef98da28 c000000fe459c800 0000000000000001 00000000000000a8
[ 2096.476805] NIP [c000000000b0d8a4] _raw_spin_lock_irqsave+0x44/0x130
[ 2096.476812] LR [c00000000069648c] hvc_open+0x3c/0x1a0
[ 2096.476815] Call Trace:
[ 2096.476818] [c000000f0fa0f930] [c000000f0fa0f970] 0xc000000f0fa0f970 (unreliable)
[ 2096.476824] [c000000f0fa0f970] [c00000000069648c] hvc_open+0x3c/0x1a0
[ 2096.476829] [c000000f0fa0f9f0] [c00000000066c514] tty_open+0x194/0x7c0
[ 2096.476836] [c000000f0fa0fa90] [c0000000002ec234] chrdev_open+0x114/0x270
[ 2096.476841] [c000000f0fa0faf0] [c0000000002e1480] do_dentry_open+0x2c0/0x460
[ 2096.476846] [c000000f0fa0fb50] [c0000000002f9aa8] do_last+0x178/0xff0
[ 2096.476851] [c000000f0fa0fc10] [c0000000002fab3c] path_openat+0xcc/0x3c0
[ 2096.476857] [c000000f0fa0fc90] [c0000000002fca5c] do_filp_open+0xfc/0x170
[ 2096.476861] [c000000f0fa0fdb0] [c0000000002e3270] do_sys_open+0x1c0/0x3b0
[ 2096.476867] [c000000f0fa0fe30] [c000000000009204] system_call+0x38/0xb4
[ 2096.476870] Instruction dump:
[ 2096.476873] fbe1fff8 f8010010 f821ffc1 7c7f1b78 60000000 60000000 39200000 8bcd02ca
[ 2096.476881] 992d02ca 39400000 994d02cc 814d0008 <7d20f829> 2c090000 40c20010 7d40f92d
[ 2096.476893] ---[ end trace c902066e4cc54f9e ]---

Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

Device is OK with 4.12, however, with 4.12 we get another issue:

[ 3996.652718] INFO: rcu_sched self-detected stall on CPU
[ 3996.652730] 23-...: (1 GPs behind) idle=73a/140000000000002/0 softirq=5179/5179 fqs=2475
[ 3996.652735] (t=5250 jiffies g=6889 c=6888 q=189)
[ 3996.652748] Task dump for CPU 23:
[ 3996.652749] kopald S 0 975 2 0x00000804
[ 3996.652752] Call Trace:
[ 3996.652759] [c000000fefafb0b0] [c000000000135a24] sched_show_task+0xd4/0x150 (unreliable)
[ 3996.652763] [c000000fefafb120] [c000000000c36b3c] rcu_dump_cpu_stacks+0xd0/0x134
[ 3996.652767] [c000000fefafb170] [c0000000001867a0] rcu_check_callbacks+0x8a0/0xb10
[ 3996.652769] [c000000fefafb2a0] [c000000000192958] update_process_times+0x48/0x90
[ 3996.652773] [c000000fefafb2d0] [c0000000001a9f60] tick_sched_handle.isra.8+0x30/0xb0
[ 3996.652775] [c000000fefafb300] [c0000000001aa044] tick_sched_timer+0x64/0xd0
[ 3996.652778] [c000000fefafb340] [c000000000193424] __hrtimer_run_queues+0x144/0x370
[ 3996.652780] [c000000fefafb3c0] [c00000000019449c] hrtimer_interrupt+0xfc/0x350
[ 3996.652784] [c000000fefafb490] [c000000000024658] __timer_interrupt+0x88/0x250
[ 3996.652787] [c000000fefafb4e0] [c000000000024a30] timer_interrupt+0x90/0xe0
[ 3996.652791] [c000000fefafb510] [c00000000000b6e0] restore_check_irq_replay+0x54/0x70
[ 3996.652795] --- interrupt: 901 at arch_local_irq_restore+0x74/0x90
                   LR = arch_local_irq_restore+0x74/0x90
[ 3996.652797] [c000000fefafb800] [c000000000194530] hrtimer_interrupt+0x190/0x350 (unreliable)
[ 3996.652801] [c000000fefafb820] [c000000000c3298c] __do_softirq+0xcc/0x41c
[ 3996.652804] [c000000fefafb900] [c0000000000fafa8] irq_exit+0xe8/0x120
[ 3996.652807] [c000000fefafb920] [c000000000024a34] timer_interrupt+0x94/0xe0
[ 3996.652810] [c000000fefafb950] [c00000000000b6e0] restore_check_irq_replay+0x54/0x70
[ 3996.652815] --- interrupt: 901 at lock_timer_base+0x70/0x100
                   LR = schedule_timeout+0x334/0x420
[ 3996.652817] [c000000fefafbc40] [0000000000000001] 0x1 (unreliable)
[ 3996.652821] [c000000fefafbca0] [c000000000c30a14] schedule_timeout+0x334/0x420
[ 3996.652824] [c000000fefafbd80] [c000000000095bf0] kopald+0x90/0xe0
[ 3996.652827] [c000000fefafbdc0] [c0000000001214cc] kthread+0x1ac/0x1c0
[ 3996.652831] [c000000fefafbe30] [c00000000000b2e8] ret_from_kernel_thread+0x5c/0x74
[ 4000.060643] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [kopald:975]
[ 4000.060653] Modules linked in: kvm_hv kvm_pr kvm cuse userio hci_vhci bluetooth ecdh_generic uhid hid vhost_net vhost tap snd_seq snd_seq_device snd_timer snd soundcore powernv_op_panel uio_pdrv_genirq ipmi_powernv ipmi_devintf powernv_rng leds_powernv ibmpowernv vmx_crypto ipmi_msghandler uio ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses enclosure scsi_transport_sas crct10dif_vpmsum crc32c_vpmsum tg3 ipr scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 4000.060735] CPU: 23 PID: 975 Comm: kopald Not tainted 4.12.0-11-generic #12
[ 4000...

summary: - accessing /dev/hvc1 with stress-ng on Ubuntu xenia causes crash
+ accessing /dev/hvc1 with stress-ng on Ubuntu xenial causes crash
Colin Ian King (colin-king) wrote :

OK in 4.10

Colin Ian King (colin-king) wrote :

Fix is upstream commit:

From bbc3dfe8805de86874b1a1b1429a002e8670043e Mon Sep 17 00:00:00 2001
From: Sam Mendoza-Jonas <email address hidden>
Date: Mon, 11 Jul 2016 13:38:57 +1000
Subject: [PATCH] tty/hvc: Use IRQF_SHARED for OPAL hvc consoles

Colin Ian King (colin-king) wrote :

The fix is only required in Xenial; the issue does not occur in Zesty or later.

description: updated
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Po-Hsu Lin (cypressyew) wrote :

The smoke test has passed on the ppc64le system "modoc" with Xenial kernel, thanks!

00:26:13 DEBUG| [stdout] Summary:
00:26:13 DEBUG| [stdout] Stressors run: 147
00:26:13 DEBUG| [stdout] Skipped: 2, bind-mount sock-diag
00:26:13 DEBUG| [stdout] Failed: 0,
00:26:13 DEBUG| [stdout] Oopsed: 0,
00:26:13 DEBUG| [stdout] Passed: 145, af-alg affinity aio aiol bigheap brk cache cap chdir chmod chown chroot clock clone context cpu crypt cyclic daemon dccp dentry dev dir dirdeep dnotify dup epoll eventfd fallocate fanotify fault fcntl fiemap fifo filename flock fork fp-error fstat full futex get getdent getrandom handle hdd icache icmp-flood inode-flags inotify io iomix ioprio itimer key kill klog lease link locka lockbus lockf lockofd madvise malloc membarrier memfd memrate memthrash mergesort mincore mknod mlock mmap mmapfork mmapmany mq mremap msg msync netdev netlink-proc nice null open personality pipe poll procfs pthread ptrace pty radixsort readahead rename rlimit rmap rtc schedpolicy sctp seal seccomp seek sem sem-sysv sendfile shm shm-sysv sigfd sigfpe sigpending sigq sigsegv sigsuspend sleep sock softlockup splice stackmmap stream switch symlink sync-file sysfs sysinfo tee timer timerfd tlb-shootdown tmpfs tsearch udp udp-flood unshare urandom userfaultfd utime vfork vm vm-rw vm-splice wait yield zero zombie

tags: added: verification-done-xenial
removed: verification-needed-xenial
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-96.119

---------------
linux (4.4.0-96.119) xenial; urgency=low

  * linux: 4.4.0-96.119 -proposed tracker (LP: #1716613)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - s390/mm: no local TLB flush for clearing-by-ASCE IDTE
    - SAUCE: s390/mm: fix local TLB flushing vs. detach of an mm address space
    - SAUCE: s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.4.0-95.118) xenial; urgency=low

  * linux: 4.4.0-95.118 -proposed tracker (LP: #1715651)

  * Xenial update to 4.4.78 stable release broke Address Sanitizer
    (LP: #1715636)
    - mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes

linux (4.4.0-94.117) xenial; urgency=low

  * linux: 4.4.0-94.117 -proposed tracker (LP: #1713462)

  * mwifiex causes kernel oops when AP mode is enabled (LP: #1712746)
    - SAUCE: net/wireless: do not dereference invalid pointer
    - SAUCE: mwifiex: do not dereference invalid pointer

  * Backport more recent Broadcom bnxt_en driver (LP: #1711056)
    - SAUCE: bnxt_en_bpo: Import bnxt_en driver version 1.8.1
    - SAUCE: bnxt_en_bpo: Drop distro out-of-tree detection logic
    - SAUCE: bnxt_en_bpo: Remove unnecessary compile flags
    - SAUCE: bnxt_en_bpo: Move config settings to Kconfig
    - SAUCE: bnxt_en_bpo: Remove PCI_IDs handled by the regular driver
    - SAUCE: bnxt_en_bpo: Rename the backport driver to bnxt_en_bpo
    - bnxt_en_bpo: [Config] Enable CONFIG_BNXT_BPO=m

  * HID: multitouch: Support ALPS PTP Stick and Touchpad devices (LP: #1712481)
    - HID: multitouch: Support PTP Stick and Touchpad device
    - SAUCE: HID: multitouch: Support ALPS PTP stick with pid 0x120A

  * igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

  * IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

  * accessing /dev/hvc1 with stress-ng on Ubuntu xenial causes crash
    (LP: #1711401)
    - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles

  * memory-hotplug test needs to be fixed (LP: #1710868)
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

  * HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) does not work (LP: #1707643)
    - net: cdc_mbim: apply "NDP to end" quirk to HP lt4132

  * Migrating KSM page causes the VM lock up as the KSM page merging list is too
    large (LP: #1680513)
    - ksm: introduce ksm_max_page_sharing per page deduplication limit
    - ksm: fix use after free with merge_across_nodes = 0
    - ksm: cleanup stable_node chain collapse case
    - ksm: swap the two output parameters of chain/chain_prune
    - ksm: optimize refile of stable_node_dup at the head of the chain

  * sort ABI files with C.UTF-8 locale (LP: #1712345)
    - [Packaging] sort ABI ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released