qeth: check not more than 16 SBALEs on the completion queue

Bug #1750568 reported by bugproxy on 2018-02-20
This bug affects 1 person
Affects                   Importance   Assigned to
Ubuntu on IBM z Systems   High         Canonical Kernel Team
linux (Ubuntu)            High         Joseph Salisbury
  Xenial                  High         Joseph Salisbury
  Artful                  High         Joseph Salisbury
  Bionic                  High         Joseph Salisbury

Bug Description

== SRU Justification ==
af_iucv socket programs with HiperSockets as transport make use of the qdio completion queue.
Running such an af_iucv socket program may result in a crash which can be seen
in bug comment #1. This issue is fixed by mainline commit 903e48531e8b.

Commit 903e48531e8b is in mainline as of 4.8-rc8, so it is only needed in Xenial.

== Fix ==
903e48531e8b ("qeth: check not more than 16 SBALEs on the completion queue")

== Regression Potential ==
Low. Limited to s390.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

bugproxy (bugproxy) on 2018-02-20
tags: added: architecture-s39064 bugnameltc-164867 severity-high targetmilestone-inin1604
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
information type: Public → Private

------- Comment From <email address hidden> 2018-02-20 07:56 EDT-------
Description: qeth: check not more than 16 SBALEs on the completion queue
Symptom: Kernel crash
Problem: af_iucv socket programs with HiperSockets as transport
make use of the qdio completion queue. Running such an
af_iucv socket program may result in a crash:
[90341.677709] Oops: 0038 ilc:2 [#1] SMP
[90341.677743] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.6.0-20160720.0.0e
[90341.677744] Hardware name: IBM 2964 N96 703
[90341.677746] task: 00000000edb79f00 ti: 00000000edb84000 task.ti: 00000000
[90341.677748] Krnl PSW : 0704d00180000000 000000000075bc50 (qeth_qdio_input
[90341.677756] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0
Krnl GPRS: 000003d10391e900 0000000000000001 00000000e61e6000 00000000000000
[90341.677759] 0000000000a9e6ec 5420040001a77400 0000000000000001
[90341.677761] 00000000e0d83f00 0000000000000003 0000000000000010
[90341.677784] 000000007ba8b000 0000000000943fd0 000000000075bc4e
[90341.677793] Krnl Code:
000000000075bc42: e320cc180004 lg %r2,
000000000075bc48: c0e5ffffc5cc brasl %r14,7547e0
#000000000075bc4e: 1816 lr %r1,%r6
>000000000075bc50: ba19b008 cs %r1,%r9,8(%r11)
000000000075bc54: ec180041017e cij %r1,1,8,75bcd6
000000000075bc5a: 5810b008 l %r1,8(%r11)
000000000075bc5e: ec16005c027e cij %r1,2,6,75bd16
000000000075bc64: 5090b008 st %r9,8(%r11)
[90341.677807] Call Trace:
[90341.677810] ([<000000000075bbc0>] qeth_qdio_input_handler+0x1c8/0x4e0)
[90341.677812] ([<000000000070efbc>] qdio_kick_handler+0x124/0x2a8)
[90341.677814] ([<0000000000713570>] __tiqdio_inbound_processing+0xf0/0xcd0)
[90341.677818] ([<0000000000143312>] tasklet_action+0x92/0x120)
[90341.677823] ([<00000000008b6e72>] __do_softirq+0x112/0x308)
[90341.677824] ([<0000000000142bce>] irq_exit+0xd6/0xf8)
[90341.677829] ([<000000000010b1d2>] do_IRQ+0x6a/0x88)
[90341.677830] ([<00000000008b6322>] io_int_handler+0x112/0x220)
[90341.677832] ([<0000000000102b2e>] enabled_wait+0x56/0xa8)
[90341.677833] ([<0000000000000000>] (null))
[90341.677835] ([<0000000000102e32>] arch_cpu_idle+0x32/0x48)
[90341.677838] ([<000000000018a126>] cpu_startup_entry+0x266/0x2b0)
[90341.677841] ([<0000000000113b38>] smp_start_secondary+0x100/0x110)
[90341.677843] ([<00000000008b68a6>] restart_int_handler+0x62/0x78)
[90341.677845] ([<00000000008b6588>] psw_idle+0x3c/0x40)
[90341.677846] Last Breaking-Event-Address:
[90341.677848] [<00000000007547ec>] qeth_dbf_longtext+0xc/0xc0

Solution: qeth_qdio_cq_handler() analyzes SBALs on this completion
queue, but does not observe the limit of 16 SBAL elements
per SBAL. This patch adds the additional check to process
not more than 16 SBAL elements.
Reproduction: Run af_iucv stress test
Upstream-ID: 903e48531e8b5d414c8f1960eacac24c31f60344
Problem-ID: 148203
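
The Solution text above describes a bounds check: stop walking a completion-queue SBAL after 16 elements, even if no empty element has been seen. The following is a minimal user-space sketch of that idea, not the actual kernel patch. The struct layouts and the names `MAX_ELEMENTS_PER_BUFFER`, `struct sbal`, `struct sbal_element`, and `process_sbal()` are hypothetical stand-ins chosen for illustration; only the limit of 16 comes from the description.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit of 16 SBAL elements per SBAL, as stated in the bug description.
 * The macro name is an assumption made for this sketch. */
#define MAX_ELEMENTS_PER_BUFFER 16

/* Hypothetical, simplified stand-ins for the real qdio structures. */
struct sbal_element {
    uintptr_t addr;  /* 0 marks an unused element in this sketch */
};

struct sbal {
    struct sbal_element element[MAX_ELEMENTS_PER_BUFFER];
};

/*
 * Walk the elements of one SBAL and pass each non-empty address to
 * `handle` (may be NULL). The index check comes first, so a buffer in
 * which all 16 elements are populated never causes a read past the
 * end of the array.
 * Returns the number of elements processed.
 */
static size_t process_sbal(const struct sbal *buf, void (*handle)(uintptr_t))
{
    size_t e = 0;

    while (e < MAX_ELEMENTS_PER_BUFFER && buf->element[e].addr != 0) {
        if (handle)
            handle(buf->element[e].addr);
        e++;
    }
    return e;
}
```

Without the `e < MAX_ELEMENTS_PER_BUFFER` condition, a fully populated buffer would send the loop on into whatever memory follows the array, which is the kind of overrun the description attributes to the crash.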

Changed in ubuntu-z-systems:
importance: Undecided → High
Frank Heimes (frank-heimes) wrote :

kernel 4.16 commit:
https://github.com/torvalds/linux/commit/903e48531e8b5d414c8f1960eacac24c31f60344
but also cleanly applies to 4.4

Needs to be included in 4.15/bionic, 4.13/artful/xenial-hwe and 4.4/xenial-default.

Changed in ubuntu-z-systems:
status: New → Triaged
information type: Private → Public
Changed in ubuntu-z-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
description: updated
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Triaged
Changed in linux (Ubuntu Artful):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: Triaged → Fix Committed
assignee: Skipper Bug Screeners (skipper-screen-team) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
status: Triaged → In Progress
Changed in linux (Ubuntu Xenial):
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Released
Joseph Salisbury (jsalisbury) wrote :

I confirmed that Artful and Bionic already contain commit 903e48531e8b5d414c8f1960eacac24c31f60344, which was included in mainline as of v4.8-rc8.

The commit is not in Xenial, so I built a Xenial test kernel with a cherry-pick of it. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1750568/

Can you test this kernel and see if it resolves this bug?

Changed in ubuntu-z-systems:
status: Triaged → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-23 04:58 EDT-------
I'm not able to test this kernel, but this patch was verified upfront, before posting.

Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) on 2018-03-01
tags: removed: bugnameltc-164867 severity-high
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
bugproxy (bugproxy) on 2018-03-14
tags: added: bugnameltc-164867 severity-high
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-19 07:32 EDT-------
patch verified upfront - see LP comment #4 from 2018-02-23

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-119.143

---------------
linux (4.4.0-119.143) xenial; urgency=medium

  * linux: 4.4.0-119.143 -proposed tracker (LP: #1760327)

  * Dell XPS 13 9360 bluetooth scan can not detect any device (LP: #1759821)
    - Revert "Bluetooth: btusb: fix QCA Rome suspend/resume"

linux (4.4.0-118.142) xenial; urgency=medium

  * linux: 4.4.0-118.142 -proposed tracker (LP: #1759607)

  * Kernel panic with AWS 4.4.0-1053 / 4.4.0-1015 (Trusty) (LP: #1758869)
    - x86/microcode/AMD: Do not load when running on a hypervisor

  * CVE-2018-8043
    - net: phy: mdio-bcm-unimac: fix potential NULL dereference in
      unimac_mdio_probe()

linux (4.4.0-117.141) xenial; urgency=medium

  * linux: 4.4.0-117.141 -proposed tracker (LP: #1755208)

  * Xenial update to 4.4.114 stable release (LP: #1754592)
    - x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
    - usbip: prevent vhci_hcd driver from leaking a socket pointer address
    - usbip: Fix implicit fallthrough warning
    - usbip: Fix potential format overflow in userspace tools
    - x86/microcode/intel: Fix BDW late-loading revision check
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - sched/deadline: Use the revised wakeup rule for suspending constrained dl
      tasks
    - can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
    - can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
    - PM / sleep: declare __tracedata symbols as char[] rather than char
    - time: Avoid undefined behaviour in ktime_add_safe()
    - timers: Plug locking race vs. timer migration
    - Prevent timer value 0 for MWAITX
    - drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
    - drivers: base: cacheinfo: fix boot error message when acpi is enabled
    - PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID
    - PCI: layerscape: Fix MSG TLP drop setting
    - mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version
    - fs/select: add vmalloc fallback for select(2)
    - hwpoison, memcg: forcibly uncharge LRU pages
    - cma: fix calculation of aligned offset
    - mm, page_alloc: fix potential false positive in __zone_watermark_ok
    - ipc: msg, make msgrcv work with LONG_MIN
    - x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
    - ACPI / processor: Avoid reserving IO regions too early
    - ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
    - ACPICA: Namespace: fix operand cache leak
    - netfilter: x_tables: speed up jump target validation
    - netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed
      in 64bit kernel
    - netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags
    - netfilter: nf_ct_expect: remove the redundant slash when policy name is
      empty
    - netfilter: nfnetlink_queue: reject verdict request from different portid
    - netfilter: restart search if moved to other chain
    - netfilter: nf_conntrack_sip: extend request line validation
    - netfilter: use fwmark_reflect in nf_send_reset
    - ext2: Don't clear SGID when inheriting ACLs
    - reiserfs: fix race in prealloc discard
    - re...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-04 06:58 EDT-------
IBM bugzilla status -> closed, Fix Released by all required Releases
