Fix Kernel Crashing under IBM Virtual SCSI Driver

Bug #1642299 reported by bugproxy on 2016-11-16
This bug affects 1 person
Affects         Importance  Assigned to
linux (Ubuntu)  High        Tim Gardner
Xenial          Undecided   Tim Gardner
Yakkety         Undecided   Tim Gardner
Zesty           High        Tim Gardner

Bug Description

The kernel crashes when running a large number of deployments with the ibmvscsis driver.

Contact Information = Bryant <email address hidden>

Stack trace output:

[ 1780.861532] Faulting instruction address: 0xc000000000583de0
[ 1780.861542] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1780.861549] SMP NR_CPUS=2048 NUMA pSeries
[ 1780.861557] Modules linked in: ip6table_filter ip6_tables xt_tcpudp iptable_mangle ebt_arp ebt_among ebtable_filter ebtables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables bridge stp llc target_core_user uio target_core_pscsi target_core_file dccp_diag target_core_iblock iscsi_target_mod dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc pseries_rng ibmvmc(OE) rtc_generic ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel ses enclosure ibmvscsis target_core_mod configfs ibmveth mlx4_core megaraid_sas ahci libahci
[ 1780.861707] CPU: 22 PID: 35128 Comm: tcmu-runner Tainted: G W OE 4.4.13-customv7 #22
[ 1780.861718] task: c0000001e6f50080 ti: c0000001e6fd0000 task.ti: c0000001e6fd0000
[ 1780.861727] NIP: c000000000583de0 LR: d0000000065c1b04 CTR: c000000000583da0
[ 1780.861736] REGS: c0000001e6fd3950 TRAP: 0300 Tainted: G W OE (4.4.13-customv7)
[ 1780.861745] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004484 XER: 20000000
[ 1780.861770] CFAR: c000000000008468 DAR: 00000000000012a0 DSISR: 40000000 SOFTE: 0
GPR00: d0000000065c1974 c0000001e6fd3bd0 d0000000065cba00 0000000000001298
GPR04: 00000000000012b8 c0000001fe664468 0000000000000018 646920726f662064
GPR08: 0000000000000007 0000000000000001 0000000000001298 d0000000065c2870
GPR12: c000000000583da0 c000000007add100 0000000000000000 0000000000000000
GPR16: 00003fffae670000 0000000000000000 0000000000000002 0000000000000001
GPR20: c0000001e99032e8 0000000000000000 c0000001e9903300 f000000000000000
GPR24: c0000001e9903340 0000000000000000 d000000006600000 d000000006600080
GPR28: c0000001e9902000 0000000000000080 f000000000019800 c0000001fe664458
[ 1780.861901] NIP [c000000000583de0] __bitmap_xor+0x40/0x60
[ 1780.861910] LR [d0000000065c1b04] tcmu_handle_completions+0x394/0x510 [target_core_user]
[ 1780.861919] Call Trace:
[ 1780.861926] [c0000001e6fd3bd0] [d0000000065c1974] tcmu_handle_completions+0x204/0x510 [target_core_user] (unreliable)
[ 1780.861942] [c0000001e6fd3cd0] [d0000000065c1cac] tcmu_irqcontrol+0x2c/0x50 [target_core_user]
[ 1780.861956] [c0000001e6fd3d00] [d000000006561798] uio_write+0x98/0x140 [uio]
[ 1780.861966] [c0000001e6fd3d50] [c0000000002dda0c] __vfs_write+0x6c/0xe0
[ 1780.861977] [c0000001e6fd3d90] [c0000000002de740] vfs_write+0xc0/0x230
[ 1780.861988] [c0000001e6fd3de0] [c0000000002df77c] SyS_write+0x6c/0x110
[ 1780.861999] [c0000001e6fd3e30] [c000000000009204] system_call+0x38/0xb4
[ 1780.862007] Instruction dump:
[ 1780.862013] 78c60020 3944fff8 38c6ffff 38a5fff8 78c61f48 3863fff8 7c843214 48000014
[ 1780.862033] 60000000 60000000 60000000 60420000 <e92a0009> e9050009 7faa2040 7d294278
[ 1780.862056] ---[ end trace 212caf961ccdad3d ]---
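
As a reading aid for the trace above, here is a minimal sketch that decodes the faulting instruction word (the one in angle brackets in the instruction dump) and checks it against the DAR. It assumes the standard Power ISA DS-form layout for primary opcode 58 (ld/ldu/lwa); the function names `decode_ds_load` and `effective_address` are hypothetical helpers written for this illustration, not part of any kernel tooling.

```python
# Sketch: decode a Power ISA DS-form load and recompute the faulting address.
# Helper names are hypothetical; register values are copied from the oops.

def decode_ds_load(word):
    """Decode a DS-form load (primary opcode 58) into its fields."""
    opcode = (word >> 26) & 0x3F
    if opcode != 58:
        raise ValueError("not a DS-form load (opcode %d)" % opcode)
    rt = (word >> 21) & 0x1F          # target register
    ra = (word >> 16) & 0x1F          # base register
    disp = word & 0xFFFC              # DS field with two zero bits appended
    if disp & 0x8000:                 # sign-extend the 16-bit displacement
        disp -= 0x10000
    mnemonic = {0: "ld", 1: "ldu", 2: "lwa"}.get(word & 0x3)
    if mnemonic is None:
        raise ValueError("reserved extended opcode")
    return mnemonic, rt, ra, disp


def effective_address(word, gprs):
    """Compute base + displacement using the register values from the dump."""
    _, _, ra, disp = decode_ds_load(word)
    return gprs[ra] + disp


# Values taken from the report: the bracketed word and GPR10.
faulting_word = 0xE92A0009
gprs = {10: 0x0000000000001298}

mnemonic, rt, ra, disp = decode_ds_load(faulting_word)
print("%s r%d, %d(r%d)" % (mnemonic, rt, disp, ra))
print(hex(effective_address(faulting_word, gprs)))
```

With the values from this report, the computed address equals the DAR (00000000000012a0). That is consistent with __bitmap_xor being handed a bitmap pointer with a very small value (0x1298), although the trace alone does not show where that pointer came from.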

The following series of patches fixes this issue:
  ibmvscsis: Rearrange functions for future patches
  ibmvscsis: Synchronize cmds at tpg_enable_store time
  ibmvscsis: Synchronize cmds at remove time
  ibmvscsis: Clean up properly if target_submit_cmd/tmr fails
  ibmvscsis: Return correct partition name/# to client
  ibmvscsis: Issues from Dan Carpenter/Smatch
http://www.spinics.net/lists/linux-scsi/msg100569.html

The patches have been accepted and applied to 4.10/scsi-queue.

scsi: ibmvscsis: Rearrange functions for future patches
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mkp/scsi/+/fbbcc033a20a6af94eeb8fa995668ed5051be111

scsi: ibmvscsis: Synchronize cmds at tpg_enable_store time
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mkp/scsi/+/f877a5398764ce18487c1a70f297743ae91a0e25

scsi: ibmvscsis: Synchronize cmds at remove time
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mkp/scsi/+/5dda944d439e295cadceb1e2d7f5b73f9dc2b296

scsi: ibmvscsis: Clean up properly if target_submit_cmd/tmr fails
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mkp/scsi/+/dcdf7ab322a304d52bfd3190d79337f09f05b3a5

scsi: ibmvscsis: Return correct partition name/# to client
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mkp/scsi/+/cb70442aebffd51a295ba99bc97a793d019efefe

scsi: ibmvscsis: Issues from Dan Carpenter/Smatch
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mkp/scsi/+/4.10/scsi-queue

It would be great if these patches could be backported to 16.04 and 16.10.

Thanks.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-147639 severity-high targetmilestone-inin16041

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
Tim Gardner (timg-tpi) on 2016-11-18
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Zesty):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → In Progress
Tim Gardner (timg-tpi) on 2016-11-18
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Luis Henriques (henrix) on 2016-11-29
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Luis Henriques (henrix) on 2016-11-29
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
bugproxy (bugproxy) on 2016-11-29
tags: added: verification-done-xenial verification-done-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.9.0-11.12

---------------
linux (4.9.0-11.12) zesty; urgency=low

  * Miscellaneous Ubuntu changes
    - UBUNTU: SAUCE: Add '-fno-pie -no-pie' to cflags for x86 selftests
    - UBUNTU: SAUCE: (no-up) aufs: for v4.9-rc1, support setattr_prepare()

  [ Upstream Kernel Changes ]

  * rebase to v4.9

 -- Tim Gardner <email address hidden> Mon, 12 Dec 2016 06:40:40 -0700

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-57.78

---------------
linux (4.4.0-57.78) xenial; urgency=low

  * Release Tracking Bug
    - LP: #1648867

  * Miscellaneous Ubuntu changes
    - SAUCE: Do not build the xr-usb-serial driver for s390

linux (4.4.0-56.77) xenial; urgency=low

  * Release Tracking Bug
    - LP: #1648867

  * Release Tracking Bug
    - LP: #1648579

  * CONFIG_NR_CPUS=256 is too low (LP: #1579205)
    - [Config] Increase the NR_CPUS to 512 for amd64 to support systems with a
      large number of cores.

  * NVMe drives in Amazon AWS instance fail to initialize (LP: #1648449)
    - SAUCE: (no-up) NVMe: only setup MSIX once

linux (4.4.0-55.76) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1648503

  * NVMe driver accidentally reverted to use GSI instead of MSIX (LP: #1647887)
    - (fix) NVMe: restore code to always use MSI/MSI-x interrupts

linux (4.4.0-54.75) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1648017

  * Update hio driver to 2.1.0.28 (LP: #1646643)
    - SAUCE: hio: update to Huawei ES3000_V2 (2.1.0.28)

  * linux: Enable live patching for all supported architectures (LP: #1633577)
    - [Config] CONFIG_LIVEPATCH=y for s390x

  * Botched backport breaks level triggered EOIs in QEMU guests with --machine
    kernel_irqchip=split (LP: #1644394)
    - kvm/irqchip: kvm_arch_irq_routing_update renaming split

  * Xenial update to v4.4.35 stable release (LP: #1645453)
    - x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
    - KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
    - KVM: Disable irq while unregistering user notifier
    - fuse: fix fuse_write_end() if zero bytes were copied
    - mfd: intel-lpss: Do not put device in reset state on suspend
    - can: bcm: fix warning in bcm_connect/proc_register
    - i2c: mux: fix up dependencies
    - kbuild: add -fno-PIE
    - scripts/has-stack-protector: add -fno-PIE
    - x86/kexec: add -fno-PIE
    - kbuild: Steal gcc's pie from the very beginning
    - ext4: sanity check the block and cluster size at mount time
    - crypto: caam - do not register AES-XTS mode on LP units
    - drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
    - clk: mmp: pxa910: fix return value check in pxa910_clk_init()
    - clk: mmp: pxa168: fix return value check in pxa168_clk_init()
    - clk: mmp: mmp2: fix return value check in mmp2_clk_init()
    - rtc: omap: Fix selecting external osc
    - iwlwifi: pcie: fix SPLC structure parsing
    - mfd: core: Fix device reference leak in mfd_clone_cell
    - uwb: fix device reference leaks
    - PM / sleep: fix device reference leak in test_suspend
    - PM / sleep: don't suspend parent when async child suspend_{noirq, late}
      fails
    - IB/mlx4: Check gid_index return value
    - IB/mlx4: Fix create CQ error flow
    - IB/mlx5: Use cache line size to select CQE stride
    - IB/mlx5: Fix fatal error dispatching
    - IB/core: Avoid unsigned int overflow in sg_alloc_table
    - IB/uverbs: Fix leak of XRC target QPs
    - IB/cm: Mark stale CM id's whenever the mad agent was unregistered
    - netfilter: nft_dynset: fix element timeou...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-32.34

---------------
linux (4.8.0-32.34) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1649358

  * Vulnerability picked up from 4.8.10 stable kernel (LP: #1648662)
    - net: handle no dst on skb in icmp6_send

linux (4.8.0-31.33) yakkety; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1648034

  * Update hio driver to 2.1.0.28 (LP: #1646643)
    - SAUCE: hio: update to Huawei ES3000_V2 (2.1.0.28)

  * Yakkety update to v4.8.11 stable release (LP: #1645421)
    - x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
    - KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
    - KVM: Disable irq while unregistering user notifier
    - arm64: KVM: pmu: Fix AArch32 cycle counter access
    - KVM: arm64: Fix the issues when guest PMCCFILTR is configured
    - ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records
    - ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records
    - genirq: Use irq type from irqdata instead of irqdesc
    - fuse: fix fuse_write_end() if zero bytes were copied
    - IB/rdmavt: rdmavt can handle non aligned page maps
    - IB/hfi1: Fix rnr_timer addition
    - mfd: intel-lpss: Do not put device in reset state on suspend
    - mfd: stmpe: Fix RESET regression on STMPE2401
    - can: bcm: fix warning in bcm_connect/proc_register
    - gpio: do not double-check direction on sleeping chips
    - ALSA: usb-audio: Fix use-after-free of usb_device at disconnect
    - ALSA: hda - add a new condition to check if it is thinkpad
    - ALSA: hda - Fix mic regression by ASRock mobo fixup
    - i2c: mux: fix up dependencies
    - i2c: i2c-mux-pca954x: fix deselect enabling for device-tree
    - Disable the __builtin_return_address() warning globally after all
    - kbuild: add -fno-PIE
    - scripts/has-stack-protector: add -fno-PIE
    - x86/kexec: add -fno-PIE
    - kbuild: Steal gcc's pie from the very beginning
    - ext4: sanity check the block and cluster size at mount time
    - ARM: dts: imx53-qsb: Fix regulator constraints
    - crypto: caam - do not register AES-XTS mode on LP units
    - powerpc/64: Fix setting of AIL in hypervisor mode
    - drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
    - drm/i915: Refresh that status of MST capable connectors in ->detect()
    - drm/i915: Assume non-DP++ port if dvo_port is HDMI and there's no AUX ch
      specified in the VBT
    - virtio-net: drop legacy features in virtio 1 mode
    - clk: mmp: pxa910: fix return value check in pxa910_clk_init()
    - clk: mmp: pxa168: fix return value check in pxa168_clk_init()
    - clk: mmp: mmp2: fix return value check in mmp2_clk_init()
    - clk: imx: fix integer overflow in AV PLL round rate
    - rtc: omap: Fix selecting external osc
    - iwlwifi: pcie: fix SPLC structure parsing
    - iwlwifi: pcie: mark command queue lock with separate lockdep class
    - iwlwifi: mvm: fix netdetect starting/stopping for unified images
    - iwlwifi: mvm: fix d3_test with unified D0/D3 images
    - iwlwifi: mvm: wake the wait queue when the RX sync counter is zero
    - mfd: cor...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released