SRIOV: warning if unload VFs

Bug #1715073 reported by bugproxy on 2017-09-05
Affects                             Importance  Assigned to
The Ubuntu-power-systems project    High        Canonical Kernel Team
linux (Ubuntu)                      High        Joseph Salisbury
Zesty                               High        Joseph Salisbury
Artful                              High        Joseph Salisbury

Bug Description

== Comment: #0 - Carol L. Soto <email address hidden> - 2017-02-23 16:11:47 ==
---Problem Description---
When using SRIOV, unloading the VFs produces a kernel warning:

Feb 23 16:05:56 powerio-le11 kernel: [ 201.343397] mlx5_3:wait_for_async_commands:674:(pid 6272): done with all pending requests
Feb 23 16:05:56 powerio-le11 kernel: [ 201.603999] iommu: Removing device 0004:01:00.2 from group 7
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604078] pci 0004:01: 0.2: [PE# 00] Removing DMA window #0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604084] pci 0004:01: 0.2: [PE# 00] Disabling 64-bit DMA bypass
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604350] mlx5_core 0004:01:00.0: VF BAR0: [mem 0x240000000000-0x2401ffffffff 64bit pref] shifted to [mem 0x240000000000-0x2401ffffffff 64bit pref] (Disabling 1 VFs shifted by 0)
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604363] mlx5_core 0004:01:00.0: can't update enabled VF BAR0 [mem 0x240000000000-0x2401ffffffff 64bit pref]
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604379] ------------[ cut here ]------------
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604386] WARNING: CPU: 14 PID: 6272 at /build/linux-twbIHf/linux-4.10.0/drivers/pci/iov.c:584 pci_iov_update_resource+0x178/0x1d0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604388] Modules linked in: mlx5_ib xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc binfmt_misc ipmi_powernv ipmi_devintf uio_pdrv_genirq ipmi_msghandler uio vmx_crypto powernv_rng powernv_op_panel leds_powernv ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en ses enclosure scsi_transport_sas crc32c_vpmsum mlx5_core mlx4_core
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604451] tg3 ipr devlink
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604456] CPU: 14 PID: 6272 Comm: bash Tainted: G OE 4.10.0-8-generic #10-Ubuntu
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604457] task: c000000f40a6d600 task.stack: c000000f40ac8000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604459] NIP: c0000000006721b8 LR: c0000000006721b4 CTR: 0000000000000000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604461] REGS: c000000f40acb590 TRAP: 0700 Tainted: G OE (4.10.0-8-generic)
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604462] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604470] CR: 42424422 XER: 20000000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] CFAR: c000000000b49db4 SOFTE: 1
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR00: c0000000006721b4 c000000f40acb810 c00000000143c900 0000000000000063
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR04: 0000000000000001 0000000000000539 c000001fff700000 0000000000021a50
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR08: 0000000000000007 0000000000000007 0000000000000001 656d5b2030524142
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR12: 0000000000004400 c00000000fb87e00 0000000010180df8 0000000010189e60
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR16: 0000000010189ed8 c000000fdd0a2400 c000001fff97d180 c000000000d46268
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR20: c000000000d4e410 c000000000d41df8 c000001fff97d190 c000000000d4d8d8
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR24: c000000000d4d8e0 c000000fe8f460a0 0000000000000001 0000000000000000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR28: c000000fe8f80f80 0000000000000000 c000000fe8f46580 c000000fe8f46000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604498] NIP [c0000000006721b8] pci_iov_update_resource+0x178/0x1d0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604501] LR [c0000000006721b4] pci_iov_update_resource+0x174/0x1d0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604501] Call Trace:
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604504] [c000000f40acb810] [c0000000006721b4] pci_iov_update_resource+0x174/0x1d0 (unreliable)
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604507] [c000000f40acb8c0] [c000000000655b84] pci_update_resource+0x94/0x2e0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604511] [c000000f40acb980] [c00000000007f3a0] pnv_pci_vf_resource_shift+0x1c0/0x260
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604514] [c000000f40acba70] [c000000000084c68] pnv_pci_sriov_disable+0x308/0x320
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604516] [c000000f40acbb50] [c000000000085578] pcibios_sriov_disable+0x28/0x50
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604519] [c000000f40acbb80] [c00000000067182c] pci_disable_sriov+0xac/0x1b0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604534] [c000000f40acbbc0] [d00000001436789c] mlx5_core_sriov_configure+0x64/0x310 [mlx5_core]
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604537] [c000000f40acbc50] [c000000000653e84] sriov_numvfs_store+0x134/0x1a0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604541] [c000000f40acbce0] [c000000000731d5c] dev_attr_store+0x3c/0x60
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604544] [c000000f40acbd00] [c0000000003e7078] sysfs_kf_write+0x68/0xa0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604547] [c000000f40acbd20] [c0000000003e5f1c] kernfs_fop_write+0x17c/0x250
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604551] [c000000f40acbd70] [c00000000032904c] __vfs_write+0x3c/0x70
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604553] [c000000f40acbd90] [c00000000032aad4] vfs_write+0xd4/0x240
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604556] [c000000f40acbde0] [c00000000032c688] SyS_write+0x68/0x110
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604559] [c000000f40acbe30] [c00000000000b184] system_call+0x38/0xe0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604560] Instruction dump:
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604562] 480c1891 60000000 e8bf00f0 2fa50000 7c641b78 419e0024 3c62ff98 7fc7f378
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604568] 7f66db78 3863eae8 484d7ba5 60000000 <0fe00000> 4bffff20 e8bf00b0 4bffffdc
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604574] ---[ end trace 31d4be8cddb965f1 ]---

I think this warning comes from the fix for Bugzilla bug 146479 / LP1625318.

---uname output---
4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:00:06 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = P8

---Debugger---
A debugger is not configured

---Steps to Reproduce---
I'm using a Mellanox adapter with SRIOV support.
Follow these steps:
To load a VF:
 modprobe mlx5_ib
 echo 1 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
To unload the VF:
 echo 0 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
After this echo you will see the warning.
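
The steps above can be wrapped in a small helper. This is only a sketch: the function name set_numvfs and the SYSFS_ROOT override are hypothetical conveniences that do not appear in the report, and the device name mlx5_0 is simply the one used in the commands above. Writing to the real sysfs requires root.

```shell
#!/bin/sh
# Sketch only: set_numvfs and SYSFS_ROOT are illustrative names, not from the report.
# SYSFS_ROOT defaults to /sys; pointing it at a scratch directory allows a dry run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# set_numvfs DEVICE COUNT
# Writes COUNT to the sriov_numvfs attribute of an InfiniBand device.
set_numvfs() {
    dev="$1"
    count="$2"
    attr="$SYSFS_ROOT/class/infiniband/$dev/device/sriov_numvfs"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr (device missing or not root?)" >&2
        return 1
    fi
    echo "$count" > "$attr"
}

# Reproduction sequence from the report (run as root on the affected machine):
#   modprobe mlx5_ib
#   set_numvfs mlx5_0 1    # load one VF
#   set_numvfs mlx5_0 0    # unload it; the warning appears after this write
```

The helper only performs the sysfs write; it does not load the driver or inspect the kernel log.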

Contact Information = Carol <email address hidden>

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for Carol <email address hidden>:
-Attach sysctl -a output to the bug.

== Comment: #2 - Carol L. Soto <email address hidden> - 2017-02-23 16:15:31 ==

== Comment: #3 - Carol L. Soto <email address hidden> - 2017-02-24 14:19:00 ==
Gavin provided me with a proposed patch, and it resolves the issue.
I also think we need to add the patch below in addition to Gavin's proposed patch:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/pci/iov.c?id=5b0948dfe138f0837699f46f5877f4f81c252dac

From 5b0948dfe138f0837699f46f5877f4f81c252dac Mon Sep 17 00:00:00 2001
From: Emil Tantilov <email address hidden>
Date: Fri, 6 Jan 2017 13:59:08 -0800
Subject: PCI: Lock each enable/disable num_vfs operation in sysfs

== Comment: #5 - Carol L. Soto <email address hidden> - 2017-03-07 08:12:17 ==
(In reply to comment #4)
> (In reply to comment #3)
> > Gavin provided me a proposed patch to fix this issue and it resolves the
> > issue.
> > I also think we need to add this patch below apart of the proposed patch
> > from Gavin:
> >
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/
> > pci/iov.c?id=5b0948dfe138f0837699f46f5877f4f81c252dac
> >
> > From 5b0948dfe138f0837699f46f5877f4f81c252dac Mon Sep 17 00:00:00 2001
> > From: Emil Tantilov <email address hidden>
> > Date: Fri, 6 Jan 2017 13:59:08 -0800
> > Subject: PCI: Lock each enable/disable num_vfs operation in sysfs
>
> Is this patch submitted to upstream?
> will back-port to Ubuntu after the patch is accepted.

This bugzilla will have 2 patches: one is the patch listed here, and the other is the one that Gavin sent, which I have not yet seen accepted. When it is accepted, I will post the 2 commits.

== Comment: #11 - Leonardo Augusto Guimaraes Garcia <email address hidden> - 2017-06-20 18:57:31 ==
Are the patches needed to fix this bug already upstream?

== Comment: #15 - Carol L. Soto <email address hidden> - 2017-08-31 23:45:55 ==
This patch was reposted:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=0fc690a7c3f7053613dcbab6a7613bb6586d8ee2

== Comment: #17 - MAMATHA INAMDAR <email address hidden> - 2017-09-05 01:31:11 ==
I think we have to backport the following two patches to Ubuntu:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/pci/iov.c?id=5b0948dfe138f0837699f46f5877f4f81c252dac

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=0fc690a7c3f7053613dcbab6a7613bb6586d8ee2

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-151980 severity-high targetmilestone-inin1710
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in kernel-package (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

I built a Zesty test kernel with the two requested commits, 5b0948dfe1 and 0fc690a7c3. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1715073/zesty/

I also built an Artful test kernel with just commit 0fc690a7c3 (commit 5b0948dfe1 is in mainline as of 4.11-rc1). That test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1715073/artful

Can you test these two kernels and see if they resolve this bug?

------- Comment From <email address hidden> 2017-09-06 00:01 EDT-------
(In reply to comment #20)
> I built a Zesty test kernel with the two requested commits, 5b0948dfe1 and
> 0fc690a7c3. The test kernel can be downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1715073/zesty/
>
> I also built an Artful test kernel with just commit 0fc690a7c3(Commit
> 5b0948dfe1 is in mainline as of 4.11-rc1). That test kernel can be
> downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1715073/artful
>
> Can you test these two kernels and see if they resolve this bug?

I tried the 2 kernels and I do not see the stack trace that I used to see when disabling SRIOV. Thanks.
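
The check described above (no stack trace after disabling SRIOV) can be approximated by scanning the kernel log for the warning signature from the original report. A minimal sketch, assuming the log text is supplied on standard input; the function name count_iov_warnings is hypothetical, and the pattern is just the WARNING line naming pci_iov_update_resource as it appears in the trace in comment #0.

```shell
#!/bin/sh
# Sketch only: count_iov_warnings is an illustrative name, not from the report.
# Reads kernel log text on stdin and prints how many WARNING lines
# were raised from pci_iov_update_resource.
count_iov_warnings() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so "|| true" keeps the function's exit status at 0.
    grep -c 'WARNING: .*pci_iov_update_resource' || true
}

# Example use on a live system (dmesg may need root):
#   echo 0 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
#   dmesg | count_iov_warnings    # prints the number of matching warnings
```

A count of zero after the write of 0 to sriov_numvfs is consistent with the result reported here; it does not by itself prove that no other warnings were logged.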

Changed in kernel-package (Ubuntu):
status: New → In Progress
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Zesty):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Seth Forshee (sforshee) on 2017-09-08
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: New → In Progress
Stefan Bader (smb) on 2017-09-15
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-11.12

---------------
linux (4.13.0-11.12) artful; urgency=low

  * linux: 4.13.0-11.12 -proposed tracker (LP: #1716699)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - s390/mm: fix local TLB flushing vs. detach of an mm address space
    - s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

 -- Seth Forshee <email address hidden> Tue, 12 Sep 2017 10:18:38 -0500

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-21 13:55 EDT-------
Hi Carol,

Could you please test the new kernel and let us know the results?

Thanks
Victor

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-25 11:34 EDT-------
You can also try:

echo "deb http://us.ports.ubuntu.com/ubuntu-ports/ $(lsb_release -sc)-proposed main restricted" | sudo tee /etc/apt/sources.list.d/proposed.list
sudo apt-get update
sudo apt-get install <pkg>
sudo apt upgrade linux*4.13.0-11.12*

if you prefer.

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-25 16:50 EDT-------
Verified with this one: 4.13.0-11-generic #12-Ubuntu

tags: added: verification-done-zesty
removed: verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-37.41

---------------
linux (4.10.0-37.41) zesty; urgency=low

  * CVE-2017-1000255
    - SAUCE: powerpc/64s: Use emergency stack for kernel TM Bad Thing program
      checks
    - SAUCE: powerpc/tm: Fix illegal TM state in signal handler

linux (4.10.0-36.40) zesty; urgency=low

  * linux: 4.10.0-36.40 -proposed tracker (LP: #1718143)

  * Neighbour confirmation broken, breaks ARP cache aging (LP: #1715812)
    - sock: add sk_dst_pending_confirm flag
    - net: add dst_pending_confirm flag to skbuff
    - sctp: add dst_pending_confirm flag
    - tcp: replace dst_confirm with sk_dst_confirm
    - net: add confirm_neigh method to dst_ops
    - net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
    - net: pending_confirm is not used anymore

  * SRIOV: warning if unload VFs (LP: #1715073)
    - PCI: Lock each enable/disable num_vfs operation in sysfs
    - PCI: Disable VF decoding before pcibios_sriov_disable() updates resources

  * Kernel has troule recognizing Corsair Strafe RGB keyboard (LP: #1678477)
    - usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard

  * CVE-2017-14106
    - tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

  * [CIFS] Fix maximum SMB2 header size (LP: #1713884)
    - CIFS: Fix maximum SMB2 header size

  * Middle button of trackpoint doesn't work (LP: #1715271)
    - Input: trackpoint - assume 3 buttons when buttons detection fails

  * Drop GPL from of_node_to_nid() export to match other arches (LP: #1709179)
    - powerpc: Drop GPL from of_node_to_nid() export to match other arches

  * vhost guest network randomly drops under stress (kvm) (LP: #1711251)
    - Revert "vhost: cache used event for better performance"

  * arm64 arch_timer fixes (LP: #1713821)
    - Revert "UBUNTU: SAUCE: arm64: arch_timer: Enable CNTVCT_EL0 trap if
      workaround is enabled"
    - arm64: arch_timer: Enable CNTVCT_EL0 trap if workaround is enabled
    - clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()
    - clocksource/drivers/arm_arch_timer: Fix read and iounmap of incorrect
      variable
    - clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization
    - clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is
      enabled

  * Touchpad not detected (LP: #1708852)
    - Input: elan_i2c - add ELAN0608 to the ACPI table

 -- Thadeu Lima de Souza Cascardo <email address hidden> Fri, 06 Oct 2017 16:45:48 -0300

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-10 07:10 EDT-------
Hi Carol,

Could you please do one last check and let us know if we can close this bug?

Thanks
Victor

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-16 12:32 EDT-------
(In reply to comment #30)
> Hi Carol,
>
> Could you please do one last check and let us know if we can close this bug?
>
> Thanks
> Victor

Verified with 4.10.0-37-generic.

Manoj Iyer (manjo) on 2017-11-06
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released