SRIOV: warning if unload VFs

Bug #1715073 reported by bugproxy
This bug affects 1 person
Affects                          Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project Fix Released  High        Canonical Kernel Team
linux (Ubuntu)                   Fix Released  High        Joseph Salisbury
  Zesty                          Fix Released  High        Joseph Salisbury
  Artful                         Fix Released  High        Joseph Salisbury

Bug Description

== Comment: #0 - Carol L. Soto <email address hidden> - 2017-02-23 16:11:47 ==
---Problem Description---
When using SR-IOV, unloading the VFs triggers the following warning:

Feb 23 16:05:56 powerio-le11 kernel: [ 201.343397] mlx5_3:wait_for_async_commands:674:(pid 6272): done with all pending requests
Feb 23 16:05:56 powerio-le11 kernel: [ 201.603999] iommu: Removing device 0004:01:00.2 from group 7
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604078] pci 0004:01:00.2: [PE# 00] Removing DMA window #0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604084] pci 0004:01:00.2: [PE# 00] Disabling 64-bit DMA bypass
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604350] mlx5_core 0004:01:00.0: VF BAR0: [mem 0x240000000000-0x2401ffffffff 64bit pref] shifted to [mem 0x240000000000-0x2401ffffffff 64bit pref] (Disabling 1 VFs shifted by 0)
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604363] mlx5_core 0004:01:00.0: can't update enabled VF BAR0 [mem 0x240000000000-0x2401ffffffff 64bit pref]
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604379] ------------[ cut here ]------------
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604386] WARNING: CPU: 14 PID: 6272 at /build/linux-twbIHf/linux-4.10.0/drivers/pci/iov.c:584 pci_iov_update_resource+0x178/0x1d0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604388] Modules linked in: mlx5_ib xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc binfmt_misc ipmi_powernv ipmi_devintf uio_pdrv_genirq ipmi_msghandler uio vmx_crypto powernv_rng powernv_op_panel leds_powernv ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en ses enclosure scsi_transport_sas crc32c_vpmsum mlx5_core mlx4_core
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604451] tg3 ipr devlink
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604456] CPU: 14 PID: 6272 Comm: bash Tainted: G OE 4.10.0-8-generic #10-Ubuntu
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604457] task: c000000f40a6d600 task.stack: c000000f40ac8000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604459] NIP: c0000000006721b8 LR: c0000000006721b4 CTR: 0000000000000000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604461] REGS: c000000f40acb590 TRAP: 0700 Tainted: G OE (4.10.0-8-generic)
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604462] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604470] CR: 42424422 XER: 20000000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] CFAR: c000000000b49db4 SOFTE: 1
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR00: c0000000006721b4 c000000f40acb810 c00000000143c900 0000000000000063
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR04: 0000000000000001 0000000000000539 c000001fff700000 0000000000021a50
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR08: 0000000000000007 0000000000000007 0000000000000001 656d5b2030524142
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR12: 0000000000004400 c00000000fb87e00 0000000010180df8 0000000010189e60
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR16: 0000000010189ed8 c000000fdd0a2400 c000001fff97d180 c000000000d46268
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR20: c000000000d4e410 c000000000d41df8 c000001fff97d190 c000000000d4d8d8
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR24: c000000000d4d8e0 c000000fe8f460a0 0000000000000001 0000000000000000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604471] GPR28: c000000fe8f80f80 0000000000000000 c000000fe8f46580 c000000fe8f46000
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604498] NIP [c0000000006721b8] pci_iov_update_resource+0x178/0x1d0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604501] LR [c0000000006721b4] pci_iov_update_resource+0x174/0x1d0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604501] Call Trace:
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604504] [c000000f40acb810] [c0000000006721b4] pci_iov_update_resource+0x174/0x1d0 (unreliable)
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604507] [c000000f40acb8c0] [c000000000655b84] pci_update_resource+0x94/0x2e0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604511] [c000000f40acb980] [c00000000007f3a0] pnv_pci_vf_resource_shift+0x1c0/0x260
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604514] [c000000f40acba70] [c000000000084c68] pnv_pci_sriov_disable+0x308/0x320
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604516] [c000000f40acbb50] [c000000000085578] pcibios_sriov_disable+0x28/0x50
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604519] [c000000f40acbb80] [c00000000067182c] pci_disable_sriov+0xac/0x1b0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604534] [c000000f40acbbc0] [d00000001436789c] mlx5_core_sriov_configure+0x64/0x310 [mlx5_core]
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604537] [c000000f40acbc50] [c000000000653e84] sriov_numvfs_store+0x134/0x1a0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604541] [c000000f40acbce0] [c000000000731d5c] dev_attr_store+0x3c/0x60
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604544] [c000000f40acbd00] [c0000000003e7078] sysfs_kf_write+0x68/0xa0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604547] [c000000f40acbd20] [c0000000003e5f1c] kernfs_fop_write+0x17c/0x250
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604551] [c000000f40acbd70] [c00000000032904c] __vfs_write+0x3c/0x70
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604553] [c000000f40acbd90] [c00000000032aad4] vfs_write+0xd4/0x240
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604556] [c000000f40acbde0] [c00000000032c688] SyS_write+0x68/0x110
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604559] [c000000f40acbe30] [c00000000000b184] system_call+0x38/0xe0
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604560] Instruction dump:
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604562] 480c1891 60000000 e8bf00f0 2fa50000 7c641b78 419e0024 3c62ff98 7fc7f378
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604568] 7f66db78 3863eae8 484d7ba5 60000000 <0fe00000> 4bffff20 e8bf00b0 4bffffdc
Feb 23 16:05:56 powerio-le11 kernel: [ 201.604574] ---[ end trace 31d4be8cddb965f1 ]---

I think this warning comes from the fix for Bugzilla bug 146479 (LP: #1625318).

---uname output---
4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:00:06 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = P8

---Debugger---
A debugger is not configured

---Steps to Reproduce---
I'm using a Mellanox adapter with SR-IOV support.
Steps:
To load a VF:
modprobe mlx5_ib
echo 1 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
To unload the VF:
echo 0 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
The warning appears right after the second echo.
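The reproduction steps above can be wrapped in a small script. This is a sketch only: the sysfs path is the one from the report (mlx5_0), set_numvfs and has_iov_warning are hypothetical helpers introduced here, and the strings it searches for are taken from the log above. Writing to sysfs and clearing or reading dmesg require root.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the reproduction steps from this report.

# set_numvfs FILE COUNT: write COUNT to an sriov_numvfs attribute.
set_numvfs() {
    echo "$2" > "$1"
}

# has_iov_warning: read kernel log text on stdin and succeed if it contains
# the VF BAR update warning seen in this bug (strings copied from the log).
has_iov_warning() {
    grep -Eq "can't update enabled VF BAR|pci_iov_update_resource"
}

# Example run, commented out so sourcing this file has no side effects:
# modprobe mlx5_ib
# NUMVFS=/sys/class/infiniband/mlx5_0/device/sriov_numvfs
# dmesg -C                  # clear the ring buffer so only new messages count
# set_numvfs "$NUMVFS" 1    # load one VF
# set_numvfs "$NUMVFS" 0    # unload it; affected kernels warn here
# if dmesg | has_iov_warning; then echo "warning reproduced"; fi
```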

Contact Information = Carol <email address hidden>

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for Carol <email address hidden>:
- Attach the sysctl -a output to the bug.

== Comment: #2 - Carol L. Soto <email address hidden> - 2017-02-23 16:15:31 ==

== Comment: #3 - Carol L. Soto <email address hidden> - 2017-02-24 14:19:00 ==
Gavin provided a proposed patch for this issue, and it resolves it.
I also think we need to add the patch below in addition to Gavin's proposed patch:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/pci/iov.c?id=5b0948dfe138f0837699f46f5877f4f81c252dac

From 5b0948dfe138f0837699f46f5877f4f81c252dac Mon Sep 17 00:00:00 2001
From: Emil Tantilov <email address hidden>
Date: Fri, 6 Jan 2017 13:59:08 -0800
Subject: PCI: Lock each enable/disable num_vfs operation in sysfs

== Comment: #5 - Carol L. Soto <email address hidden> - 2017-03-07 08:12:17 ==
(In reply to comment #4)
> (In reply to comment #3)
> > Gavin provided me a proposed patch to fix this issue and it resolves the
> > issue.
> > I also think we need to add this patch below apart of the proposed patch
> > from Gavin:
> >
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/
> > pci/iov.c?id=5b0948dfe138f0837699f46f5877f4f81c252dac
> >
> > From 5b0948dfe138f0837699f46f5877f4f81c252dac Mon Sep 17 00:00:00 2001
> > From: Emil Tantilov <email address hidden>
> > Date: Fri, 6 Jan 2017 13:59:08 -0800
> > Subject: PCI: Lock each enable/disable num_vfs operation in sysfs
>
> Has this patch been submitted upstream?
> We will backport it to Ubuntu after the patch is accepted.

This bug will need two patches: the one listed here and the one Gavin sent, which I have not yet seen accepted. Once it is accepted, I will post both commits.

== Comment: #11 - Leonardo Augusto Guimaraes Garcia <email address hidden> - 2017-06-20 18:57:31 ==
Are the patches needed to fix this bug already upstream?

== Comment: #15 - Carol L. Soto <email address hidden> - 2017-08-31 23:45:55 ==
This patch was reposted:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=0fc690a7c3f7053613dcbab6a7613bb6586d8ee2

== Comment: #17 - MAMATHA INAMDAR <email address hidden> - 2017-09-05 01:31:11 ==
I think we have to backport the following two patches to Ubuntu:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/pci/iov.c?id=5b0948dfe138f0837699f46f5877f4f81c252dac

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=0fc690a7c3f7053613dcbab6a7613bb6586d8ee2

bugproxy (bugproxy) wrote : dmesg with the warning

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-151980 severity-high targetmilestone-inin1710
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in kernel-package (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

I built a Zesty test kernel with the two requested commits, 5b0948dfe1 and 0fc690a7c3. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1715073/zesty/

I also built an Artful test kernel with only commit 0fc690a7c3 (commit 5b0948dfe1 has been in mainline since 4.11-rc1). That test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1715073/artful

Can you test these two kernels and see if they resolve this bug?

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-06 00:01 EDT-------
(In reply to comment #20)
> I built a Zesty test kernel with the two requested commits, 5b0948dfe1 and
> 0fc690a7c3. The test kernel can be downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1715073/zesty/
>
> I also built an Artful test kernel with just commit 0fc690a7c3(Commit
> 5b0948dfe1 is in mainline as of 4.11-rc1). That test kernel can be
> downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1715073/artful
>
> Can you test these two kernels and see if they resolve this bug?

I tried both kernels and no longer see the stack trace when disabling SR-IOV. Thanks.

Changed in kernel-package (Ubuntu):
status: New → In Progress
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Zesty):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Seth Forshee (sforshee)
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-11.12

---------------
linux (4.13.0-11.12) artful; urgency=low

  * linux: 4.13.0-11.12 -proposed tracker (LP: #1716699)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - s390/mm: fix local TLB flushing vs. detach of an mm address space
    - s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

 -- Seth Forshee <email address hidden> Tue, 12 Sep 2017 10:18:38 -0500

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-21 13:55 EDT-------
Hi Carol,

Could you please test the new kernel and let us know the results?

Thanks
Victor

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-25 11:34 EDT-------
You can also try the following, if you prefer:

echo "deb http://us.ports.ubuntu.com/ubuntu-ports/ $(lsb_release -sc)-proposed main restricted" | sudo tee /etc/apt/sources.list.d/proposed.list
sudo apt-get update
sudo apt-get install <pkg>
sudo apt upgrade 'linux*4.13.0-11.12*'

Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-25 16:50 EDT-------
Verified with 4.13.0-11-generic #12-Ubuntu.

tags: added: verification-done-zesty
removed: verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-37.41

---------------
linux (4.10.0-37.41) zesty; urgency=low

  * CVE-2017-1000255
    - SAUCE: powerpc/64s: Use emergency stack for kernel TM Bad Thing program
      checks
    - SAUCE: powerpc/tm: Fix illegal TM state in signal handler

linux (4.10.0-36.40) zesty; urgency=low

  * linux: 4.10.0-36.40 -proposed tracker (LP: #1718143)

  * Neighbour confirmation broken, breaks ARP cache aging (LP: #1715812)
    - sock: add sk_dst_pending_confirm flag
    - net: add dst_pending_confirm flag to skbuff
    - sctp: add dst_pending_confirm flag
    - tcp: replace dst_confirm with sk_dst_confirm
    - net: add confirm_neigh method to dst_ops
    - net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
    - net: pending_confirm is not used anymore

  * SRIOV: warning if unload VFs (LP: #1715073)
    - PCI: Lock each enable/disable num_vfs operation in sysfs
    - PCI: Disable VF decoding before pcibios_sriov_disable() updates resources

  * Kernel has trouble recognizing Corsair Strafe RGB keyboard (LP: #1678477)
    - usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard

  * CVE-2017-14106
    - tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

  * [CIFS] Fix maximum SMB2 header size (LP: #1713884)
    - CIFS: Fix maximum SMB2 header size

  * Middle button of trackpoint doesn't work (LP: #1715271)
    - Input: trackpoint - assume 3 buttons when buttons detection fails

  * Drop GPL from of_node_to_nid() export to match other arches (LP: #1709179)
    - powerpc: Drop GPL from of_node_to_nid() export to match other arches

  * vhost guest network randomly drops under stress (kvm) (LP: #1711251)
    - Revert "vhost: cache used event for better performance"

  * arm64 arch_timer fixes (LP: #1713821)
    - Revert "UBUNTU: SAUCE: arm64: arch_timer: Enable CNTVCT_EL0 trap if
      workaround is enabled"
    - arm64: arch_timer: Enable CNTVCT_EL0 trap if workaround is enabled
    - clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()
    - clocksource/drivers/arm_arch_timer: Fix read and iounmap of incorrect
      variable
    - clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization
    - clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is
      enabled

  * Touchpad not detected (LP: #1708852)
    - Input: elan_i2c - add ELAN0608 to the ACPI table

 -- Thadeu Lima de Souza Cascardo <email address hidden> Fri, 06 Oct 2017 16:45:48 -0300
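To check whether an installed kernel already carries the fix, one option is to compare its package version against the versions Launchpad Janitor reports as fixed here (4.10.0-37.41 for Zesty, 4.13.0-11.12 for Artful). This is a sketch under stated assumptions: fixed_in and version_ge are hypothetical helpers, GNU sort -V is assumed to be available, and version sort only approximates Debian version ordering (for example, it does not treat ~ the way dpkg does).

```shell
#!/bin/sh
# Hypothetical helpers: is the running kernel at or past the fixed version?

# fixed_in SERIES: print the package version reported fixed for SERIES.
fixed_in() {
    case "$1" in
        zesty)  echo "4.10.0-37.41" ;;
        artful) echo "4.13.0-11.12" ;;
        *)      return 1 ;;
    esac
}

# version_ge A B: succeed if A >= B, i.e. B is the smaller of the two
# under GNU version sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example, commented out because it depends on the local system:
# series=$(lsb_release -sc)
# running=$(dpkg-query -W -f='${Version}' "linux-image-$(uname -r)")
# if version_ge "$running" "$(fixed_in "$series")"; then echo "fix included"; fi
```

For an authoritative comparison on Ubuntu, dpkg --compare-versions can replace version_ge.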

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-10 07:10 EDT-------
Hi Carol,

Could you please do one last check and let us know if we can close this bug?

Thanks
Victor

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-16 12:32 EDT-------
(In reply to comment #30)
> Hi Carol,
>
> Could you please do one last check and let us know if we can close this bug?
>
> Thanks
> Victor

Verified with 4.10.0-37-generic.

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released