i40e xps management broken when > 64 queues/cpus

Bug #1820948 reported by Nivedita Singhvi on 2019-03-20
Affects          Status  Importance  Assigned to       Milestone
linux (Ubuntu)           High        Nivedita Singhvi
Bionic                   High        Nivedita Singhvi

Bug Description

[Impact]
Transmit packet steering (XPS) settings cannot be read or
set on i40e tx queues numbered 64 and above, which affects
any system with more than 64 queues (CPUs). This is still
an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels).

It was fixed in Intel's i40e driver version 2.7.11 and
in 4.16-rc1 mainline Linux (i.e. Cosmic and Disco have the fix).

Fix
-----
The following commit fixes this issue (as identified
by Lihong Yang in discussion with Intel i40e team):

"i40e: Fix the number of queues available to be mapped for use"
Commit: bc6d33c8d93f5999920e97a8c6330b8910053d4f

It requires the following commit as well:

"i40e: Do not allow use more TC queue pairs than MSI-X vectors exist"
Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58

[Test Case]
1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel
   i40e driver version: 2.1.14-k
   Any system with > 64 CPUs

2. For any queue 0 - 63, you can read/set tx xps:

echo ffffffff > /sys/class/net/eth2/queues/tx-63/xps_cpus
echo $?
0
cat /sys/class/net/eth2/queues/tx-63/xps_cpus
00,00000000,ffffffff

  But for any queue number > 63, we see this error:

echo ffffffff > /sys/class/net/eth2/queues/tx-64/xps_cpus
echo: write error: Invalid argument

cat /sys/class/net/eth2/queues/tx-64/xps_cpus
cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument
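
The check above can be repeated over every tx queue at once. The
following is a minimal sketch, not part of the original report: the
function name failing_xps_queues and its optional directory argument
are illustrative, and the default path assumes the eth2 interface used
in the test case. It only reads xps_cpus, so it does not change any
settings.

```shell
#!/bin/sh
# Sketch: print the name of every tx queue whose xps_cpus attribute
# cannot be read. The function name and argument are hypothetical.

failing_xps_queues() {
    # Default to the eth2 path from the test case; pass another
    # directory to check a different interface.
    queues_dir=${1:-/sys/class/net/eth2/queues}
    for q in "$queues_dir"/tx-*; do
        # Skip the unexpanded pattern when no tx-* directory exists.
        [ -d "$q" ] || continue
        # On an affected kernel this read fails with "Invalid argument"
        # for tx-64 and above.
        if ! cat "$q/xps_cpus" >/dev/null 2>&1; then
            basename "$q"
        fi
    done
}
```

Calling failing_xps_queues with no argument on an affected system
should list tx-64 and higher; on a fixed kernel it should print nothing.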


It's been reported by an external reporter and reproduced
internally.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
assignee: nobody → Nivedita Singhvi (niveditasinghvi)

Will be submitting SRU request early next week; trying to get
it into this next kernel release cycle.

Changed in linux (Ubuntu):
assignee: nobody → Nivedita Singhvi (niveditasinghvi)
Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress

Submitted patches for SRU.

description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

I'm still trying to confirm this for Xenial.

Changed in linux (Ubuntu):
status: In Progress → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Connor Kuehl (connork) wrote :

Hi Nivedita,

The Bionic kernel containing a fix for this issue is now in the "-proposed" repository. Could you (or the external reporter mentioned earlier in the ticket) try the proposed kernel to see if it fixes the issue?

Also, I saw you left a note earlier saying that you were still investigating this for Xenial; are there any updates from that investigation?

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-48.51

---------------
linux (4.15.0-48.51) bionic; urgency=medium

  * linux: 4.15.0-48.51 -proposed tracker (LP: #1822820)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
    - [Packaging] resync retpoline extraction

  * 3b080b2564287be91605bfd1d5ee985696e61d3c in ubuntu_btrfs_kernel_fixes
    triggers system hang on i386 (LP: #1812845)
    - btrfs: raid56: properly unmap parity page in finish_parity_scrub()

  * [P9][LTCTest][Opal][FW910] cpupower monitor shows multiple stop Idle_Stats
    (LP: #1719545)
    - cpupower : Fix header name to read idle state name

  * [amdgpu] screen corruption when using touchpad (LP: #1818617)
    - drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v3)
    - drm/amdgpu: Free VGA stolen memory as soon as possible.

  * [SRU][B/C/OEM]IOMMU: add kernel dma protection (LP: #1820153)
    - ACPICA: AML parser: attempt to continue loading table after error
    - ACPI / property: Allow multiple property compatible _DSD entries
    - PCI / ACPI: Identify untrusted PCI devices
    - iommu/vt-d: Force IOMMU on for platform opt in hint
    - iommu/vt-d: Do not enable ATS for untrusted devices
    - thunderbolt: Export IOMMU based DMA protection support to userspace
    - iommu/vt-d: Disable ATS support on untrusted devices

  * Add basic support to NVLink2 passthrough (LP: #1819989)
    - powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is
      enabled
    - powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET
    - powerpc/powernv: Export opal_check_token symbol
    - powerpc/powernv: Make possible for user to force a full ipl cec reboot
    - powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
    - powerpc/powernv: Move npu struct from pnv_phb to pci_controller
    - powerpc/powernv/npu: Move OPAL calls away from context manipulation
    - powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation
    - powerpc/pseries/npu: Enable platform support
    - powerpc/pseries: Remove IOMMU API support for non-LPAR systems
    - powerpc/powernv/npu: Check mmio_atsd array bounds when populating
    - powerpc/powernv/npu: Fault user page into the hypervisor's pagetable

  * Huawei Hi1822 NIC has poor performance (LP: #1820187)
    - net-next: hinic: fix a problem in free_tx_poll()
    - hinic: remove ndo_poll_controller
    - net-next/hinic: add checksum offload and TSO support
    - hinic: Fix l4_type parameter in hinic_task_set_tunnel_l4
    - net-next/hinic:replace multiply and division operators
    - net-next/hinic:add rx checksum offload for HiNIC
    - net-next/hinic:fix a bug in set mac address
    - net-next/hinic: fix a bug in rx data flow
    - net: hinic: fix null pointer dereference on pointer hwdev
    - hinic: optmize rx refill buffer mechanism
    - net-next/hinic:add shutdown callback
    - net-next/hinic: replace disable_irq_nosync/enable_irq

  * [CONFIG] please enable highdpi font FONT_TER16x32 (LP: #1819881)
    - Fonts: New Terminus large console font
    - [Config]: enable highdpi Terminus 16x32 font support

  * [19.04 FEAT] qeth: Enhanced link...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
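
To check whether a Bionic machine is at or past the fixed package
version (4.15.0-48.51, per the changelog above), a version comparison
along these lines may help. This is a sketch under assumptions: the
helper name version_ge is hypothetical, it relies on GNU sort's -V
option, and the dpkg-query call in the comment assumes the running
kernel's image package is named linux-image-$(uname -r).

```shell
#!/bin/sh
# Sketch: compare two package version strings using GNU sort -V.
# version_ge is a hypothetical helper, not an existing tool.

version_ge() {
    # Succeeds when $1 sorts at or after $2 in version order:
    # the smaller of the two must then be $2 itself.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (assumes the package name matches the running kernel):
#   installed=$(dpkg-query -W -f='${Version}' "linux-image-$(uname -r)")
#   version_ge "$installed" 4.15.0-48.51 && echo "fix should be present"
```

A passing comparison only shows that the package version is new
enough; the xps_cpus test case remains the actual verification.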