PR for: "IB/mlx5: Use __iowrite64_copy() for write combining stores"

Bug #2071655 reported by Brad Figg
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux-nvidia (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

    mlx5 has a built in self-test at driver startup to evaluate if the
    platform supports write combining to generate a 64 byte PCIe TLP or
    not. This has proven necessary because a lot of common scenarios end up
    with broken write combining (especially inside virtual machines) and there
    is other way to learn this information.

    This self test has been consistently failing on new ARM64 CPU
    designs (specifically with NVIDIA Grace's implementation of Neoverse
    V2). The C loop around writeq() generates some pretty terrible ARM64
    assembly, but historically this has worked on a lot of existing ARM64 CPUs
    till now.

    We see it succeed about 1 time in 10,000 on the worst effected
    systems. The CPU architects speculate that the load instructions
    interspersed with the stores makes the WC buffers statistically flush too
    often and thus the generation of large TLPs becomes infrequent. This makes
    the boot up test unreliable in that it indicates no write-combining,
    however userspace would be fine since it uses a ST4 instruction.

    Further, S390 has similar issues where only the special zpci_memcpy_toio()
    will actually generate large TLPs, and the open coded loop does not
    trigger it at all.

    Fix both ARM64 and S390 by switching to __iowrite64_copy() which now
    provides architecture specific variants that have a high change of
    generating a large TLP with write combining. x86 continues to use a
    similar writeq loop in the generate __iowrite64_copy().

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia/6.8.0-1011.11 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-nvidia' to 'verification-done-noble-linux-nvidia'. If the problem still exists, change the tag 'verification-needed-noble-linux-nvidia' to 'verification-failed-noble-linux-nvidia'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-nvidia-v2 verification-needed-noble-linux-nvidia
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (73.1 KiB)

This bug was fixed in the package linux-nvidia - 6.8.0-1011.11

---------------
linux-nvidia (6.8.0-1011.11) noble; urgency=medium

  * noble/linux-nvidia: 6.8.0-1011.11 -proposed tracker (LP: #2072186)

  * PR for: "IB/mlx5: Use __iowrite64_copy() for write combining stores"
    (LP: #2071655)
    - x86: Stop using weak symbols for __iowrite32_copy()
    - s390: Implement __iowrite32_copy()
    - s390: Stop using weak symbols for __iowrite64_copy()
    - arm64/io: Provide a WC friendly __iowriteXX_copy()
    - net: hns3: Remove io_stop_wc() calls after __iowrite64_copy()

  * PR for: "PCI: Clear Secondary Status errors after enumeration"
    (LP: #2071654)
    - PCI: Clear Secondary Status errors after enumeration

  * mlxbf_pmc: bring in latest 6.8 upstream commits (LP: #2069777)
    - platform/mellanox: mlxbf-pmc: Replace uintN_t with kernel-style types
    - platform/mellanox: mlxbf-pmc: Cleanup signed/unsigned mix-up
    - platform/mellanox: mlxbf-pmc: mlxbf_pmc_event_list(): make size ptr optional
    - platform/mellanox: mlxbf-pmc: Ignore unsupported performance blocks
    - platform/mellanox: mlxbf-pmc: fix signedness bugs

  [ Ubuntu: 6.8.0-40.40 ]

  * noble/linux: 6.8.0-40.40 -proposed tracker (LP: #2072201)
  * FPS of glxgear with fullscreen is too low on MTL platform (LP: #2069380)
    - drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access
  * a critical typo in the code managing the ASPM settings for PCI Express
    devices (LP: #2071889)
    - PCI/ASPM: Restore parent state to parent, child state to child
  * [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
    throughput degradation for PCI-related network workloads (LP: #2071471)
    - [Config] Set IOMMU_DEFAULT_DMA_STRICT=n and IOMMU_DEFAULT_DMA_LAZY=yes for
      s390x
  * UBSAN: array-index-out-of-bounds in
    /build/linux-D15vQj/linux-6.5.0/drivers/md/bcache/bset.c:1098:3
    (LP: #2039368)
    - bcache: fix variable length array abuse in btree_iter
  * Mute/mic LEDs and speaker no function on EliteBook 645/665 G11
    (LP: #2071296)
    - ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook 645/665
      G11.
  * failed to enable IPU6 camera sensor on kernel >= 6.8: ivsc_ace
    intel_vsc-5db76cf6-0a68-4ed6-9b78-0361635e2447: switch camera to host
    failed: -110 (LP: #2067364)
    - mei: vsc: Don't stop/restart mei device during system suspend/resume
    - SAUCE: media: ivsc: csi: don't count privacy on as error
    - SAUCE: media: ivsc: csi: add separate lock for v4l2 control handler
    - SAUCE: media: ivsc: csi: remove privacy status in struct mei_csi
    - SAUCE: mei: vsc: Enhance IVSC chipset stability during warm reboot
    - SAUCE: mei: vsc: Enhance SPI transfer of IVSC rom
    - SAUCE: mei: vsc: Utilize the appropriate byte order swap function
    - SAUCE: mei: vsc: Prevent timeout error with added delay post-firmware
      download
  * failed to probe camera sensor on Dell XPS 9315: ov01a10 i2c-OVTI01A0:00:
    failed to check hwcfg: -22 (LP: #2070251)
    - ACPI: utils: Make acpi_handle_path() not static
    - ACPI: property: Ignore bad graph port nodes on Dell XPS 9315
    - ACPI: property: Polish ignoring bad ...

Changed in linux-nvidia (Ubuntu):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.