A single PCI read or write appears twice on the PCIe bus. This happens when using the SR-IOV feature with some PCI devices

Bug #1606940 reported by Ryan Harper
This bug affects 2 people
Affects          Status          Importance    Assigned to    Milestone
qemu (Ubuntu)    Fix Released    Medium        Unassigned
Trusty           Fix Released    Medium        Unassigned

Bug Description

[Impact]

 * Users of SRIOV devices in qemu on Trusty may encounter unstable
   behavior on pass-through PCI devices due to a bug in qemu's MMIO
   mapping to overlapping RAM slots. When memory is accessed at
   subpage granularity where slots have overlapping regions, the
   handler is invoked multiple times, which results in multiple PCI
   writes.

   This affects qemu releases prior to qemu 2.5; it has been fixed in
   newer releases.

 * Backporting fixes from upstream release is required to allow
   certain PCI devices under SRIOV to function properly.

 * All patches applied are already accepted upstream. Xenial and
   Yakkety are OK; Wily back through Trusty are affected.
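
As a rough illustration of the kind of alignment logic involved (a sketch only, not the upstream patch): memory slots handed to KVM have to cover whole pages, so a region with an unaligned start or a sub-page size has to be trimmed to the largest page-aligned range it contains. The 4 KiB page size and the name `align_to_pages` are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def align_to_pages(start, size, page_size=PAGE_SIZE):
    """Return (start, size) of the largest page-aligned range inside
    [start, start + size), or None if no whole page fits."""
    # Distance from start up to the next page boundary (0 if aligned).
    delta = (-start) % page_size
    if delta > size:
        return None
    aligned_start = start + delta
    # Truncate the remaining size down to a whole number of pages.
    aligned_size = (size - delta) // page_size * page_size
    if aligned_size == 0:
        return None
    return aligned_start, aligned_size
```

A sub-page access window such as `align_to_pages(0x10080, 0x80)` yields `None`: nothing there can be backed by a slot, so in this model it has to be handled by exactly one MMIO handler invocation.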

[Test Case]

 * On a Trusty 14.04 system with affected SRIOV device.
    - boot system with sriov enabled
    - launch vm with sriov device passed through
      using guest XML attached (bug-1606940-trusty-guest.xml)
    - unpack pcimem tarball inside vm (pcimem.tar attached)
    - Read (note the pci path should point to the SRIOV device)
     ./pcimem /sys/bus/pci/devices/0000\:04\:00.0/resource0 0x10080 d
    - Write
     ./pcimem /sys/bus/pci/devices/0000\:04\:00.0/resource0 0x10080 d 2048
    - Read again
     ./pcimem /sys/bus/pci/devices/0000\:04\:00.0/resource0 0x10080 d

    The value of 0x10080 should be the same for the first read
    and the second read, after the write.

    If the bug is hit, the second read reports double the value
    instead of the same value.
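
For readers without the attached pcimem tool, the read/write/read sequence can be approximated with a small mmap helper. This is a sketch under assumptions: the names `read_reg` and `write_reg` are made up here, the access width is a parameter because the width behind pcimem's `d` type is not stated in this report, and Python gives no guarantee that the access reaches the bus as a single transaction of that width. For real measurements the attached tool remains the better choice.

```python
import mmap
import os
import struct
from contextlib import contextmanager

# Little-endian formats keyed by access width in bytes (x86 assumed).
_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


@contextmanager
def _mapped(path, offset):
    """Map the page-sized window of `path` that contains `offset`."""
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        page = mmap.ALLOCATIONGRANULARITY
        base = offset - offset % page
        m = mmap.mmap(fd, page, flags=mmap.MAP_SHARED,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
        try:
            yield m, offset - base
        finally:
            m.close()
    finally:
        os.close(fd)


def read_reg(path, offset, width=4):
    """Read one register of `width` bytes at `offset`."""
    with _mapped(path, offset) as (m, rel):
        return struct.unpack_from(_FORMATS[width], m, rel)[0]


def write_reg(path, offset, value, width=4):
    """Write `value` as one register of `width` bytes at `offset`."""
    with _mapped(path, offset) as (m, rel):
        struct.pack_into(_FORMATS[width], m, rel, value)
```

Inside the VM the calls would mirror the steps above, for example `read_reg("/sys/bus/pci/devices/0000:04:00.0/resource0", 0x10080)`, with the path adjusted to the SRIOV device.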

[Regression Potential]

 * SR-IOV device drivers may have unknowingly relied on KVM multi-write
   behavior prior to this patch; that's highly unlikely since it would
   fail on physical hardware (which does not produce this effect). But
   there is a chance that devices only passed into the guest via SRIOV
   might break.

[Original Description]
Customer engineers are testing the SR-IOV feature with a new network card on x86 servers and ran into the issue described below.

They are *not* seeing this issue on Intel 82599 NIC.

We are testing a new device in EP mode with SRIOV. With a CentOS7 VM running on the Ubuntu 14.04.2 host (using VFIO) we see that a single PCI read or write transaction targeting the device’s BAR0 issued from the VM appears twice on the PCIe bus. The same accesses work fine when the VF is accessed directly from the Ubuntu 14.04.2 host. These BAR0 PCI accesses do not require a driver on the VM side. We can reproduce the problem using a simple user-space application to access the VF’s BAR0 registers.

We do not see this problem when the VM runs on a CentOS 7 host or on an Ubuntu 12.04 host. This appears to be specific to the Ubuntu 14.04 release. We would appreciate any clues or pointers regarding this behavior.

This issue is also not happening with 16.04 beta.

Steps to reproduce the bug with pcimem:

Read:
./pcimem /sys/bus/pci/devices/0000\:04\:00.0/resource0 0x10080 d

Write:
./pcimem /sys/bus/pci/devices/0000\:04\:00.0/resource0 0x10080 d 2048

Read again:
./pcimem /sys/bus/pci/devices/0000\:04\:00.0/resource0 0x10080 d

The value of 0x10080 should be the same for the first read and the second read, after the write.

If the bug is hit, the second read reports double the value instead of the same value.

The register should have read back the same value that was written. The register acts like an adder in that every write adds to the previously written value minus anything the device has consumed. We see that the second read returns double the value written in the single write. We captured a PCIe trace and found that each PCI operation accessing this register is seen twice on the PCI bus. The two writes cause the register value to double, which has implications for normal operation. The PCIe trace is attached and has markers to identify the relevant transactions.
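
To make the doubling concrete, here is a toy model of such an adder register. It is a sketch, not the device's real behavior: the class and function names are invented for this example, and the "minus anything the device has consumed" part is left out, so the register simply accumulates every write it sees.

```python
class AdderRegister:
    """Toy register that adds each write to its current value."""

    def __init__(self):
        self.value = 0

    def write(self, data):
        self.value += data

    def read(self):
        return self.value


def guest_write(reg, data, bus_copies=1):
    """Deliver one guest write to the device `bus_copies` times.

    bus_copies=1 models correct behavior; bus_copies=2 models the
    duplicated transaction seen in the PCIe trace.
    """
    for _ in range(bus_copies):
        reg.write(data)


def observed_increase(bus_copies, data=2048):
    """Return how much the register grew after one guest write."""
    reg = AdderRegister()
    before = reg.read()
    guest_write(reg, data, bus_copies)
    return reg.read() - before
```

With one copy on the bus the register grows by the written value; with two copies it grows by twice that, which matches the symptom described here. Duplicated reads have no visible effect in this model, which is consistent with the bug only being noticed on a register with write side effects.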

Revision history for this message
Ryan Harper (raharper) wrote :

Already fixed upstream.

Changed in qemu (Ubuntu):
status: New → Fix Released
Revision history for this message
Ryan Harper (raharper) wrote :
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Ryan, or anyone else affected,

Accepted qemu into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg-2ubuntu1.28 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Mathew Hodson (mhodson)
Changed in qemu (Ubuntu):
importance: Undecided → Medium
Changed in qemu (Ubuntu Trusty):
importance: Undecided → Medium
Revision history for this message
Robie Basak (racb) wrote :

This needs verification, please. It is blocking bug 1536331.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
I wanted to verify this to get it out of the way.

But IMHO the reproduction steps lack some info on the type of device you passed through.
With what I could set up I can't reproduce the issue, as I lack this useful register that "acts like an adder in that every write adds to the previously written value minus anything the device has consumed".

Still, for whoever comes by here, this is a summary of the SR-IOV setup needed to (almost) get to that point.

I'll look at the X540 spec, but I'm not sure I'll find an equally suitable test register ...

1. Create matching setup:
 - set up server machine with SR-IOV as trusty
  # GRUB_CMDLINE_LINUX="intel_iommu=on" into /etc/default/grub
  # reboot (could be default but be on the safe side)
  $ sudo rmmod ixgbe
  $ sudo modprobe ixgbe max_vfs=7
  # or long term conf in /etc/modprobe.d/ixgbe.conf
  [ 390.988873] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.12.1-k
  [ 391.618065] ixgbevf 0000:04:10.1: Intel(R) X540 Virtual Function
  ...
  dmesg | grep -e DMAR -e IOMMU
   [ 0.000000] ACPI: DMAR 0x000000007B7E7000 0001E4 (v01 HP ProLiant 00000001 HP 00000001)
   [ 0.000000] DMAR: IOMMU enabled
   [ 1.015129] DMAR: Host address width 46
   [ 1.016287] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
   [ 1.018008] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
   [ 1.020342] DMAR: RMRR base: 0x00000079173000 end: 0x00000079175fff
   [ 1.022241] DMAR: RMRR base: 0x000000791ec000 end: 0x000000791effff
   [ 1.024111] DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ebfff
   [ 1.026033] DMAR: RMRR base: 0x000000791c9000 end: 0x000000791d9fff
   [ 1.028022] DMAR: RMRR base: 0x000000791da000 end: 0x000000791dbfff
   [ 1.029917] DMAR-IR: IOAPIC id 8 under DRHD base 0xfbffc000 IOMMU 0
   [ 1.031796] DMAR-IR: IOAPIC id 9 under DRHD base 0xfbffc000 IOMMU 0
   [ 1.033675] DMAR-IR: HPET id 0 under DRHD base 0xfbffc000
   [ 1.535267] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
   [ 1.538291] DMAR-IR: Enabled IRQ remapping in x2apic mode
   [ 7.763329] DMAR: No ATSR found
   [ 7.764417] DMAR: dmar0: Using Queued invalidation
   [ 7.765854] DMAR: Setting RMRR:
   [ 7.766824] DMAR: Setting identity map for device 0000:01:00.0 [0x791da000 - 0x791dbfff]
   [ 7.769324] DMAR: Setting identity map for device 0000:01:00.1 [0x791da000 - 0x791dbfff]
   [ 7.771721] DMAR: Setting identity map for device 0000:01:00.2 [0x791da000 - 0x791dbfff]
   [ 7.774105] DMAR: Setting identity map for device 0000:01:00.4 [0x791da000 - 0x791dbfff]
   [ 7.776526] DMAR: Setting identity map for device 0000:03:00.0 [0x791da000 - 0x791dbfff]
   [ 7.779011] DMAR: Setting identity map for device 0000:01:00.0 [0x791c9000 - 0x791d9fff]
   [ 7.781416] DMAR: Setting identity map for device 0000:01:00.1 [0x791c9000 - 0x791d9fff]
   [ 7.783799] DMAR: Setting identity map for device 0000:01:00.2 [0x791c9000 - 0x791d9fff]
   [ 7.786268] DMAR: Setting identity map for device 0000:01:00.4 [0x791c9000 - 0x791d9fff]
   [ 7.788757] DMAR: Setting identity map for device 0000:01:00.0 [0x791dc000 - 0x791...
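
The two checks this setup relies on (IOMMU enabled, VFs created) can be scripted against captured `dmesg` output. This is only a sketch: the function names are made up, and the patterns assume the Intel VT-d and ixgbevf message formats shown in the log above, which other platforms or drivers will not match.

```python
import re


def iommu_enabled(dmesg_text):
    """True if the log contains the 'DMAR: IOMMU enabled' line."""
    return any("DMAR: IOMMU enabled" in line
               for line in dmesg_text.splitlines())


def vf_addresses(dmesg_text):
    """PCI addresses that ixgbevf announced as virtual functions."""
    # Matches e.g. 'ixgbevf 0000:04:10.1: Intel(R) X540 Virtual Function'
    # but not the driver banner line, which starts with 'ixgbevf:'.
    return re.findall(r"ixgbevf ([0-9a-f:.]+): .*Virtual Function",
                      dmesg_text)
```

Feeding it the output of `dmesg` on the host gives a quick yes/no before launching the guest.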

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Intel http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/ethernet-x540-datasheet.pdf (for my case) defines in section 9.5 the configuration space for the VF.

Most entries are RO for the VF, and I found no writable register bit that behaves in a similar manner to the "adding register" referred to above, but I might be overlooking something.

Ryan, could you still verify this, or are you more experienced at finding such a reg in the spec for me?

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1606940] Re: A a single PCI read or write appears twice on the PCIe bus. This happens when using the SR-IOV feature with some PCI devices

The original bug was with a customer SRIOV device which did have writable registers.
W.r.t the Intel, that model explicitly passed with no issues. AFAIK, one has to have this specific card, or one like it with writable registers.

On Mon, Oct 17, 2016 at 7:31 AM, ChristianEhrhardt <
<email address hidden>> wrote:

> Intel
> http://www.intel.com/content/dam/www/public/us/en/
> documents/datasheets/ethernet-x540-datasheet.pdf
> (for my case) defines in section 9.5 the configuration space for the VF.
>
> Most entries are RO for the VF, and I found no writable register bit
> behaving in a similar manner than the referred "adding register", but I
> might overlook something.
>
> Ryan could you still verify this or are more experienced to find such a
> reg for me in the spec?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I was having some IRC conversations and email exchanges - thanks Ryan Harper and Larry Michel.

Following up on that I want to summarize:
- the back-ported fix has been in qemu since 2.2 and has not been reverted or needed follow-on fixes (it only gained an extra feature to support non-4k host pages later on)
- the fix was tested by the reporter on special HW showing the issue against a PPA version of this upload (on a private bug that preceded this)
- since then the reporter became unresponsive (more than a month now)
- Canonical had, but no longer has, the special HW available in the OIL lab, so there is no chance to verify this on our own anymore
- Trying to test with alternative HW failed (see comment #8)

We can now either consider it "verified" by the ppa, or have to drop it to free up the queue for the next qemu fix.

I'll ping the release team to advise and - depending on the decision - help to clean the queue.

Revision history for this message
Martin Pitt (pitti) wrote :

The test against the PPA version is fine (assuming it was the same patch). The more important aspect here is that someone actually tests that proposed QEMU on other platforms, to make sure that the package still by and large works, to guard against miscompilation, changed toolchains, etc.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi Christian,

just for future reference, it's of course ideal if we can verify that
the reported bug is fixed, but after 14-21 days it becomes ok to
simply run the qa-regression-tests and verify that there are no new
failures. Whether or not to drop the set if we can't verify the
original bug is fixed is something you can decide. If it was an
upstream fix that didn't take too much finagling to get backported, it
normally makes sense to take it if there are no regressions.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Martin.
Given that, I tested this on Trusty with various tests I had from former bugs.
That should ensure a good base level of extra QA - especially including the recently added upgrade tests.

s390x/ppc64el weren't available/supported back then (only since xenial/vivid) and I unfortunately have no suitable arm system available atm.

But since the patch came by way of embedded Power, x86 can be considered the cross-arch test to some extent.

Attaching logs of all the tests that ran ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The remaining issues in the logs are:
- qemu forward migration P->T and machine type uniqueness, both fixed by the upload blocked by this bug atm
- libvirt test on trusty (independent of the fix in proposed, seen in both pre & post tests)

All others worked fine - so I think we can call that verified.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Mark W Wenning (mwenning) wrote :

This bug can be marked as resolved.

From: "Panicker, Manojkumar" <email address hidden>
To: Bug 1606940 <email address hidden>
Cc:
Date: Mon, 17 Oct 2016 04:50:08 +0000
Subject: RE: [Bug 1606940] Re: A a single PCI read or write appears twice on the PCIe bus. This happens when using the SR-IOV feature with some PCI devices
I believe we have already indicated that the patch works to resolve this problem. Please let us know if you are looking for something more specific.

Manoj

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.28

---------------
qemu (2.0.0+dfsg-2ubuntu1.28) trusty; urgency=medium

  [ Ryan Harper ]
  * Apply upstream fix for memory slot alignement (LP: #1606940)
    - debian/patches/kvm-fix-memory-slot-page-alignment-logic.patch

 -- Chris J Arges <email address hidden> Thu, 15 Sep 2016 09:58:23 -0500

Changed in qemu (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
