Physical host crash with Mellanox IB PCI passthrough

Bug #1091766 reported by Vlastimil Holer
This bug affects 1 person
Affects: QEMU
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

(from http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/100736)

We have been using PCI passthrough with a Mellanox IB interface
(MT27500 Family [ConnectX-3]) on Debian 6.0.6, kernel 3.2.23 and
qemu-kvm 1.0 (both from backports). It worked fine until the latest
backports update to qemu-kvm 1.1.2. With the newer qemu-kvm versions,
the IB device probe in the guest fails, leaving the firmware to kill the
whole physical machine.

I then compiled qemu-kvm from source: 1.0.1 was OK, but 1.1.2 fails and
even 1.2.0 fails as well. Our setup is based on an IBM System X iDataPlex
dx360 M4 server.

Note: I have now also tested the latest qemu-1.3.0 with Linux 3.7.1 and
the new VFIO mechanism, and it behaves the same way.

On the guest, mlx4_core fails to probe the device:
| mlx4_core 0000:00:08.0: irq 74 for MSI/MSI-X
| mlx4_core 0000:00:08.0: irq 75 for MSI/MSI-X
| mlx4_core 0000:00:08.0: irq 76 for MSI/MSI-X
| mlx4_core 0000:00:08.0: irq 77 for MSI/MSI-X
| mlx4_core 0000:00:08.0: NOP command failed to generate MSI-X interrupt (IRQ 51).
| mlx4_core 0000:00:08.0: Trying again without MSI-X.
| mlx4_core 0000:00:08.0: NOP command failed to generate interrupt (IRQ 51), aborting.
| mlx4_core 0000:00:08.0: BIOS or ACPI interrupt routing problem?
| mlx4_core 0000:00:08.0: PCI INT A disabled
| mlx4_core: probe of 0000:00:08.0 failed with error -16

This immediately results in a reset of the whole physical machine:
| Uhhuh. NMI received for unknown reason 3d on CPU 0.
| Do you have a strange power saving mode enabled?
| Dazed and confused, but trying to continue

Followed by these events in the hardware management module:
| A software NMI has occurred on system "SN# xxxxxxx"
| Fault in slot "All PCI Err" on system "SN# xxxxxxx"
| Fault in slot "PCI 1" on system "SN# xxxxxxx"
| A Uncorrectable Bus Error has occurred on system "SN# xxxxxxx"
| "Host Power" has been Power Cycled
| System "SN# xxxxxxx" has recovered from an NMI

Kernel logs for the host and guest machines and the different qemu-kvm
versions are attached. PCI passthrough for e.g. an Intel e1000 works
fine with all tested qemu-kvm versions.

Tags: kvm qemu qemu-kvm
Revision history for this message
Vlastimil Holer (vlastimil-holer) wrote :
Alex Williamson (alex-l-williamson) wrote :

Does the attached patch against qemu 1.3 fix it for pci-assign?

Alex Williamson (alex-l-williamson) wrote :

If the above pci-assign patch works, please also try this vfio-pci version as it requires a slightly different implementation. Also against qemu 1.3. Thanks.

Vlastimil Holer (vlastimil-holer) wrote : Re: [Bug 1091766] Re: Physical host crash with Mellanox IB PCI passthrough

Both patches against qemu 1.3 *work*: the first with traditional PCI
passthrough, the second with VFIO. The Mellanox IB card in the guest
works fine again. A great early Christmas present, thank you!

Just FYI: between the two approaches I can see a small difference on the
host system in the number of IRQs requested for MSI/MSI-X:

* VFIO:
| vfio_ecap_init: 0000:20:00.0 hiding ecap 0x19@0x18c
| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 151 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 151 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 152 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 151 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 152 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 153 for MSI/MSI-X

* old way:
| assign device 0:20:0.0
| pci-stub 0000:20:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
| pci-stub 0000:20:00.0: restoring config space at offset 0x6 (was 0xc, writing 0xdf00000c)
| pci-stub 0000:20:00.0: restoring config space at offset 0x4 (was 0x4, writing 0x91b00004)
| pci-stub 0000:20:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
| pci-stub 0000:20:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100042)
| pci-stub 0000:20:00.0: irq 134 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 135 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 136 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 137 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 138 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 139 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 140 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 141 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 142 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 143 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 144 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 145 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 146 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 147 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 148 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 149 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 150 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 151 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 152 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 153 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 154 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 155 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 156 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 157 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 158 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 159 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 160 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 161 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 162 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 163 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 134 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 135 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 136 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 137 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 138 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 139 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 140 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 141 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 142 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 143 for MSI/MSI-X
| ...


Alex Williamson (alex-l-williamson) wrote :

VFIO is doing what I expect, growing the number of enabled vectors as each one is unmasked. It starts from 1 and has to disable and re-enable the set each time it grows, so we have:

| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X

1

| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 151 for MSI/MSI-X

1 -> 2

| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 151 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 152 for MSI/MSI-X

2 -> 3

| vfio-pci 0000:20:00.0: irq 150 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 151 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 152 for MSI/MSI-X
| vfio-pci 0000:20:00.0: irq 153 for MSI/MSI-X

3 -> 4

So you likely see through lspci that both the guest and host have 4 vectors enabled.
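The growth pattern above can be reproduced with a toy loop (a sketch only; the device address 0000:20:00.0 and the base IRQ number 150 are taken from the log in this thread):

```shell
# Reproduce the log pattern: each time the guest unmasks one more
# MSI-X vector, vfio tears down the current set and re-requests n+1
# vectors, so the host re-logs every vector in the new, larger set.
log=$(
  for n in 1 2 3 4; do                 # enabled vector count: 1 -> 2 -> 3 -> 4
    for i in $(seq 0 $((n - 1))); do   # the whole set is registered again
      echo "vfio-pci 0000:20:00.0: irq $((150 + i)) for MSI/MSI-X"
    done
  done
)
echo "$log"   # 1 + 2 + 3 + 4 = 10 lines, matching the VFIO log above
```

This is why irq 150 appears four times in the log: it is re-registered on every growth step, not allocated four times.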

On legacy assignment we have:

| pci-stub 0000:20:00.0: irq 134 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 135 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 136 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 137 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 138 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 139 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 140 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 141 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 142 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 143 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 144 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 145 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 146 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 147 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 148 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 149 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 150 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 151 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 152 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 153 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 154 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 155 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 156 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 157 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 158 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 159 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 160 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 161 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 162 for MSI/MSI-X
| pci-stub 0000:20:00.0: irq 163 for MSI/MSI-X

So you have 30 vectors enabled in the host! Wow. Does lspci in the host and guest confirm this (i.e. 30 vectors on the host and likely 4 vectors on the guest)? If so, I'll need to look at whether we can apply something more similar to the vfio solution to legacy assignment; otherwise we're wasting a lot of vectors in the host. Did qemu-kvm-1.0 consume this many vectors? Thanks.

Vlastimil Holer (vlastimil-holer) wrote :

On Wed, Dec 19, 2012 at 4:31 PM, Alex Williamson
<email address hidden> wrote:
> So you likely see through lspci that both the guest and host have 4
> vectors enabled.

Maybe you'll have to kick me on how to find this data. If I look at the
lspci output, the only relevant thing I find is this:
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
It only says enabled or disabled; there is no count=x/y (enabled/limit)
like I can see for some other devices. I'm attaching the full lspci -vvv
output for one Mellanox card on a physical host where it is not used.

> vectors on the guest) If so, I'll need to look to see if we can apply
> something more similar to the vfio solution to legacy assignment,
> otherwise we're wasting a lot of vectors in the host. Did qemu-kvm-1.0
> consume this many vectors? Thanks.

Yes, it did. All host/guest kernel logs for qemu-kvm 1.0.1, 1.1.2 and
1.2.0 were attached to this bug.

Alex Williamson (alex-l-williamson) wrote :

You're right, I'm thinking of MSI, where lspci reports x/y vectors. The only way I know to get this is to grep /proc/interrupts on the host and guest. Look for kvm or vfio entries on the host and likely some device-specific identifier in the guest.
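A concrete way to run this check (a sketch; on a real host you would grep /proc/interrupts directly, and the exact entry format varies by kernel version, so the sample lines below are hypothetical, modeled on vfio-pci's naming):

```shell
# Count the MSI-X vectors a passthrough device holds. We grep a sample
# excerpt here so the commands are reproducible without the hardware;
# on the actual host the command would be:  grep -c vfio /proc/interrupts
cat > /tmp/sample_interrupts <<'EOF'
 150:  0  0  IR-PCI-MSI-edge  vfio-msix[0](0000:20:00.0)
 151:  0  0  IR-PCI-MSI-edge  vfio-msix[1](0000:20:00.0)
 152:  0  0  IR-PCI-MSI-edge  vfio-msix[2](0000:20:00.0)
 153:  0  0  IR-PCI-MSI-edge  vfio-msix[3](0000:20:00.0)
EOF
grep -c 'vfio' /tmp/sample_interrupts    # number of vectors enabled on the host
# In the guest, look for the driver name instead, e.g.:
#   grep -c 'mlx4' /proc/interrupts
```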

I also see your original log now, so the old qemu worked, but wasted lots and lots of host vectors. Thanks.

Alex Williamson (alex-l-williamson) wrote :

Here's another version of the legacy pci-assign patch. This should also use only 4 vectors on the host, like vfio. I'm a little uneasy about setting up an MSIMessage with unknown data, but I guess we did that for a long time previously. Please test. Thanks.

Vlastimil Holer (vlastimil-holer) wrote :

On Wed, Dec 19, 2012 at 5:46 PM, Alex Williamson
<email address hidden> wrote:
> You're right, I'm thinking of MSI where lspci reports x/y vectors. The
> only way I know to get this is to grep /proc/interrupts on host and
> guest. Look for kvm or vfio in the host and likely some device specific
> identifier in the guest.

You were absolutely right about the vector counts. With VFIO I can see 4
vectors in both the host and the guest, and with legacy PCI assign (on
qemu-1.0.1) I can see 30 in the host and 4 in the guest.

On Wed, Dec 19, 2012 at 11:46 PM, Alex Williamson
<email address hidden> wrote:
> Here's another version of the legacy pci-assign patch. This should also
> only use 4 vectors on the host, like vfio. I'm a little uneasy about
> setting up an MSIMessage with unknown data, but I guess we did it for a
> long time previously. Please test. Thanks

I have the physical machine back in production again, so it'll take a few
days to test your new patch. I'll let you know.

Vlastimil Holer (vlastimil-holer) wrote :

Confirmed, the third patch for legacy pci-assign enables only 4 vectors on
the host and guest. The Mellanox IB card works fine for me as well. Thanks!

Vlastimil Holer (vlastimil-holer) wrote :

Just a silly question, because I don't know the development process in the QEMU project -- can your patches be committed into the project's VCS so that the next stable release contains them and doesn't fail again? Thank you!

Alex Williamson (alex-l-williamson) wrote :

I'm currently trying to make that happen, starting with the patches in comments 2 & 3 and moving to something like the patch in comment 8 in the development branch.

Vlastimil Holer (vlastimil-holer) wrote :

My belief is that this bug can be closed. We have tested the QEMU 1.4.x and 1.5.x series (with pci-assign) and at least these work fine.

Changed in qemu:
status: New → Fix Released