DMAR initialization errors with newer Intel BIOSes

Bug #1847335 reported by Allain Legacy
This bug affects 1 person
Affects      Status         Importance   Assigned to      Milestone
StarlingX    Fix Released   Low          Jim Somerville

Bug Description

Brief Description
-----------------

We are experiencing errors when trying to start ovs-dpdk on some Wolfpass systems with newer Intel BIOSes. The errors manifest as a PCI bind failure when running a command such as this one:

controller-1:~$ sudo /usr/share/openvswitch/scripts/dpdk-devbind.py --bind vfio-pci 0000:19:00.0
Password:
Error: bind failed for 0000:19:00.0 - Cannot bind to driver vfio-pci

Wolfpass systems with older Intel BIOS are not experiencing the same errors.

While investigating possible reasons for this error, we realized that the IOMMU was not configured properly for this device (0000:19:00.0). (IOMMU configuration is a requirement of the vfio-pci driver.) We noticed that the dmesg log on the failing system was missing entries that appear on working systems with older BIOSes. These logs were missing:

controller-0:~$ dmesg |grep -i mmu|grep 19:00
[ 6.718559] iommu: Adding device 0000:19:00.0 to group 24
[ 6.723989] iommu: Adding device 0000:19:00.1 to group 25
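The same condition can be checked without grepping dmesg, since the kernel exposes a device's IOMMU group as an `iommu_group` symlink under its sysfs directory. A minimal sketch, assuming the standard sysfs layout; the function name and the `sysfs_root` parameter are mine and exist only so the lookup can be exercised against a fake tree:

```python
import os


def iommu_group(bdf, sysfs_root="/sys"):
    """Return the IOMMU group number of a PCI device, or None if it has none.

    bdf is the full PCI address, e.g. "0000:19:00.0".
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu_group")
    if not os.path.islink(link):
        # No symlink: the device was never added to an IOMMU group.
        return None
    # The link target ends in the group number, e.g. .../iommu_groups/24
    return int(os.path.basename(os.readlink(link)))
```

On a working system, `iommu_group("0000:19:00.0")` would be expected to return the group number shown in the log above; on an affected system it would return None.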

Looking at the full dmesg log, I noticed the following message, which seems to appear as DMAR initialization starts. After it, we do not see any of the normal DMAR initialization messages that appear on the other boards.

[ 3.680989] DMAR: Device scope type does not match for 0000:17:00.0

Therefore, it looks like DMAR initialization is bailing out rather than continuing. I tracked that message down to the following kernel patch, which confirms that initialization aborts when the message is output. I am guessing we do not have this patch in our kernel; otherwise DMAR initialization should continue normally despite the scope mismatch.

https://lkml.org/lkml/2016/6/15/442

To confirm that there was truly a mismatch on that device I used the following command to dump the DMAR table on both nodes.

cat /sys/firmware/acpi/tables/DMAR > new-system.dmar.raw

Then I copied the files from both systems to my own Linux machine and used the following commands to disassemble the DMAR tables. The decoded output is written to the same filename with a ".dsl" extension in place of ".raw".

iasl -d new-system.dmar.raw
iasl -d old-system.dmar.raw

The older (good) system reports the following entry:

[1B0h 0432 2] Subtable Type : 0002 [Root Port ATS Capability]
[1B2h 0434 2] Length : 0030

[1B4h 0436 1] Flags : 00
[1B5h 0437 1] Reserved : 00
[1B6h 0438 2] PCI Segment Number : 0000

[1B8h 0440 1] Device Scope Type : 02 [PCI Bridge Device]
[1B9h 0441 1] Entry Length : 08
[1BAh 0442 2] Reserved : 0000
[1BCh 0444 1] Enumeration ID : 00
[1BDh 0445 1] PCI Bus Number : 17

[1BEh 0446 2] PCI Path : 00,00

while the newer (broken) system reports multiple entries for bus 17 path 00,00 which is probably what is causing the DMAR initialization to error out.

[1B0h 0432 2] Subtable Type : 0001 [Reserved Memory Region]
[1B2h 0434 2] Length : 0020

[1B4h 0436 2] Reserved : 0000
[1B6h 0438 2] PCI Segment Number : 0000
[1B8h 0440 8] Base Address : 0000000052CC8000
[1C0h 0448 8] End Address (limit) : 000000005ACCFFFF

[1C8h 0456 1] Device Scope Type : 01 [PCI Endpoint Device]
[1C9h 0457 1] Entry Length : 08
[1CAh 0458 2] Reserved : 0000
[1CCh 0460 1] Enumeration ID : 00
[1CDh 0461 1] PCI Bus Number : 17

[1CEh 0462 2] PCI Path : 00,00

[1D0h 0464 2] Subtable Type : 0002 [Root Port ATS Capability]
[1D2h 0466 2] Length : 0030

[1D4h 0468 1] Flags : 00
[1D5h 0469 1] Reserved : 00
[1D6h 0470 2] PCI Segment Number : 0000

[1D8h 0472 1] Device Scope Type : 02 [PCI Bridge Device]
[1D9h 0473 1] Entry Length : 08
[1DAh 0474 2] Reserved : 0000
[1DCh 0476 1] Enumeration ID : 00
[1DDh 0477 1] PCI Bus Number : 17

[1DEh 0478 2] PCI Path : 00,00

Comparing the two disassembled files, I noticed that the newer system has an additional entry for bus number 17 path 00,00, which is reported as a "PCI Endpoint Device" in addition to a "PCI Bridge Device", whereas the older system only reports a single entry for "PCI Bridge Device".
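The comparison above was done by eye. A rough sketch of how it could be automated follows. It assumes the field labels shown in the dumps above ("Device Scope Type", "PCI Bus Number", "PCI Path") and records only the first path hop of each scope entry; the function names are hypothetical.

```python
import re
from collections import defaultdict

SCOPE_RE = re.compile(r"Device Scope Type\s*:\s*[0-9A-Fa-f]{2}\s*\[(.*?)\]")
BUS_RE = re.compile(r"PCI Bus Number\s*:\s*([0-9A-Fa-f]{2})")
PATH_RE = re.compile(r"PCI Path\s*:\s*([0-9A-Fa-f]{2}),([0-9A-Fa-f]{2})")


def device_scopes(dsl_text):
    """Return a list of (scope_name, bus, device, function) tuples."""
    scopes = []
    name = bus = None
    for line in dsl_text.splitlines():
        m = SCOPE_RE.search(line)
        if m:
            # Start of a new device scope entry.
            name, bus = m.group(1), None
            continue
        m = BUS_RE.search(line)
        if m and name is not None:
            bus = m.group(1)
            continue
        m = PATH_RE.search(line)
        if m and name is not None and bus is not None:
            scopes.append((name, bus, m.group(1), m.group(2)))
            name = bus = None  # ignore any further path hops
    return scopes


def conflicting_scopes(scopes):
    """Map (bus, device, function) to its scope names when more than one is used."""
    by_addr = defaultdict(set)
    for name, bus, dev, fn in scopes:
        by_addr[(bus, dev, fn)].add(name)
    return {addr: sorted(names) for addr, names in by_addr.items() if len(names) > 1}
```

Feeding the disassembled text of each table through `conflicting_scopes(device_scopes(text))` should return an empty dict for the older system and flag bus 17 path 00,00 on the newer one.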

The only significant difference related to the PCI setup that I could find is that the BIOS and IFWI versions differ between the two systems.

older system:

BIOS: SE5C620.86B.00.010013.030920180427

IFWI: 2018.10.5.01.0427.selfboot

newer system:

BIOS: SE5C620.86B.02.01.0008.03192019159

IFWI: 2019.12.2.12.1559.selfboot

We downgraded the broken system to the same BIOS and firmware version as the good system and the problem went away, so this is clearly an incompatibility between the newer BIOS and our older kernel. I am opening this LP to track porting the aforementioned kernel patch to our kernel (if it is not already present).

Severity
--------
Critical, hosts won't unlock if running latest BIOS on wolfpass hardware.

Steps to Reproduce
------------------
Install a system with both ovs-dpdk and OpenStack onto Wolfpass hardware running the latest BIOS.

Expected Behavior
------------------
The ovs-dpdk application should start and the hosts should unlock.

Actual Behavior
----------------
See error above.

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-DX, but likely all systems with wolfpass and latest BIOS.

Branch/Pull Time/Commit
-----------------------
2019/10/04

Last Pass
---------
Unknown

Timestamp/Logs
--------------
See above

Test Activity
-------------
Feature testing

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority - issue impacts updating to the latest NIC firmware

tags: added: stx.3.0 stx.distro.other
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Jim Somerville (jsomervi)
Revision history for this message
Austin Sun (sunausti) wrote :

Hi Jim:
   Is there any update on this?

Revision history for this message
Jim Somerville (jsomervi) wrote :

Hi Austin:

I've been working on some higher-priority critical security issues; this one is only medium priority. But since somebody has already done the work of identifying the required patch, I can get it added fairly quickly and supply you with a patch if you would like to do the actual testing.

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Jim:
   Thanks a lot. Could you provide the patch? We can then work with Allain or another test team to see if they can verify it.
   Thanks.
   BR
Austin Sun

Revision history for this message
Jim Somerville (jsomervi) wrote :

Looking at the kernel changelog in the spec file for 957.21.3, I see:

- [iommu] vt-d: Don't reject NTB devices due to scope mismatch (Jerry Snitselaar) [1499325]

made it into:

* Mon Oct 23 2017 Rafael Aquini <email address hidden> [3.10.0-749.el7]

So we have it already. But I'll check the actual source tomorrow; it wouldn't be the first time that the changelog listed commits that weren't in, or omitted commits that were.

Revision history for this message
Jim Somerville (jsomervi) wrote :

Confirmed, that patch is already in our load. I'll look for something else.

Revision history for this message
Jim Somerville (jsomervi) wrote :

There has been a new BIOS released since the version used here. I recommend that it be tried:

https://downloadcenter.intel.com/download/29129/Intel-Server-Board-S2600WF-Family-BIOS-and-Firmware-Update-for-Intel-One-Boot-Flash-Update-Intel-OFU-Utility-and-WinPE-

Release notes indicate extensive changes, some involving the PCI bus.

@Austin, do you have a platform where you see this issue, and if so, are you running the latest BIOS, i.e. 02.01.0009?

Revision history for this message
Ghada Khalil (gkhalil) wrote :

The reporter of this LP is no longer with WR and cannot confirm whether the new firmware addresses the issue reported. Given that the required kernel patch is already in StarlingX, we will close this bug as Invalid for now.

Changed in starlingx:
status: Triaged → Invalid
Revision history for this message
Jim Somerville (jsomervi) wrote :

We are seeing this problem on wolfpass-15 in the Wind River lab. It is running the 02.01.0009 BIOS, but a newer BIOS, 02.01.0010, was released earlier in January. It addresses some PCI issues, so we should try an upgrade.

Changed in starlingx:
status: Invalid → Confirmed
Revision history for this message
Jim Somerville (jsomervi) wrote :

Another update: on wolfpass-15, I see that the DMAR table gives the bridge at 17:00 an entry type of "endpoint", but its PCI header type is bridge. This fails a DMAR sanity check, and the DMAR setup code bails out. If I disable this sanity check, DMAR setup continues and all is well. So this definitely appears to be broken BIOS code.

Anyone experiencing this issue who really wants to continue using the particular platform can contact me for a workaround commit. In the meantime, we will get the bios upgraded to the 0010 version (it is currently on 0009) and if still broken then will report it to Intel.
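For readers unfamiliar with the sanity check mentioned above, it can be modelled roughly as below. This is a simplified illustration and not the kernel source: the scope type values are the ones shown in the DMAR dumps earlier in this report, the header type values are the standard PCI type 0 and type 1 headers, and exceptions such as the NTB case handled by the earlier patch are deliberately left out.

```python
# DMAR device scope types, as shown in the table dumps in this report.
SCOPE_ENDPOINT = 0x01  # "PCI Endpoint Device"
SCOPE_BRIDGE = 0x02    # "PCI Bridge Device"

# PCI configuration header types.
HEADER_NORMAL = 0x00   # type 0 header: ordinary endpoint
HEADER_BRIDGE = 0x01   # type 1 header: PCI-to-PCI bridge


def scope_mismatch(scope_type, header_type):
    """Return True when the DMAR scope type contradicts the PCI header type."""
    if scope_type == SCOPE_ENDPOINT:
        # An endpoint scope must refer to a device with a normal header.
        return header_type != HEADER_NORMAL
    if scope_type == SCOPE_BRIDGE:
        # A bridge scope must not refer to a device with a normal header.
        return header_type == HEADER_NORMAL
    return False
```

The case seen on wolfpass-15 is `scope_mismatch(SCOPE_ENDPOINT, HEADER_BRIDGE)`, which is the combination that makes the check fail.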

Revision history for this message
Ghada Khalil (gkhalil) wrote :

As per above, this is suspected to be a BIOS issue. So removing the stx release gate as it's not a software issue with stx code.

tags: removed: stx.3.0
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Medium → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714128

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/714128
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=a745a5b6f8a02b74f69f828f14960e97a758853c
Submitter: Zuul
Branch: master

commit a745a5b6f8a02b74f69f828f14960e97a758853c
Author: Jim Somerville <email address hidden>
Date: Fri Mar 20 10:36:14 2020 -0400

    Kernel: Workaround broken bios affecting iommu

    Problem:
    Broken bios creates inaccurate DMAR tables,
    reporting some bridges as having endpoint types.
    This causes IOMMU initialization to bail
    out early with an error code, the result of
    which is vfio not working correctly.
    This is seen on some Skylake based Wolfpass
    server platforms with up-to-date bios installed.

    Solution:
    Instead of just bailing out of IOMMU
    initialization when such a condition is found,
    we report it and continue. The IOMMU ends
    up successfully initialized anyway. We do this
    only on platforms that have the Skylake bridges
    where this issue has been seen.

    This change is inspired by a similar one posted by
    Lu Baolu of Intel Corp to lkml

    https://lkml.org/lkml/2019/12/24/15

    Change-Id: Ief2df7099b6118eab7f99d5531616926a7a3eb27
    Closes-Bug: 1847335
    Signed-off-by: Jim Somerville <email address hidden>
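
The difference between bailing out and "report it and continue" can be illustrated with a toy model. This is only a sketch of the policy described in the commit message, not the actual kernel change; the function name, the tuple layout, and the device IDs used with it are hypothetical placeholders.

```python
import logging

log = logging.getLogger("dmar-model")


def register_scopes(entries, tolerate=frozenset()):
    """Walk device scope entries and return the registered PCI addresses.

    entries:  iterable of (pci_address, device_id, mismatched) tuples.
    tolerate: device IDs for which a mismatch is reported but not fatal.
    Returns None when initialization is abandoned.
    """
    registered = []
    for addr, device_id, mismatched in entries:
        if mismatched:
            log.warning("Device scope type does not match for %s", addr)
            if device_id not in tolerate:
                # Strict behaviour: give up, so no later device is registered.
                return None
        registered.append(addr)
    return registered
```

With strict handling, one mismatched bridge is enough to leave every later device, such as the NIC at 0000:19:00.0, without IOMMU setup. With the bridge's ID in `tolerate`, the warning is still logged but the walk completes.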

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716162

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/716162
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=246f33226dbb50a4c5e86d497df745120ca9e0e4
Submitter: Zuul
Branch: f/centos8

commit a745a5b6f8a02b74f69f828f14960e97a758853c
Author: Jim Somerville <email address hidden>
Date: Fri Mar 20 10:36:14 2020 -0400

    Kernel: Workaround broken bios affecting iommu

commit 1435fe178ab88aa2b77970a3c07e8a907477a654
Author: Jim Somerville <email address hidden>
Date: Mon Mar 16 16:16:20 2020 -0400

    Build mpt2sas and mpt3sas drivers as modules

    History:
    Back in the day, we didn't have an initramfs
    to allow us to load disk drivers as modules. All
    disk drivers had to be built-in. In CentOS 7.3,
    the mpt2sas and mpt3sas driver code was reorganized
    to allow for a common code base. But along with that,
    those drivers would only now build as modules. We
    created a patch which involved taking a snapshot of
    mpt driver code, and massaged it all into building
    as built-in drivers.

    Problem:
    That old code snapshot along with the fact
    that those two drivers initialize without their
    associated hardware being present (they are built-in),
    seems to cause interference with some other LSI raid
    controllers, namely Harpoon in AVAGO MR9460-8i via a
    Huawei enclosure.

    Solution:
    Simply revert to building those two mptsas drivers as
    modules, the way CentOS intended. They will reside
    on initramfs and be loaded automatically if the
    appropriate hardware is present. With these drivers now
    out of the way, the problematic raid controller works
    fine, driven by the megaraid_sas driver.

    Change-Id: I054c2396df4e659c324e70bffcf3940ad93c9354
    Closes-Bug: 1866293
    Signed-off-by: Jim Somerville <email address hidden>

commit bed7388b678b9eda0d06b4d16fb00711741f9ef0
Author: Paul Vaduva <email address hidden>
Date: Tue Mar 10 12:05:31 2020 -0400

    Release FDs when stuck peering recovery

    During stuck peering recovery if file descriptors are
    not released the state machine does not advance to
    OPERATIONAL state

    Partial-bug: 1856064

    Change-Id: I3fba7be661ebf22...

tags: added: in-f-centos8