RAID card is not supported on starlingx 3.0

Bug #1866293 reported by weichen
This bug affects 2 people
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  High        Jim Somerville

Bug Description

We cannot install starlingx 3.0 on a HUAWEI physical host.

Hardware:
HUAWEI 2288H V5
DISK: AVAGO MegaRAID SAS 9460-8i type: LSI SAS3508

Software:
starlingx 3.0

Steps:
Simply install starlingx in the All-in-one Duplex configuration; the installation is blocked. The log can be seen in the attachment.
After disabling RAID in the BIOS, the installation succeeds.

Revision history for this message
weichen (challengingway) wrote :
  • Error.docx (1.9 MiB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
Austin Sun (sunausti)
tags: added: stx.3.0
Revision history for this message
Austin Sun (sunausti) wrote :

Some triage information.
After changing the kernel config as follows:
remove from tis_extra:
   CONFIG_MEGARAID_SAS=y
   CONFIG_SCSI_MPT2SAS=y
   CONFIG_SCSI_MPT3SAS=y
enable:
   CONFIG_DM_RAID=m

and not applying Enable-building-mpt2sas-and-mpt3sas-as-builtin-for-C.patch,

starlingx can be installed successfully.

Austin Sun (sunausti)
Changed in starlingx:
assignee: nobody → Austin Sun (sunausti)
status: New → Confirmed
Revision history for this message
Jim Somerville (jsomervi) wrote :

We don't manage an initrd, so all disk drivers are built into the kernel. At some point around CentOS 7.3, the CentOS folks combined the source for the mpt2sas and mpt3sas drivers; however, the combined drivers cannot be built as built-ins, only as modules. So the patch Enable-building-mpt2sas-and-mpt3sas-as-builtin-for-C.patch was created, taking a *snapshot* of the mpt3sas code at the time. That snapshot doesn't support the SAS3508, whereas more recent CentOS 7 kernels do.

Will continue to look at a solution.

Ghada Khalil (gkhalil)
tags: added: stx.distro.other
Changed in starlingx:
assignee: Austin Sun (sunausti) → Jim Somerville (jsomervi)
Revision history for this message
Jim Somerville (jsomervi) wrote :

I have it building now, need to run some tests before handing it over for you to pass on to CUC for testing.

Revision history for this message
Jim Somerville (jsomervi) wrote :

I have added an experimental patch, as I don't have access to hardware to properly test it. I have confirmed that both mpt2sas and mpt3sas appear to have initialized as they previously have. Please cd to stx/integ, run git am <patch file>, rebuild (including remaking the installer), run build-iso, and test.

Note that the new patch to the kernel is significantly smaller and more straightforward than the old one, and hopefully solves the problem.

Revision history for this message
Jim Somerville (jsomervi) wrote :

It looks like from the description that "DISK: AVAGO MegaRAID SAS 9460-8i type: LSI SAS3508" should be driven from the megaraid_sas driver. If so, it is possible that the old mptsas driver code may have been interfering with it. I still want to see the dmesg output from any experiments.

Revision history for this message
Austin Sun (sunausti) wrote :

Hi, Jim:
   The dmesg and config are attached. These 2 files are from my iso, which has the "build megaraid as module" change applied. I'm building an iso with the patch you provided and will update once done.

Revision history for this message
Austin Sun (sunausti) wrote :

Hi, Jim
   Good news: your patch works.
I can only access the server through the web BMC, so I cannot copy the dmesg output,
but I captured a few lines:
megasas:07.707.50.00-rh1
mpt2sas version 20.103.01.00 loaded.

The config options are set the same as in the stx.3.0 release:
CONFIG_MEGARAID_SAS=y
CONFIG_SCSI_MPT2SAS=y
CONFIG_SCSI_MPT3SAS=y
If you need more info, please let me know.
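
For anyone else limited to a console, here is a minimal sketch (not part of any patch in this report) of pulling only the storage-driver lines out of captured dmesg text. The function name and the driver substrings are assumptions chosen for illustration. In the sample, the first and last lines are the two reported above; the middle line is a made-up placeholder.

```python
# Hypothetical helper: keep only dmesg lines that mention the drivers
# discussed in this bug. The substrings below are assumptions.
DEFAULT_DRIVERS = ("megasas", "megaraid_sas", "mpt2sas", "mpt3sas")


def driver_lines(dmesg_text, drivers=DEFAULT_DRIVERS):
    """Return the lines of dmesg_text that mention any driver name."""
    wanted = tuple(d.lower() for d in drivers)
    return [
        line
        for line in dmesg_text.splitlines()
        if any(d in line.lower() for d in wanted)
    ]


# Two lines quoted in this report plus one placeholder line.
SAMPLE = """\
megasas:07.707.50.00-rh1
(placeholder: some unrelated kernel message)
mpt2sas version 20.103.01.00 loaded.
"""

if __name__ == "__main__":
    for line in driver_lines(SAMPLE):
        print(line)
```

On a real system the input would be the saved output of dmesg rather than SAMPLE.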

Revision history for this message
Jim Somerville (jsomervi) wrote :

That is good news. You only have serial access through the BMC, but any chance you can just run dmesg and then copy and paste the results by just scrolling back in your terminal?

Revision history for this message
Jim Somerville (jsomervi) wrote :

OK, I now know what's going on, thanks to your dmesg. Back in the pre-STX distant past we weren't managing an initramfs, so disk drivers had to be built-in. Now, the built kernel package does have a module-populated initramfs, but it doesn't include the out-of-tree modules that we build separately. So out-of-tree disk drivers are still a no-go, not that we currently have any. Anyway, the right answer to this issue should be to simply drop both the original mptsas patch and my replacement patch, and build with the two MPTSAS config options set to module:

   CONFIG_MEGARAID_SAS=y <- leave this one alone, doesn't hurt to be built in
   CONFIG_SCSI_MPT2SAS=m
   CONFIG_SCSI_MPT3SAS=m

Can you try this final test, including the installer rebuild? Thanks.
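
Before rebuilding, the resulting config can be sanity-checked. The following is a minimal sketch, assuming the standard kernel .config text format ("CONFIG_X=y", "CONFIG_X=m", "# CONFIG_X is not set"). The function names are hypothetical and it is not part of the StarlingX build tooling; SAMPLE is simply the three settings proposed above.

```python
import re

# The three options discussed in this report.
DRIVERS = ["CONFIG_MEGARAID_SAS", "CONFIG_SCSI_MPT2SAS", "CONFIG_SCSI_MPT3SAS"]

# The settings proposed in the comment above.
SAMPLE = """\
CONFIG_MEGARAID_SAS=y
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_MPT3SAS=m
"""


def parse_kconfig(text):
    """Map CONFIG_* names to their values from kernel .config text."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Disabled options are written as a comment line.
        m = re.match(r"# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
            continue
        m = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
    return options


def describe(value):
    """Translate a .config value into a human-readable state."""
    if value is None:
        return "not present"
    return {"y": "built-in", "m": "module", "n": "disabled"}.get(
        value, "value " + value
    )


def report(text, names):
    """Return {option name: state} for the requested options."""
    opts = parse_kconfig(text)
    return {name: describe(opts.get(name)) for name in names}


if __name__ == "__main__":
    for name, state in report(SAMPLE, DRIVERS).items():
        print(f"{name}: {state}")
```

With the proposed settings, the expected states are built-in for CONFIG_MEGARAID_SAS and module for the two MPTSAS options.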

Revision history for this message
Austin Sun (sunausti) wrote :

Hi, Jim:
  As you expected, after dropping both the original mptsas patch and the replacement patch and setting CONFIG_SCSI_MPT2SAS=m
   and CONFIG_SCSI_MPT3SAS=m, the install and reboot are fine.

Does this mean that with 'CONFIG_SCSI_MPT2SAS=m' the driver is an in-tree module, not an out-of-tree module?

Thanks.
BR
Austin Sun.

Ghada Khalil (gkhalil)
tags: added: stx.4.0
Changed in starlingx:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/713327

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Austin Sun (sunausti) wrote :

Hi, Ghada:
    Is there any plan to include this fix in the stx.3.0 release?

Thanks.
BR
Austin Sun.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/713327
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=1435fe178ab88aa2b77970a3c07e8a907477a654
Submitter: Zuul
Branch: master

commit 1435fe178ab88aa2b77970a3c07e8a907477a654
Author: Jim Somerville <email address hidden>
Date: Mon Mar 16 16:16:20 2020 -0400

    Build mpt2sas and mpt3sas drivers as modules

    History:
    Back in the day, we didn't have an initramfs
    to allow us to load disk drivers as modules. All
    disk drivers had to be built-in. In CentOS 7.3,
    the mpt2sas and mpt3sas driver code was reorganized
    to allow for a common code base. But along with that,
    those drivers would only now build as modules. We
    created a patch which involved taking a snapshot of
    mpt driver code, and massaged it all into building
    as built-in drivers.

    Problem:
    That old code snapshot along with the fact
    that those two drivers initialize without their
    associated hardware being present (they are built-in),
    seems to cause interference with some other LSI raid
    controllers, namely Harpoon in AVAGO MR9460-8i via a
    Huawei enclosure.

    Solution:
    Simply revert to building those two mptsas drivers as
    modules, the way CentOS intended. They will reside
    on initramfs and be loaded automatically if the
    appropriate hardware is present. With these drivers now
    out of the way, the problematic raid controller works
    fine, driven by the megaraid_sas driver.

    Change-Id: I054c2396df4e659c324e70bffcf3940ad93c9354
    Closes-Bug: 1866293
    Signed-off-by: Jim Somerville <email address hidden>
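
One way to confirm after boot that the modular drivers were loaded only where needed is to look at /proc/modules, which lists loaded loadable modules; built-in drivers such as megaraid_sas (with CONFIG_MEGARAID_SAS=y) do not appear there. The following is a minimal sketch with hypothetical function names. The module names to look for are assumptions, and the sample text is made-up placeholder data in the /proc/modules layout, not output from the affected host.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text.

    Each non-empty line starts with the module name, followed by
    further whitespace-separated fields.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def check(proc_modules_text, wanted):
    """Return {module name: True if currently loaded as a module}."""
    loaded = loaded_modules(proc_modules_text)
    return {name: name in loaded for name in wanted}


# Placeholder sample, NOT real output: a host where a hypothetical
# module named "examplemod" is loaded and the mpt drivers are not.
SAMPLE = "examplemod 16384 0 - Live 0x0000000000000000\n"

if __name__ == "__main__":
    # On a real system, read the text from /proc/modules instead.
    for name, present in check(SAMPLE, ["mpt2sas", "mpt3sas"]).items():
        print(f"{name}: {'loaded' if present else 'not loaded'}")
```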

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Next step is to cherrypick this change to the r/stx.3.0 branch

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/713936

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (r/stx.3.0)

Reviewed: https://review.opendev.org/713936
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=ef150416f5ccb18707d75df1f1f8b2a36156b5b4
Submitter: Zuul
Branch: r/stx.3.0

commit ef150416f5ccb18707d75df1f1f8b2a36156b5b4
Author: Jim Somerville <email address hidden>
Date: Mon Mar 16 16:16:20 2020 -0400

    Build mpt2sas and mpt3sas drivers as modules

    Change-Id: I054c2396df4e659c324e70bffcf3940ad93c9354
    Closes-Bug: 1866293
    Signed-off-by: Jim Somerville <email address hidden>
    (cherry picked from commit 1435fe178ab88aa2b77970a3c07e8a907477a654)
    Conflicts:
     kernel/kernel-rt/centos/build_srpm.data
     kernel/kernel-std/centos/build_srpm.data

Ghada Khalil (gkhalil)
tags: added: in-r-stx30
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716162

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/716162
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=246f33226dbb50a4c5e86d497df745120ca9e0e4
Submitter: Zuul
Branch: f/centos8

commit a745a5b6f8a02b74f69f828f14960e97a758853c
Author: Jim Somerville <email address hidden>
Date: Fri Mar 20 10:36:14 2020 -0400

    Kernel: Workaround broken bios affecting iommu

    Problem:
    Broken bios creates inaccurate DMAR tables,
    reporting some bridges as having endpoint types.
    This causes IOMMU initialization to bail
    out early with an error code, the result of
    which is vfio not working correctly.
    This is seen on some Skylake based Wolfpass
    server platforms with up-to-date bios installed.

    Solution:
    Instead of just bailing out of IOMMU
    initialization when such a condition is found,
    we report it and continue. The IOMMU ends
    up successfully initialized anyway. We do this
    only on platforms that have the Skylake bridges
    where this issue has been seen.

    This change is inspired by a similar one posted by
    Lu Baolu of Intel Corp to lkml

    https://lkml.org/lkml/2019/12/24/15

    Change-Id: Ief2df7099b6118eab7f99d5531616926a7a3eb27
    Closes-Bug: 1847335
    Signed-off-by: Jim Somerville <email address hidden>

commit 1435fe178ab88aa2b77970a3c07e8a907477a654
Author: Jim Somerville <email address hidden>
Date: Mon Mar 16 16:16:20 2020 -0400

    Build mpt2sas and mpt3sas drivers as modules

    Change-Id: I054c2396df4e659c324e70bffcf3940ad93c9354
    Closes-Bug: 1866293
    Signed-off-by: Jim Somerville <email address hidden>

commit bed7388b678b9eda0d06b4d16fb00711741f9ef0
Author: Paul Vaduva <email address hidden>
Date: Tue Mar 10 12:05:31 2020 -0400

    Release FDs when stuck peering recovery

    During stuck peering recovery if file descriptors are
    not released the state machine does not advance to
    OPERATIONAL state

    Partial-bug: 1856064

    Change-Id: I3fba7be661ebf22...

tags: added: in-f-centos8