Installer fails with NVMe controllers that support multipathing

Bug #1967190 reported by M. Vefa Bicakci
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: M. Vefa Bicakci
Milestone: (none listed)

Bug Description

Brief Description
-----------------
When attempting to install StarlingX onto a system with NVMe controllers that support NVMe multipathing, the anaconda installer fails with the following traceback, even if the NVMe drives are not intended to be used for the installation:

Traceback (most recent call last):
  File "/sbin/anaconda", line 1305, in <module>
    matched = device_matches("LABEL=OEMDRV", disks_only=True)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage_utils.py", line 813, in device_matches
    single_spec_matches = udev.resolve_glob(full_spec)
  File "/usr/lib/python2.7/site-packages/blivet/udev.py", line 138, in resolve_glob
    if fnmatch.fnmatch(name, glob) or fnmatch.fnmatch(path, glob):
  File "/usr/lib64/python2.7/fnmatch.py", line 43, in fnmatch
    return fnmatchcase(name, pat)
  File "/usr/lib64/python2.7/fnmatch.py", line 79, in fnmatchcase
    return _cache[pat].match(name) is not None
TypeError: expected string or buffer
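
The TypeError is consistent with resolve_glob handing fnmatch a value that is not a string (presumably None) for a device whose name or path could not be determined. The following is a minimal Python 3 sketch of that failure mode, not blivet's actual code. The `devices` list and both helper functions are hypothetical, and the assumption that a hidden NVMe device yields a None name is an inference from the traceback rather than something the report states.

```python
import fnmatch

# Hypothetical (name, path) pairs as a udev-style probe might return them.
# The second entry stands in for a device whose name could not be resolved
# (assumed here to be the case for hidden NVMe multipath devices).
devices = [
    ("sda", "/dev/sda"),
    (None, None),
]


def unguarded_raises(glob):
    """Return True if an unguarded fnmatch call on a None name raises TypeError."""
    try:
        fnmatch.fnmatch(None, glob)
    except TypeError:
        return True
    return False


def safe_matches(devices, glob):
    """Return names of devices matching glob, skipping non-string entries."""
    matched = []
    for name, path in devices:
        if not isinstance(name, str) or not isinstance(path, str):
            continue  # fnmatch would raise TypeError on these entries
        if fnmatch.fnmatch(name, glob) or fnmatch.fnmatch(path, glob):
            matched.append(name)
    return matched
```

Under these assumptions, skipping entries without a usable name avoids the crash. The fix eventually merged for this bug takes a different route and disables the kernel feature instead.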

Severity
--------
Critical: Installation fails.

Steps to Reproduce
------------------
Use a StarlingX ISO image to start installation on a system with NVMe controllers that support multipathing.

Expected Behavior
------------------
Installation should succeed.

Actual Behavior
----------------
Installation fails with the traceback mentioned in this bug's description.

Reproducibility
---------------
Reliably reproducible.

System Configuration
--------------------
Not relevant.

Branch/Pull Time/Commit
-----------------------
This issue appears to have existed since the v5.10 kernel was introduced to StarlingX with commit 2cb3d041cd92 in the StarlingX/kernel repository.

Last Pass
---------
StarlingX versions with the legacy v3.10-based kernel do not have this issue.

Timestamp/Logs
--------------
Please see the bug description.

Test Activity
-------------
Installation by user.

Workaround
----------
It is possible to work around this issue by adding the "nvme_core.multipath=0" option to the kernel's command line in the GRUB/bootloader prompt before starting the installation.
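
To confirm that the option was actually passed to the kernel, the boot command line can be inspected once the system is up. The sketch below parses a command line string for nvme_core.multipath. The function name is hypothetical, and the sketch only recognizes values starting with 0, n, or N as "disabled"; it does not handle every spelling the kernel's parameter parser may accept.

```python
def multipath_disabled_on_cmdline(cmdline):
    """Return True if nvme_core.multipath is set to a false value.

    If the option appears more than once, the last occurrence is used.
    """
    disabled = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "nvme_core.multipath" and sep:
            # Only a leading 0, n, or N is treated as "disabled" in this sketch.
            disabled = value[:1] in ("0", "n", "N")
    return disabled
```

On a running Linux system, the input would typically be the contents of /proc/cmdline.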

Changed in starlingx:
assignee: nobody → M. Vefa Bicakci (vbicakci)
information type: Public → Public Security
information type: Public Security → Public
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kernel (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/835916

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/835916
Committed: https://opendev.org/starlingx/kernel/commit/6fe8d6083263e0fdbce0b6ba9e59ed10ebd932e6
Submitter: "Zuul (22348)"
Branch: master

commit 6fe8d6083263e0fdbce0b6ba9e59ed10ebd932e6
Author: M. Vefa Bicakci <email address hidden>
Date: Wed Mar 30 13:39:01 2022 -0400

    kernel: Disable NVMe multi-path kconfig option

    This commit disables the NVMe multi-path kernel configuration option,
    which was introduced with kernel v4.15-rc1, and is included in
    StarlingX's v5.10-based kernel.

    This feature exposes block devices such as "nvme0c0n1" in
    /sys/class/block with the "hidden" attribute set to 1; however, the
    older anaconda installer and user-space packages that StarlingX inherits
    from CentOS 7 were not programmed to ignore such hidden block devices.
    This results in anaconda reporting an error when probing the disks on
    the system and aborting the installation. Similarly, the "lsblk" program
    reports warnings about unrecognized block devices.

    Both of these issues were fixed upstream with the following commits:
    - https://github.com/storaged-project/blivet/commit/8957eb1f82a37542ea1ae3990b6068b3cf90a3be
    - https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=c8487d854ba5cf5bfcae78d8e5af5587e7622351

    Given the difficulty of auditing all of CentOS 7's user-space for
    potential incompatibilities with NVMe multi-path support, this commit
    disables this feature.

    The following options were considered to disable this option:
    - Unset CONFIG_NVME_MULTIPATH (implemented in this commit)
    - Use the kernel command line argument nvme_core.multipath=0
    - Use a modprobe.d configuration file with the following contents:
      "options nvme_core multipath=0".

    Of these options, the first was chosen to ensure non-error-prone
    consistency across all StarlingX configurations, even though the latter
    two options are more flexible.

    Verification
    - The issue was confirmed to exist on a standalone server with NVMe
      multi-path support, by attempting to install StarlingX using an ISO
      image built from StarlingX's master branch, in All-in-One simplex
      mode.
    - An ISO image was successfully built with this commit using a
      monolithic incremental build procedure.
    - The built ISO image was used to install StarlingX successfully onto
      the same standalone server in All-in-One simplex mode.
    - Basic disk operations were carried out by creating a partition and a
      file system on the NVMe drive, as well as using dd in a loop to write
      zero bytes to a file created in the same file system.

      (Note that the same test procedure was not repeated with a server that
      has unaffected NVMe controllers due to lab availability constraints.
      Inspection of the kernel's NVMe multi-path support code indicates that
      the kernel falls back to non-multi-path code paths when an NVMe
      controller does not support this feature.)

    Closes-Bug: 1967190
    Change-Id: I791cfb6d7bc141e2114dea5e7f8d648b6df81f14
    S...
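
The commit message above notes that NVMe multipath exposes block devices in /sys/class/block with the "hidden" attribute set to 1. As a rough way to check whether a system has such devices, the following sketch lists them. The function name and the `sysfs_root` parameter are hypothetical; the parameter exists only so the directory can be swapped out for testing.

```python
import os


def hidden_block_devices(sysfs_root="/sys/class/block"):
    """Return sorted names of block devices whose 'hidden' attribute reads 1."""
    hidden = []
    try:
        names = os.listdir(sysfs_root)
    except OSError:
        return hidden  # sysfs directory not available
    for name in names:
        attr = os.path.join(sysfs_root, name, "hidden")
        try:
            with open(attr) as f:
                if f.read().strip() == "1":
                    hidden.append(name)
        except OSError:
            continue  # entry has no readable 'hidden' attribute
    return sorted(hidden)
```

An empty result on a kernel built with this commit would be expected, since the multipath feature that creates these hidden devices is compiled out.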

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.distro.other stx.kernel