FIPS and Ubuntu standard kernels prior to 4.11.0 won't boot; root device not found

Bug #1809168 reported by Nivedita Singhvi
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Triaged   High         Unassigned
Xenial           Triaged   High         Unassigned

Bug Description

[IMPACT]

Booting of the Xenial-based FIPS kernel packages
failed with disk not found errors on amd64.
This was also observed on standard Ubuntu
kernels prior to 4.11.0.

FIPS
------
1. linux-image-4.4.0-1002-fips <-- FAIL
2. linux-image-4.4.0-1006-fips <-- FAIL

UBUNTU
------------

1. Bionic kernels all WORK

2. Artful kernels:

   Ubuntu-4.11.0-1.6 <-- WORKS
   Ubuntu-4.10.0-26.30 <-- FAILS

3. Xenial kernels:

   Ubuntu-hwe-4.11.0-12.17_16.04.1 <--- WORKS
   Ubuntu-hwe-4.10.0-43.47_16.04.1 <--- FAILS

   Ubuntu-lts-* <--- ALL FAIL
   Ubuntu-4.4.0-* <--- ALL FAIL

We have narrowed down the window to be:

4.11.0-1.6 (custom build) <--- WORKS
4.10.0-43.47~16.04.1 <-- FAILS

Also works:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/linux-image-4.11.0-041100-generic_4.11.0-041100.201705041534_amd64.deb

Symptoms
-------------
System cannot find the root disk and drops into
an initramfs shell:

mdadm script local-block "CREATE group disk not found"
"Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay=...
   - Check root= ...
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! UUID=... does not exist. Dropping to a shell!

...
(initramfs)_

There does not appear to be any workaround so far.
The disks are encrypted SSDs.

Attaching commit list between the last known
failing Artful kernel and earliest known
working kernel (adjacent tags) and other info.

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :
affects: libgcrypt20 (Ubuntu) → linux (Ubuntu)
description: updated
Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1809168

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

We can perform a "Reverse" bisect to identify the commit that fixes this bug in 4.11.0-1.6. It is easiest to bisect the mainline kernel. Per the description, 4.11 final works. Can you test the following two kernels?

v4.10 Final: https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/
v4.11-rc1: https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc1/

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

Thanks, Joe. I'll update this bug as soon as I get the results
from the reporter.

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

The disk in question is a PERC_H740P_Adp.

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

v4.10 Final <---- FAILS
v4.11-rc1 <---- WORKS

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

One check to see if the above is the issue:

1. dpkg -l | grep crypt
2. dpkg -l | grep lvm

If lvm2 is not installed, for instance, it should be possible to
do the following to fix the problem:

1. # apt install lvm2
2. # update-initramfs -c -k all
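
The check above can be sketched as a small script. This is only a minimal sketch: the helper name `missing_pkgs` is hypothetical, and `cryptsetup` is an assumed package name (the comment only greps for "crypt"), so adjust the list to whatever the system actually needs.

```shell
#!/bin/sh
# Hypothetical helper: print each named package that dpkg does not
# report as installed.
missing_pkgs() {
    for pkg in "$@"; do
        # For an installed package, ${Status} ends in "ok installed".
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        case "$status" in
            *"ok installed") ;;
            *) echo "$pkg" ;;
        esac
    done
}

# Package names are assumptions; the thread names only lvm2 explicitly.
missing=$(missing_pkgs lvm2 cryptsetup)
if [ -n "$missing" ]; then
    echo "Missing: $missing"
    echo "Install them, then rebuild with: update-initramfs -c -k all"
fi
```

The script only reports what is missing; installing packages and rebuilding the initramfs still has to be done as root.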

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

Or the missing crypto packages (I should have added).
It's not likely the second URL above.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I started a "Reverse" kernel bisect between v4.10 final and v4.11-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
caa59428971d5ad81d19512365c9ba580d83268c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1809168

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
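
For anyone following along, a "reverse" bisect can be set up in git with custom terms, so that the older tag is marked as broken and the newer one as fixed. The sketch below only prints the commands rather than running them; the helper name `reverse_bisect_plan` is hypothetical, and it is not necessarily how the test kernels in this bug were produced.

```shell
#!/bin/sh
# Hypothetical helper: print the git commands for a "reverse" bisect,
# where the older revision is broken and the newer one is fixed.
# Custom terms avoid having to mentally invert git's good/bad.
reverse_bisect_plan() {
    broken_rev=$1
    fixed_rev=$2
    echo "git bisect start --term-old=broken --term-new=fixed"
    echo "git bisect broken $broken_rev"
    echo "git bisect fixed $fixed_rev"
}

# Tags taken from the comment above.
reverse_bisect_plan v4.10 v4.11-rc1
```

After each test kernel is booted, the result would be recorded with `git bisect broken` or `git bisect fixed` until git reports the first fixed commit.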

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

After some checking, this issue is not any of the
above-mentioned bugs.

This is possibly enablement of new HW, as BradF
suggested might be the case. Dan Streetman
identified the driver/device in question.

Driver: megaraid_sas
PCI dev id: 0016
#define PCI_DEVICE_ID_LSI_HARPOON 0x0016

All of the following commits went into v4.11-rc1.

The following is the commit that enables
the PCI dev id:

45f4f2eb3da3cbff02c3d77c784c81320c733056 scsi: megaraid_sas: Add new pci device Ids for SAS3.5 Generic Megaraid Controllers

Set of patches related to this (there might be others):
223e4b93e61f7538681632bfb19edd4f27a0c319 scsi: megaraid_sas: driver version upgrade
ede7c3ce82dc4001bbab33dddebab8c089f309e0 scsi: megaraid_sas: Implement the PD Map support for SAS3.5 Generic Megaraid Controllers
b71b49c209facf8fec3778142ae5e45bb6ca4afc scsi: megaraid_sas: ldio_outstanding variable is not decremented in completion path
3e5eadb1a881bea2e3fa41f5ae7cdbfa36222d37 scsi: megaraid_sas: Enable or Disable Fast path based on the PCI Threshold Bandwidth
9581ebebbe351d99579e8701e238c2771ccdae93 scsi: megaraid_sas: Add the Support for SAS3.5 Generic Megaraid Controllers Capabilities
d889344e4e59eb962894ab3b64042dc37a2d8b39 scsi: megaraid_sas: Dynamic Raid Map Changes for SAS3.5 Generic Megaraid Controllers
69c337c0f8d74d71e085efa8869be9fc51e5962b scsi: megaraid_sas: SAS3.5 Generic Megaraid Controllers Fast Path for RAID 1/10 Writes
fdd84e2514b0157219720cf8f3f55757938a39cd scsi: megaraid_sas: SAS3.5 Generic Megaraid Controllers Stream Detection and IO Coalescing
45d446038c7b93c40b2fe5ba0e95380f19e0493e scsi: megaraid_sas: EEDP Escape Mode Support for SAS3.5 Generic Megaraid Controllers
2493c67e518c772a573c3b1ad02e7ced5b53f6ca scsi: megaraid_sas: 128 MSIX Support
45f4f2eb3da3cbff02c3d77c784c81320c733056 scsi: megaraid_sas: Add new pci device Ids for SAS3.5 Generic Megaraid Controllers <---

I will try and confirm that it is the above case,
in which case we likely will not backport to
4.4/Xenial or FIPS, as it is new HW enablement.
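
One way to confirm whether a given kernel's megaraid_sas module claims the device is to look at the module's PCI aliases. The following is a minimal sketch: the helper name `supports_pci_id` is hypothetical, and the vendor ID 1000 (LSI Logic) is an assumption not stated in this thread; only the device ID 0016 comes from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: read module alias lines on stdin and succeed if
# any of them matches the given PCI vendor/device ID (4 hex digits each).
supports_pci_id() {
    # Module aliases use upper-case hex, e.g. pci:v00001000d00000016sv*...
    vendor=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    device=$(printf '%s' "$2" | tr 'a-f' 'A-F')
    grep -q "^pci:v0000${vendor}d0000${device}sv"
}

# Usage on a system with the kernel's modules installed (the vendor ID
# is an assumption; -k selects a kernel other than the running one):
#   modinfo -k "$(uname -r)" -F alias megaraid_sas \
#       | supports_pci_id 1000 0016 && echo "device ID is claimed"
```

If the alias is absent, the driver in that kernel will not bind to the controller, which would be consistent with the root disk never appearing.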

Revision history for this message
Nivedita Singhvi (niveditasinghvi) wrote :

We now believe this to be the case. The above is an issue
of new HW enablement.

I believe this can be closed as Will Not Fix.

For standard kernels, enablement is available via the -hwe
kernel upgrade.

For FIPS kernels, new HW enablement is not expected to be
backported/patched in existing 4.4.0-based releases.

Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Xenial):
assignee: Joseph Salisbury (jsalisbury) → nobody
Brad Figg (brad-figg)
tags: added: cscc