With kernel 4.13 btrfs scans for devices before all devices have been discovered

Bug #1752961 reported by Carl Reinke on 2018-03-02
This bug affects 3 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Confirmed   Medium       Unassigned
  Artful         Won't Fix   Medium       Unassigned
  Bionic         Confirmed   Medium       Unassigned
  Cosmic         Confirmed   Medium       Unassigned

Bug Description

See attached dmesg outputs for booting kernels 4.11.x (working) and 4.13.x (not working).

dmesg-4.11.0-14-good.txt shows the dmesg output when booting kernel 4.11.x.
btrfs scans for devices after all 4 (sda, sdb, sdc, sdd) of the devices have been discovered by the kernel. The btrfs RAID1 filesystem mounts, and everything is good.

dmesg-4.13.0-36-fail.txt shows the dmesg output when booting kernel 4.13.x.
btrfs scans for devices after only 2 (sda, sdb) of the devices have been discovered by the kernel. The btrfs RAID1 filesystem fails to mount ("failed to read the system array: -5"). The remaining 2 devices (sdc, sdd) are discovered by the kernel immediately afterward. At the end of the log, I run `btrfs device scan` and mount the filesystem manually.
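
The ordering problem described above can be sketched as a small wait-then-scan helper. This is only an illustration and does not come from the attached logs: the function name `wait_for_devices`, the one-second polling interval, and the timeout value are assumptions, and the device paths would have to match the actual array.

```shell
#!/bin/sh
# Sketch only: poll until every listed block device node exists,
# or give up once the timeout (in seconds) has elapsed.
# Usage: wait_for_devices <timeout-seconds> <device>...
wait_for_devices() {
    timeout="$1"; shift
    elapsed=0
    while :; do
        missing=0
        for dev in "$@"; do
            [ -e "$dev" ] || missing=$((missing + 1))
        done
        # All members are visible: the caller can scan and mount now.
        [ "$missing" -eq 0 ] && return 0
        # Still incomplete and out of time: report failure.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

For the four-disk array in this report, a call such as `wait_for_devices 10 /dev/sda /dev/sdb /dev/sdc /dev/sdd && btrfs device scan` would scan only once all members are visible, or give up after roughly ten seconds.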

Hardware:
  HP ProLiant MicroServer Gen8
  4x WDC WD20EFRX
---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: aplay: device_list:270: no soundcards found...
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
ArecordDevices: arecord: device_list:270: no soundcards found...
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2015-10-15 (933 days ago)
InstallationMedia: Xubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422.1)
MachineType: HP ProLiant MicroServer Gen8
Package: linux (not installed)
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.15.0-20-generic root=UUID=d976ab07-8377-46dd-ac6c-f5f7312a8305 ro rootflags=subvol=@ rootdelay=10
ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
Tags: bionic apport-hook-error
Uname: Linux 4.15.0-20-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: Upgraded to bionic on 2018-05-05 (0 days ago)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 07/16/2015
dmi.bios.vendor: HP
dmi.bios.version: J06
dmi.chassis.type: 7
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrJ06:bd07/16/2015:svnHP:pnProLiantMicroServerGen8:pvr:cvnHP:ct7:cvr:
dmi.product.family: ProLiant
dmi.product.name: ProLiant MicroServer Gen8
dmi.sys.vendor: HP

Carl Reinke (carlreinke) wrote :

This problem is still present as of 4.15.0-20.

Carl Reinke (carlreinke) on 2018-05-05
affects: linux-hwe-edge (Ubuntu) → linux (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1752961

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful

apport information

tags: added: apport-collected apport-hook-error bionic
description: updated

Carl Reinke (carlreinke) on 2018-05-06
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc4

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Carl Reinke (carlreinke) wrote :

This issue started with the version 4.13 series. The last version that did not have the issue is 4.11.0-14.

Mainline kernel version 4.17-rc4 also exhibits the issue.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the last kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v4.12 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/
v4.13-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/
v4.13-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc4/
v4.13 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13/

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!

tags: added: performing-bisect
Carl Reinke (carlreinke) wrote :

v4.12 Final works. v4.13-rc1 does not.

Carl Reinke (carlreinke) wrote :

FWIW, v4.12.14 also works.

Changed in linux (Ubuntu Cosmic):
status: Confirmed → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v4.12 final and v4.13-rc1. The kernel bisect will require testing of about 7-10 test kernels.
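
For a sense of where an estimate like this comes from: each bisect step roughly halves the range of candidate commits, so the number of test kernels grows with the base-2 logarithm of the range size. The helper below is a sketch of that arithmetic only. `bisect_steps` is a hypothetical name, and the commit counts passed to it are illustrative, not measured from the kernel tree.

```shell
#!/bin/sh
# Sketch only: print how many halvings are needed to narrow a range of
# <commit-count> commits down to a single commit.
# In a kernel checkout the bisect itself would be started with
# something like:
#   git bisect start v4.13-rc1 v4.12    # bad revision first, then good
bisect_steps() {
    n="$1"
    steps=0
    span=1
    # Double the covered span until it reaches the range size.
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

`bisect_steps 1000` prints 10, and 14 steps cover any range of up to 16384 commits, so a large range between two releases can need more test kernels than the initial estimate.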

I built the first test kernel, up to the following commit:
e5f76a2e0e84ca2a215ecbf6feae88780d055c56

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

e5f76a2e0e84ca2a215ecbf6feae88780d055c56 (4.12.0-041200.201805301435) does not work.

Carl Reinke (carlreinke) wrote :

I guess it's less ambiguous to say that e5f76a2 has the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
1849f800fba32cd5a0b647f824f11426b85310d8

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

1849f80 has the bug.

Joseph Salisbury (jsalisbury) wrote :

Sorry for the delay.

I built the next test kernel, up to the following commit:
cbcd4f08aa637b74f575268770da86a00fabde6d

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

cbcd4f0 has the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
1b044f1cfc65a7d90b209dfabd57e16d98b58c5b

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

1b044f1 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
cbf4b3867875206aa548a8c6d7c886f3299d619e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

cbf4b38 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
4422d80ed7d4bdb2d6e9fb890c66c3d9250ba694

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

4422d80 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
24040a58379e2f2fa6aa9466911b758073b6bdfa

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

24040a5 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
44c891af576997763d1d4c790d50d10db9eff00f

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

44c891a does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c94dc34f771a25b8c3e0955147fdc4f5e3d79908

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

c94dc34 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
0cf9f5096da2200b52cee0e38139c99c4fc0151c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

0cf9f50 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
409acdd0412e9343095d965a9228f6e6a83a416f

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

409acdd does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
878c33a78811f90795f17333bc3a7c819a1589a7

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

878c33a does NOT have the bug.

This bug was nominated against a series that is no longer supported, ie artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: In Progress → Won't Fix
Carl Reinke (carlreinke) wrote :

Any new build to test?

This issue is still present in 4.15.0-30.

Joseph Salisbury (jsalisbury) wrote :

Sorry for the delay. I missed the email regarding your test results for the last kernel.

I built the next test kernel, up to the following commit:

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Joseph Salisbury (jsalisbury) wrote :

The latest test kernel is for commit 362f6729cbb1d6bbab59e069f19441b0622ff7ec

Carl Reinke (carlreinke) wrote :

362f672 has the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
6836796de4019944f4ba4c99a360e8250fd2e735

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

6836796 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

Commit 362f6729cbb1d6bbab59e069f19441b0622ff7ec was reported as the first bad commit. However, this commit is a merge tag, so we'll need to bisect into that merge.

Before we start that process, this bug has been open a while and 4.18 final is now available. Could you test that kernel to see if this bug was already fixed upstream?

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18

Carl Reinke (carlreinke) wrote :

It looks like 4.18 does not have the issue.

Joseph Salisbury (jsalisbury) wrote :

That's good news. We now have to identify the fix in 4.18 and ensure it is applied to Ubuntu. Can you test the following two upstream stable kernels to see if the fix was already sent to stable:

4.15.18: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.18/
4.17.17: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17.17/

Carl Reinke (carlreinke) wrote :

Both 4.15.18 and 4.17.17 have the bug.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing. We need to narrow down the specific version of 4.18 that has the fix. Can you test the following kernels:

v4.18-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc1/
v4.18-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc4/

Carl Reinke (carlreinke) wrote :

Both 4.18-rc1 and 4.18-rc4 do NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I reviewed the btrfs commits in v4.18-rc1, but there are quite a few of them. We should perform a "Reverse" bisect to identify the one that fixes this bug.

I started a "Reverse" bisect between v4.17 final and v4.18-rc1.

I built the first test kernel, up to the following commit:
1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

1c8c5a9 has the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
d60dafdca4b463405e5586df923f05b10e9ac2f9

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

d60dafd does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
fdea70d26a471e002f2afc3a48821323b699f1e6

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

fdea70d has the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
6a8b25abf1b79db6877645335c73ad6a5061d9b0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

The packages appear to still be the build from 10/01.

Joseph Salisbury (jsalisbury) wrote :

I rebuilt the kernel, so they should be there now:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Carl Reinke (carlreinke) wrote :

6a8b25a has the bug.

I also tried 4.18.0-10.11 from cosmic, and it appears to still have the bug. :(

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
68cc38ff33f38424d0456f9a1ecfec4683226a7e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Bogdan Micu (buggyro) wrote :

68cc38ff33f38424d0456f9a1ecfec4683226a7e has the bug.

Carl Reinke (carlreinke) wrote :

Agreed. 68cc38f has the bug.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
18f1837632783fec017fd932a812d383e3406af0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1752961

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Carl Reinke (carlreinke) wrote :

18f1837 has the bug.

Carl Reinke (carlreinke) wrote :

It looks like I made a mistake on a previous build. I retested d60dafd, built on 20180911, and it actually does have the bug. Sorry!

(Being an apparent race condition, a buggy kernel actually does work sometimes.)
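
Because the failure is intermittent, a single clean boot says little about a build. As a rough illustration, assume each boot of a buggy kernel succeeds independently with some probability p (a hypothetical figure, not measured in this report); the chance that n boots in a row all succeed is then p^n. The helper name `false_good_chance` is likewise an assumption.

```shell
#!/bin/sh
# Sketch only: chance that a buggy kernel passes n boots in a row,
# assuming independent boots with per-boot success probability p.
# Usage: false_good_chance <p> <n>
false_good_chance() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", p ^ n }'
}
```

With p = 0.5, three clean boots in a row would still happen one time in eight (`false_good_chance 0.5 3` prints 0.125). Several boots per test kernel therefore give a more reliable "good" verdict, while a single failed boot is already conclusive.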

Joseph Salisbury (jsalisbury) wrote :

We may have to go back and retest for the bisect then.

The best next step would be to test the current mainline kernel and see whether the bug is already fixed upstream. It is available from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20-rc2/

Carl Reinke (carlreinke) wrote :

Mainline 4.20-rc2 has the bug.

I went back and tested mainline 4.18-rc1 and mainline 4.18 a few more times, and they actually do have the bug. So it appears that the bug is not fixed upstream after all.

Shall we go back to bisecting for the change that introduced the bug?

Carl Reinke (carlreinke) wrote :

I retested the last 2 builds from the original bisection, and I'm pretty confident that the results were correct:
362f672 has the bug.
6836796 does NOT have the bug.

Joseph Salisbury (jsalisbury) wrote :

Thanks for retesting. Can you test v4.17 final? It is available from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17/

Carl Reinke (carlreinke) wrote :

v4.17 final has the bug.

Christian Weinberger (weini22) wrote :

I upgraded from Debian stable to stretch-backports today, and therefore from kernel 4.9 to 4.18.
Wrong distribution, I know ;-)

Interestingly, I'm also facing the issue on an HP MicroServer Gen8.

I'm posting because I was able to work around the issue by specifying "rootdelay=5" on the kernel command line. This might be helpful for others.
Of course, that's just a hotfix and no substitute for solving the real issue.

Exact kernel string (uname -a) is: Linux nas2 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1 (2018-09-13) x86_64 GNU/Linux
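
For reference, a sketch of how such a parameter is commonly added on GRUB-based Debian and Ubuntu systems, and how to confirm that it reached the kernel. The value 5 mirrors the comment above, and `has_rootdelay` is a hypothetical helper name. A later comment in this report saw no improvement even with much larger values, so this is a workaround to try rather than a fix.

```shell
#!/bin/sh
# /etc/default/grub is read as shell, so the setting is one assignment,
# appended to whatever options are already there:
#   GRUB_CMDLINE_LINUX_DEFAULT="<existing options> rootdelay=5"
# followed by `update-grub` (as root) to regenerate the boot menu.

# Sketch only: succeed if the given kernel command line (default:
# the running kernel's) contains a rootdelay= parameter.
has_rootdelay() {
    cmdline_file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -q '^rootdelay='
}
```

After a reboot, `has_rootdelay && echo "rootdelay is set"` checks the running kernel's command line.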

Bogdan Micu (buggyro) wrote :

Dear Christian,
Can you please explain in a little more detail how and where you added the rootdelay parameter?
I am facing the same problem with a 4-drive RAID1 on an HP MicroServer Gen8, after updating from Ubuntu Server 16.04 to 18.04 (kernel 4.4 to kernel 4.15).
I tried adding the parameter to /etc/default/grub, in GRUB_CMDLINE_LINUX_DEFAULT and in GRUB_CMDLINE_LINUX, and then ran update-grub and grub-install /dev/sda, with no success.
sda is a 3 TB GPT hard disk.
I tried increasing rootdelay up to 90; still no boot.
At boot, I can confirm that rootdelay is on the command line when I edit the boot entry.
For now I am booting with the old 4.4 kernel, but I don't think this is OK in the long run.
Thanks in advance.

Christian Weinberger (weini22) wrote :

Hi Bogdan!
Sorry to hear that it doesn't work for you.
I fear that you added the parameter in the same way as I did, so there might be some other difference.

This is my kernel cmdline (cat /proc/cmdline):
BOOT_IMAGE=/vmlinuz-4.18.0-0.bpo.3-amd64 root=UUID=bda00e2f-ad36-46eb-a81a-cceb7736ab65 ro rootflags=subvol=VOL_ROOT rootdelay=5 apparmor=1 security=apparmor mce=ignore_ce

The apparmor options shouldn't make a difference; to my understanding AppArmor is not started at this early stage. I'm not sure about mce=ignore_ce. You might give it a try. I can't remember the specific issue that made me add it, but it was quite likely also related to the HP MicroServer Gen8.

Hope this helps you for your further investigations.

Bogdan Micu (buggyro) wrote :

Dear Christian,
Thank you for your reply.
I took another route which worked.
First I booted with kernel 4.4 normally.
I edited /usr/share/initramfs-tools/scripts/local-premount/btrfs by adding sleep=5 before the "/bin/btrfs device scan" line, then I ran the following commands:
update-initramfs -u
update-grub
grub-install /dev/sda
and then rebooted and it works :)

I also removed the rootdelay=5 which I had inserted previously, and it still works.
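
A sketch of the edit described above, for anyone wanting to reproduce it. It assumes that the shell command `sleep 5` was meant (a literal `sleep=5` would only set a variable), and that the script contains a line with `/bin/btrfs device scan` as stated. `add_scan_delay` and the marker comment are hypothetical names. Files under /usr/share are normally replaced on package upgrades, so the change would probably need to be reapplied afterwards.

```shell
#!/bin/sh
# Sketch only: insert "sleep <seconds>" before the first line that runs
# "/bin/btrfs device scan" in the given script, keeping a .orig backup.
# Usage: add_scan_delay <script-path> [seconds]
add_scan_delay() {
    script="$1"
    delay="${2:-5}"
    # Do nothing if an earlier run already added the delay.
    grep -q '^sleep [0-9]* # wait for disks' "$script" && return 0
    cp "$script" "$script.orig"
    awk -v d="$delay" '
        /\/bin\/btrfs device scan/ && !seen {
            print "sleep " d " # wait for disks"
            seen = 1
        }
        { print }
    ' "$script.orig" > "$script"
}
```

After running `add_scan_delay /usr/share/initramfs-tools/scripts/local-premount/btrfs 5` as root, the initramfs still has to be rebuilt with `update-initramfs -u`, as in the comment above.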

Christian Weinberger (weini22) wrote :

I just updated to Debian "4.19.0-0.bpo.1-amd64 #1 SMP Debian 4.19.12-1~bpo9+1 (2018-12-30) x86_64 GNU/Linux"

Even without the "rootdelay" boot option, the boot process proceeds now as it should again.

Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Artful):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Cosmic):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Confirmed
Changed in linux (Ubuntu):
status: In Progress → Confirmed