Regression: quotas+lvm prevent yakkety from booting (boots once every ~4 attempts)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
quota (Ubuntu) | Incomplete | Undecided | Unassigned |
Bug Description
After an upgrade from xenial to yakkety, the system does not boot reliably. Only about one boot in three or four succeeds.
Quite often, the system seems to hang during boot and then drops into an emergency prompt.
Pressing Esc during boot to show what is going on reveals processes waiting for the disks to become ready.
This is likely related to the specific system configuration: a RAID array of two disks managed by dmraid (an Intel fake RAID) with LVM2 on top of it. Most data lives there in logical volumes, except for the boot partition, which is on a fast SSD, and the root filesystem, which is on LVM2 with a physical volume spanning the rest of the SSD.
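For clarity, the storage layout looks roughly like this (a sketch; disk and volume-group names are illustrative, not taken from the machine):

```
ssd (fast SSD)
├── partition 1: /boot
└── partition 2: LVM PV → vg_ssd → LV for / (root)
disk1 ┐
disk2 ┘ Intel fake RAID (dmraid) → LVM PV → vg_raid → data LVs (listed in /etc/fstab)
```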
The very same configuration worked just fine on xenial.
To make matters worse, it is impossible to boot into "recovery mode" to get a root prompt, because after you log in as root something suddenly times out and the screen gets messed up (not a graphics card issue, but pieces of messages popping up here and there, the recovery mode menu suddenly reappearing, and so on).
Please give this bug appropriately high priority, because it prevents servers from coming up after power failures.
ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: systemd 231-9git1
ProcVersionSign
Uname: Linux 4.8.0-22-generic x86_64
ApportVersion: 2.20.3-0ubuntu8
Architecture: amd64
Date: Wed Oct 19 21:34:16 2016
EcryptfsInUse: Yes
MachineType: Dell Inc. Precision WorkStation T5400
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: systemd
UpgradeStatus: Upgraded to yakkety on 2016-10-18 (1 days ago)
dmi.bios.date: 04/30/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A11
dmi.board.name: 0RW203
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.name: Precision WorkStation T5400
dmi.sys.vendor: Dell Inc.
Changed in systemd:
status: Unknown → New
Changed in systemd:
status: New → Fix Released
no longer affects: systemd
Changed in quota (Ubuntu):
status: New → Incomplete
The issue is also present after moving to mdadm (which is possible for my fake RAID since it uses Intel metadata).
When it happens, the boot hangs for a minute and a half on
A start job is running for dev-disk-by...
with all the disks appearing in turn on that line. After that, the machine drops to an emergency prompt.
Analyzing the systemd journal shows the corresponding jobs timing out.
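For reference, here is roughly how the failed boot can be inspected (a minimal sketch, assuming the standard systemd tooling in yakkety and persistent journaling, i.e. Storage=persistent in /etc/systemd/journald.conf, so that the previous boot's journal is available):

```
# At the emergency prompt: list start jobs that are still pending or timed out
systemctl list-jobs

# After a later successful boot: grep the previous boot's journal
# (-b -1 selects the previous boot) for timeout messages
journalctl -b -1 --no-pager | grep -i 'timed out'

# Show only error-level (and worse) messages from the previous boot
journalctl -b -1 -p err
```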
Interestingly:
1) *all* the items mentioned in /etc/fstab get shown here. Namely, when the issue occurs, the error is shown even for the swap partition (which is on a plain SSD partition...)
2) It is not a problem with /etc/fstab itself but some race: every few attempts the system manages to boot
3) When you get to the emergency prompt, all the disks on which systemd hangs appear to be working just fine
4) After removing from fstab the entries corresponding to data on the slower fake RAID, booting seems to become reliable (I think); see the fstab sketch after this list
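A less drastic variant of point 4 that I am considering (untested; a sketch based on the mount options documented in systemd.mount(5)) is to keep the fstab entries but mark them nofail with a shorter device timeout, so that a slow device no longer drops the boot into emergency mode. The mapper name below is hypothetical:

```
# /etc/fstab sketch -- /dev/mapper/vg_raid-data is a made-up example name
# nofail: the boot is not failed if the device does not show up in time
# x-systemd.device-timeout=30s: wait at most 30s instead of the 90s default
/dev/mapper/vg_raid-data  /data  ext4  defaults,nofail,x-systemd.device-timeout=30s  0  2
```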
How can I debug this? How can I force systemd to wait before it tries to access the disks?
Should I open a bug on systemd?
Can someone provide pointers to systemd documentation that might be relevant to this issue? systemd seems such a complex piece of software that I do not know where to start.
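One thing worth trying before the next failed boot (a sketch based on the debugging options documented for systemd; the kernel command line can be edited from the GRUB menu by pressing 'e') is to enable systemd debug logging so the hang leaves a trace:

```
# Append to the kernel command line in GRUB:
#   systemd.log_level=debug  -- verbose logging from PID 1
#   systemd.log_target=kmsg  -- send systemd's log to the kernel ring buffer
#   log_buf_len=1M           -- enlarge the ring buffer so messages survive
systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
```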