systemd 215 hangs during boot

Bug #1385630 reported by Harry
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
systemd (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

I just installed (for testing) the newest systemd from Vivid proposed repositories.
Yes, that is only a proposed package.
The version is 215-5ubuntu1. Also the new plymouth was needed to do this (version 0.9.0-0ubuntu8).

After the installation I found that every now and then booting or rebooting fails.
It fails at a systemd-fsck line. So, nowhere to go from there: no ttys, nothing.
After each failure the next booting is successful, however.

My setup is extremely fast, I am able to get to the desktop in about 5 seconds from grub menu.
The setup contains Intel Core i7 4790 processor and Samsung 850 Pro SSD.

This may also be some sort of race condition.

Harry (harry33) wrote :

Further info:
I use solely systemd booting.
So, I have systemd-sysv installed and upstart, systemd-shim, cgmanager, libcgmanager0, removed.
I also use the kernel line: init=/lib/systemd/systemd
If I remove that, I get a kernel panic.

Harry (harry33)
description: updated
Harry (harry33) wrote :

I downgraded systemd to the version 208 in official Vivid repos.
Now booting works perfectly and fast.

It may be worthwhile to tell that I have purged systemd-shim, cgmanager and libcgmanager0 from my setup.
So this really is a pure systemd booting setup now.

Martin Pitt (pitti) wrote :

It works quite fine here (I've run it for some two weeks now with countless reboots), so I need more information about this. Can you please do a screenshot/photo when it hangs? If there was an error message, I'd like to see it. If there is none, for how long did you wait? I. e. could it have been an actual fsck which just takes a while?

It might also be helpful if you drop the "quiet" and "splash" boot options. You can do this in the grub menu (press shift during boot), press "e" on the Ubuntu boot line, remove those two options, and boot with Ctrl+X.
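
To confirm after a boot which of these options actually reached the kernel, the running command line can be inspected. The following is a minimal sketch; `cmdline_report` is a hypothetical helper name, not part of any package.

```shell
# cmdline_report: hypothetical helper. Reads a kernel command line on stdin
# and prints the options discussed in this report, one per line.
cmdline_report() {
    tr -s ' ' '\n' | while IFS= read -r word; do
        case "$word" in
            quiet|splash) echo "flag: $word" ;;
            init=*)       echo "init: ${word#init=}" ;;
        esac
    done
}

# Typical use on a running system:
#   cmdline_report < /proc/cmdline
```

No output means that none of `quiet`, `splash` or `init=` was passed for that boot.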

tags: added: systemd-boot
removed: vervet
Changed in systemd (Ubuntu):
status: New → Incomplete
summary: - New systemd (215) does not boot very well
+ systemd 215 hangs in fsck
Harry (harry33) wrote : Re: systemd 215 hangs in fsck

Martin, do you have upstart, systemd-shim, cgmanager and libcgmanager0 purged and are you using kernel line init=/lib/systemd/systemd?

I cannot boot at all if I remove the kernel booting line.

Then to your questions:

1) I could not get the booting hang when "quiet" was dropped from the kernel line.
The booting process slowed down significantly and a number of text lines went by and booting was successful a number of times.

2) with quiet (but splash dropped) in the kernel line, I can reproduce the hanging very often.

3) this is all there is visible in the screen when booting hangs:
[1.490641] systemd[1]: job lvm2.service/start deleted to break ordering cycle starting with local-fs.target/start
[1.492015] systemd-fsck[235]:/dev/sda1: clean, 86887/6111232 files, 1044303/24414062 blocks

4) I did wait for several minutes when the hang appeared. Note that my setup is very fast and fsck runs in a few seconds.

Also, after downgrading to systemd 208 all is well again, with no hangs.
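
The first of the two console lines quoted in this comment says that systemd dropped a job to break an ordering cycle. To pull such messages out of a longer boot log, a small filter along these lines might help. It is only a sketch, and `dropped_jobs` is a made-up helper name.

```shell
# dropped_jobs: hypothetical helper. Reads log text on stdin and prints the
# job named in each "deleted to break ordering cycle" message.
dropped_jobs() {
    sed -n 's/.*job \([^ ]*\) deleted to break ordering cycle.*/\1/p'
}

# Example using the console line quoted in this comment:
echo '[1.490641] systemd[1]: job lvm2.service/start deleted to break ordering cycle starting with local-fs.target/start' | dropped_jobs
# prints: lvm2.service/start

# On a booted system the journal could be fed in instead:
#   journalctl -b | dropped_jobs
```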

Martin Pitt (pitti) wrote : Re: [Bug 1385630] Re: systemd 215 hangs in fsck

Harry [2014-10-28 16:58 -0000]:
> Martin, do you have upstart, systemd-shim, cgmanager and libcgmanager0
> purged and are you using kernel line init=/lib/systemd/systemd ?
>
> I cannot boot at all if I remove the kernel booting line.

Well yes -- You have no package which provides /sbin/init then, so you
need to install either upstart or systemd-sysv.

> 2) with quiet (but splash dropped) in the kernel line, I can reproduce
> the hanging very often.
>
> 3) this is all there is visible in the screen when booting hangs:
> [1.490641] systemd[1]: job lvm2.service/start deleted to break ordering cycle starting with local-fs.target/start
> [1.492015] systemd-fsck[235]:/dev/sda1: clean, 86887/6111232 files, 1044303/24414062 blocks

Interesting, so most likely the fsck succeeded. So next thing: Can
you please add "systemd.debug-shell" to the kernel command line, and
when you get the hang press Alt+F9? There should be a root shell
there. Please run "systemctl" to see which unit is something other
than "loaded active", like "loading", or "failed", etc. For those it
would be very helpful if you could then do
"sudo systemctl -l status <servicename>", e. g.
"sudo systemctl -l status display-manager" and capture the output. You
can e. g. redirect the output to a file in /root or even /home/harry/,
if file systems are already mounted; if not, a photo camera shot will
do, too.

Please let me know if the above instructions are too hard for you (not
sure about how familiar you are with unix command line tools).

Thanks!
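
The unit check described above can also be scripted, which helps when the list is long. This is a sketch only: `not_active` is a hypothetical helper name, and it assumes the usual column order of `systemctl list-units` (UNIT, LOAD, ACTIVE, SUB, description).

```shell
# not_active: hypothetical helper. Reads `systemctl list-units` output on
# stdin and prints units that are not both "loaded" and "active".
not_active() {
    awk '
        NF < 3 { next }              # skip blank or short lines
        {
            i = 1
            if ($1 !~ /\./) i = 2    # skip a leading status marker, if any
            if ($(i+1) != "loaded" || $(i+2) != "active")
                print $i, $(i+1), $(i+2)
        }'
}

# From the debug shell, saving the result to a file in /root:
#   systemctl list-units --no-legend --no-pager | not_active > /root/units.txt
```

Each unit it prints is a candidate for a closer look with `systemctl -l status <servicename>`.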

Harry (harry33) wrote : Re: systemd 215 hangs in fsck

Hi Martin,

Here is some info.

"Well yes -- You have no package which provides /sbin/init then, so you
need to install either upstart or systemd-sysv."

I have had systemd-sysv installed all the time.

"Interesting, so most likely the fsck succeeded."

Mounts: sda1 is /, sdb1 is /home. They are two separate SSDs.

"Can you please add "systemd.debug-shell" to the kernel command line, and
when you get the hang press Alt+F9? There should be a root shell
there... "

I added systemd.debug-shell to the kernel booting line.
Then rebooted till the system froze, with the lines I previously wrote.
But, pressing Alt+F9 did nothing.
You see, when the hang strikes, the system does not respond to anything.
This is something systemd 215 brought about, v. 208 works just fine.

Martin Pitt (pitti) wrote :

Oh, so you can't even switch between other consoles? Or is the debug console not working for you for some reason? I. e. when you press Alt+F2, F3, etc., do you see any change at all?

Does SysRq still work, e. g. SysRq+s (sync), then SysRq+u (remount read-only), then SysRq+b (reboot)? If not even that works, then the kernel itself crashed. Did you ever encounter this without "quiet" and "splash"? Without more debug information I'm afraid I have nothing in my hands to see where the problem is. My system also boots rather fast (quad-core i5, very fast (500 MB/s) SSD), but I've never seen such a hard lock-up.

summary: - systemd 215 hangs in fsck
+ systemd 215 hangs during boot
Harry (harry33) wrote :

Hi Martin,

True, I cannot switch between tty's. The whole system is unresponsive to any keyboard commands.

About SysRQ (reisub): I cannot try it, because in my keyboard (Logitech K740) there is no SysRQ button.
Usually it is the "alt-function" of the PrintScreen button, but not here.

About the kernels. I have seen this hanging behaviour with the Ubuntu kernels 3.16.0-23 and 3.16.0-24 (the latest).
But always with "quiet" in the kernel booting line.
If I remove "quiet" and turn to verbose mode, the booting will significantly slow down (because of the rolling text lines in the screen), but I haven't seen these hangs then.
In the log files there is nothing, no errors about this.
Only one in x.log, pointing out that the vesa module is not present. But I do not have the vesa driver installed; instead I have modesetting and fbdev installed, and they load fine. And of course the Intel driver, which always loads successfully.

Harry (harry33) wrote :

Martin,

In addition to the previous post (#8):

I tried again booting without "quiet" in the kernel line: 10/10 boots were successful.
Then with "quiet": 3 out of 10 did hang.

Hanging (or crash) happens right after the fsck is done.
Mostly after sda1 fsck and before sdb1 fsck.
Sometimes both the sda1 and sdb1 fsck lines are visible after the hang, indicating that both have been done.

So, if booting always succeeds without "quiet" (slow boot) and hang only occurs with "quiet" (fast boot),
can that be an indication of an unsuccessful race condition, followed by a hang or crash?
And why systemd 208 was always successful, but 215 not?
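
As a rough check on how strongly these counts point at "quiet": if boots were independent, and the hang rate without "quiet" were the same 3 in 10 seen with it, ten clean boots in a row would be unlikely. The independence assumption is a simplification, not something measured here, and `p_all_ok` is a hypothetical helper name.

```shell
# p_all_ok P N: chance that N boots in a row all succeed when each boot
# independently hangs with probability P.
p_all_ok() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p)^n }'
}

p_all_ok 0.3 10   # prints 0.028
```

Under those assumptions a run of ten clean boots would happen by chance less than 3% of the time, which fits the idea that boot speed matters.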

Martin Pitt (pitti) wrote :

Yes, it's certainly a race condition of some sort. Unfortunately quite impossible to debug remotely as there are no available logs/kernel traces :-( SysRQ missing is also a bit unhelpful there (yes, it's usually Alt+PrintScr); does that work if boot works normally? If it just doesn't work with that hang, then the kernel is so thoroughly locked up that not even SysRQ helps. Maybe try with a different keyboard?

> And why systemd 208 was always successful, but 215 not?

Well, that's exactly the $10,000 question that this report is about :-) It has slightly different udev rules and does different things at bootup, but there's been a lot of changes between those versions.

Do you get the hang if you boot without "quiet" and also without "splash"? That's text-mode only and might be a bit faster again.

Harry (harry33) wrote :

SysRQ
Apparently my keyboard does not have that button.
http://www.logitech.com/en-gb/product/illuminated-keyboard-k740
So, at least trying Alt+Print Screen R+E+I+S+U+B does nothing, even after a successful boot.
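
Whether Alt+SysRq keys do anything depends not only on the keyboard but also on the `kernel.sysrq` setting, a bitmask readable from `/proc/sys/kernel/sysrq`. A value other than 1 enables only some functions, so part of R-E-I-S-U-B can be ignored even on a healthy system. The sketch below decodes the mask using the bit values from the kernel's SysRq documentation; `sysrq_decode` is a hypothetical helper name and 176 is only an example value.

```shell
# sysrq_decode: hypothetical helper. Prints which of the R-E-I-S-U-B keys a
# given kernel.sysrq value permits from the keyboard.
sysrq_decode() {
    mask=$1
    if [ "$mask" -eq 0 ]; then echo "SysRq disabled"; return 0; fi
    if [ "$mask" -eq 1 ]; then echo "all SysRq functions enabled"; return 0; fi
    if [ $((mask & 4))   -ne 0 ]; then echo "r: keyboard control (unraw)"; fi
    if [ $((mask & 16))  -ne 0 ]; then echo "s: sync"; fi
    if [ $((mask & 32))  -ne 0 ]; then echo "u: remount read-only"; fi
    if [ $((mask & 64))  -ne 0 ]; then echo "e, i: signal processes"; fi
    if [ $((mask & 128)) -ne 0 ]; then echo "b: reboot"; fi
    return 0
}

# On a running system:
#   sysrq_decode "$(cat /proc/sys/kernel/sysrq)"
sysrq_decode 176   # example value only
```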

The hang
Yes, I now have only this in kernel line: quiet init=/lib/systemd/systemd
So, no splash there.
And yes, it boots fine without quiet, but much slower, because of the text being printed onto screen.

Additional questions
1) Why doesn't my setup boot at all, if I remove the kernel line init=/lib/systemd/systemd ?
2) Why doesn't my setup boot at all, if I use /bin/systemd in the kernel line instead?
   I do have that link file though. All I get is kernel panic.

Martin Pitt (pitti) wrote :

> 1) Why doesn't my setup boot at all, if I remove the kernel line init=/lib/systemd/systemd ?

That's a good question, if you have systemd-sysv installed it should work without it (that's how I boot my system).

$ ls -l /sbin/init
lrwxrwxrwx 1 root root 20 Nov 3 07:41 /sbin/init -> /lib/systemd/systemd

Does that look any different for you?

> 2) Why doesn't my setup boot at all, if I use /bin/systemd in the kernel line instead?

That should also work indeed, I do that all the time. This was fixed a while ago in utopic: https://launchpad.net/ubuntu/+source/initramfs-tools/0.103ubuntu8. What's your version of initramfs-tools? Also, did you perhaps somehow damage your initramfs? Can you please attach your /boot/grub/grub.cfg ?

Harry (harry33) wrote :

Martin,
Here are the answers.

1) $ ls -l /sbin/init
Command not found

So, a little different. What is missing here?

2) Initramfs-tools v. 0.103ubuntu8
That is the latest one.

3) /boot/grub/grub.cfg
It is now attached.

Harry (harry33) wrote :

My bad. A correction.

1) ls -l /sbin/init
lrwxrwxrwx 1 root root 20 marra 3 08:41 /sbin/init -> /lib/systemd/systemd

Martin Pitt (pitti) wrote :

ok, these look good. Can you please check the output of

  gzip -cd /boot/initrd.img-3.16.0-24-generic | cpio -t | grep readlink

it should be something like

   150957 blocks
   bin/readlink

If that's not there, then for some reason your initramfs is broken. Does it change anything if you do

  sudo update-initramfs -u

? What's the output of that?

(That shouldn't have anything to do with the boot race hang, though)

Harry (harry33) wrote :

Martin,

Here are the results.

1) "gzip -cd /boot/initrd.img-3.16.0-24-generic | cpio -t | grep readlink"
gzip: /boot/initrd.img-3.16.0-24-generic: not in gzip format
cpio: arkiston ennenaikainen loppu

(arkiston ennenaikainen loppu = premature end of archive)

2) after running "sudo update-initramfs -u"
update-initramfs: Generating /boot/initrd.img-3.16.0-24-generic

3) Then running again "gzip -cd /boot/initrd.img-3.16.0-24-generic | cpio -t | grep readlink"
gzip: /boot/initrd.img-3.16.0-24-generic: not in gzip format
cpio: arkiston ennenaikainen loppu

So, no change and it looks like a problem with initramfs.

Martin Pitt (pitti) wrote : Re: [Bug 1385630] Re: systemd 215 hangs during boot

Harry [2014-11-07 17:24 -0000]:
> gzip: /boot/initrd.img-3.16.0-24-generic: not in gzip format

Le huh? Can you please give the output of

  file /boot/initrd.img-3.16.0-24-generic

Thanks!

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Harry (harry33) wrote :

Here:

file /boot/initrd.img-3.16.0-24-generic
/boot/initrd.img-3.16.0-24-generic: ASCII cpio archive (SVR4 with no CRC)
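
`file` identifies the image from its leading bytes, and that is also why `gzip -cd` refused it: the image does not start with a gzip header. The same check can be scripted. This is a sketch; `initrd_kind` is a hypothetical helper name, and it recognises only the two formats that come up in this report.

```shell
# initrd_kind FILE: hypothetical helper. Classifies a file by its first bytes.
initrd_kind() {
    # First six bytes as a hex string, e.g. "1f8b..." for gzip.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)        echo gzip ;;        # gzip magic number
        303730373031) echo cpio-newc ;;   # ASCII "070701", SVR4 cpio without CRC
        *)            echo unknown ;;
    esac
}

# initrd_kind /boot/initrd.img-3.16.0-24-generic
```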

Martin Pitt (pitti) wrote :

Did you modify /etc/initramfs-tools/initramfs.conf in any way? Please attach it here, and also any file in /etc/initramfs-tools/conf.d/ (normally there should be none).

Harry (harry33) wrote :

Martin,

My setup is fairly new, from September 2014.
I installed Ubuntu-Gnome Trusty then and upgraded it first to Utopic and now to Vivid dev.
I haven't modified /etc/initramfs-tools/initramfs.conf in any way.
I attach this file here.
The folder /etc/initramfs-tools/conf.d/ is empty.

Harry (harry33) wrote :

Martin,
I also attach the file /etc/initramfs-tools/update-initramfs.conf here.

Martin Pitt (pitti) wrote :

OK, that looks fairly normal. And yet your initramfs is not at all matching the expected form :-( Perhaps you can upload /boot/initrd.img-3.16.0-24-generic somewhere where I can inspect it? (It's about 30 MB). If not, then at least the output of

  cpio -tv /boot/initrd.img-3.16.0-24-generic

might be insightful.

Harry (harry33) wrote :

Martin,

The command "cpio -tv /boot/initrd.img-3.16.0-24-generic" did nothing.
The terminal just started a never-ending process (without getting stuck).
I did wait for several minutes.
The size of /boot/initrd.img-3.16.0-24-generic is 19.2 MB.

Harry (harry33) wrote :

Here is the attachment /boot/initrd.img-*

Martin Pitt (pitti) wrote :

Argh, forgot a "<" (without it, cpio just waits for the archive on stdin):

$ cpio -tv < initrd.img-3.16.0-24-generic
drwxr-xr-x 2 root root 0 Nov 10 17:05 kernel
drwxr-xr-x 2 root root 0 Nov 10 17:05 kernel/x86
drwxr-xr-x 2 root root 0 Nov 10 17:05 kernel/x86/microcode
-rw-r--r-- 1 root root 20480 Nov 10 17:05 kernel/x86/microcode/GenuineIntel.bin

So, whatever this initramfs is, it's certainly not something I've ever seen before. It seems totally broken to me :-/ Even cpio itself seems confused as the displayed contents certainly don't account for 19 MB.

Do you have any idea where this GenuineIntel.bin thingy comes from? Do you have any third-party packages installed or other tweaks?

Harry (harry33) wrote :

Martin,

Well the GenuineIntel.bin may come from intel-microcode package of the Vivid multiverse repository.
This is installed because of the Intel Core i7 Haswell processor.
My motherboard Asus Sabertooth has UEFI-BIOS and it is using UEFI Secure boot, which may have something to do with this too, not sure about it though.

Those are the setups regarding Intel, no third-party packages are installed.

Right, here is the output of "cpio -tv < /boot/initrd.img-3.16.0-24-generic"
drwxr-xr-x 2 root root 0 Nov 10 18:05 kernel
drwxr-xr-x 2 root root 0 Nov 10 18:05 kernel/x86
drwxr-xr-x 2 root root 0 Nov 10 18:05 kernel/x86/microcode
-rw-r--r-- 1 root root 20480 Nov 10 18:05 kernel/x86/microcode/GenuineIntel.bin
42 blocks
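
One possible reading, not confirmed anywhere in this report: a listing that stops after the microcode file and reports only 42 blocks is what one would see if the image were an uncompressed cpio archive holding the microcode, followed by the compressed main archive, since `cpio -t` stops at the end of the first archive. Under that assumption, and assuming the remainder is gzip-compressed (the initramfs-tools default, `COMPRESS=gzip`), the main archive could be reached by skipping the reported number of 512-byte blocks. `rest_after_blocks` is a hypothetical helper name.

```shell
# rest_after_blocks FILE N: hypothetical helper. Writes FILE to stdout,
# skipping its first N 512-byte blocks.
rest_after_blocks() {
    dd if="$1" bs=512 skip="$2" 2>/dev/null
}

# With the block count reported above, the earlier readlink check becomes:
#   rest_after_blocks /boot/initrd.img-3.16.0-24-generic 42 | gzip -cd | cpio -t | grep readlink
```

If `gzip` still complains at that offset, the assumption about the layout or the compression does not hold for this image.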

Martin Pitt (pitti) wrote :

That's the one thing I actually tried to install, but it didn't break the initramfs in such a way. So I still don't know at all what's wrong here :-(

Harry (harry33) wrote :

An update to this bug.
I haven't seen the hang described here since I upgraded GDM to the version 3.14.0-0ubuntu1.
It may or may not have anything to do with this.
However, this bug report may now be closed.

Changed in systemd (Ubuntu):
status: Incomplete → Invalid