booting without --no-log causes init and plymouth-upstart-bridge to spin at 100%

Bug #922754 reported by Stefan Bader on 2012-01-27
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
upstart
High
James Hunt
upstart (Ubuntu)
High
James Hunt

Bug Description

Release: Precise
Architecture: amd64

Likely broken since upstart_1.4-0ubuntu3

When booting after recent updates I see init and plymouth-upstart-bridge spinning at 100% CPU time, defunct processes do not get reaped and a console on ttyS1 which is supposed to start, does not come up.

Looking at strace of plymouth-upstart-bridge it seems to run a tight loop of
epoll_wait(3, {}, 64, 0) = 0 (/proc/1383/fd/3 -> anon_inode:[eventpoll])

and init does another tight loop of
read(24, 0x7ffdc2d1caa0, 8192) = -1 EAGAIN (Resource temporarily unavailable)
  (/proc/1/fd/24 -> /dev/ptmx)

Booting with --no-log on the grub commandline brings the system up without showing any problems.

Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :
summary: - booting without --no-log causes init and plymouth-upstard-bridge to spin
+ booting without --no-log causes init and plymouth-upstart-bridge to spin
at 100%
Steve Langasek (vorlon) on 2012-01-27
Changed in upstart (Ubuntu):
importance: Undecided → High
Steve Langasek (vorlon) wrote :

Stefan, is this on real hardware or in a VM? Can you attach the /proc/cmdline used?

Is plymouth itself still running at this point? even with upstart banging on the cpu like this, it shouldn't take too long for the boot sequence to finish and plymouth to stop, at which point the bridge should also go away... but it doesn't sound like that's what happened to you.

Changed in upstart (Ubuntu):
status: New → Incomplete
Jason Conti (jconti) wrote :

I am seeing similar activity when I attempt to reboot over ssh. The system boots fine and I don't see any high cpu activity from init otherwise. The system will not reboot until I exit the ssh session. This only appears to happen over ssh, it reboots fine otherwise.

Strace on pid 1 yields an endless stream of:
read(11, 0x7f7d8d703730, 8192) = -1 EAGAIN (Resource temporarily unavailable)

With /proc/1/fd/11:
lrwx------ 1 root root 64 Jan 27 15:53 /proc/1/fd/11 -> /dev/ptmx

Which seems to match above. This is on real hardware and here is /proc/cmdline:
BOOT_IMAGE=/boot/vmlinuz-3.2.0-11-generic root=UUID=16258cc4-9dec-421a-9f8e-c0e6632dda7d ro quiet splash vt.handoff=7

The disabled logging in 1.4-0ubuntu4 appears to fix this though, the system reboots immediately without having to exit the ssh session first.

James Hunt (jamesodhunt) wrote :

@Stefan: I've updated the upstart-testing PPA with upstart - 1.4-0ubuntu5:

    https://launchpad.net/~jamesodhunt/+archive/upstart-testing/+packages

This backs out a late change which I suspect might be the cause of this problem. Please could you try this version once it has built.

Hi James,

On Sat, Jan 28, 2012 at 10:34:25AM -0000, James Hunt wrote:
> @Stefan: I've updated the upstart-testing PPA with upstart -
> 1.4-0ubuntu5:

> https://launchpad.net/~jamesodhunt/+archive/upstart-
> testing/+packages

> This backs out a late change which I suspect might be the cause of this
> problem. Please could you try this version once it has built.

I've been able to reproduce the problem when booting -0ubuntu4 here with
--log, and I'm afraid the problem is also still reproducible with the
-0ubuntu5 package from your ppa. Will try to find a minimal reproducer for
this issue.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Steve Langasek (vorlon) wrote :

Reproducer attached. I can use this to reliably reproduce the busy loop on my laptop and in a VM. So hopefully this provides the clues we need!

In the process of refining this test case, I also repeatedly ran into a panic (!= EIO assertion failure in log.c), but I haven't been able to figure out again how to reproduce the error. Maybe there's another related bug lingering, or maybe it's a different manifestation of the same bug.

Changed in upstart (Ubuntu):
status: Incomplete → Triaged
Stefan Bader (smb) wrote :

Spoke with Steve this morning. Since he seems to have a minimal reproducer he said cmdline would be unimportant. Checked my specific setup and the only out of normal is the fact that console is set to be ttyS1 (which is a serial over lan). I am using LVM for data (/home) beside that anything else is in / which is on an extended partition (ok, I got a few more logical partitions that normal (sda5 to sda16)).

James Hunt (jamesodhunt) wrote :

@Stefan - thanks for the info.

@Steve - great work identifying a minimal test case! I too can now recreate the problem using upstart-busyloop.conf.

Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
James Hunt (jamesodhunt) wrote :

This problem is caused when upstart attempts to exec directly a command that does not exist

I've created an updated package (upstart version '1.4-0ubuntu5~jh ') in my 'ppa:jamesodhunt/upstart-job-logging' PPA [1] that fixes this issue.

Please could those affected by this problem try it out and provide feedback.

[1] - https://launchpad.net/~jamesodhunt/+archive/upstart-job-logging

James Hunt (jamesodhunt) on 2012-02-03
Changed in upstart (Ubuntu):
status: Triaged → In Progress
Changed in upstart:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → James Hunt (jamesodhunt)
Stefan Bader (smb) wrote :

Test booted with 1.4-0ubuntu5~jh installed and the --no-log not present. Issue seems to be fixed. No processes in tight loop (init, upstart-udev-bridge) observed. Boot finished correctly and brought up serial console.

James Hunt (jamesodhunt) wrote :

Thanks for the feedback Stefan!

Steve Langasek (vorlon) wrote :

On Fri, Feb 03, 2012 at 05:24:55PM -0000, Stefan Bader wrote:
> Test booted with 1.4-0ubuntu5~jh installed and the --no-log not present.
> Issue seems to be fixed. No processes in tight loop (init, upstart-udev-
> bridge) observed. Boot finished correctly and brought up serial console.

Stefan, I believe this package still has logging support disabled by default
still. Could you please boot with the '--log' boot option, and confirm that
things are still ok for you?

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Stefan Bader (smb) wrote :

Looks ok with --log, too. And to be sure checked files under /var/log/upstart and saw newer (from that boot) timestamps.

Jason Conti (jconti) wrote :

Still seeing this with ssh using 1.4-0ubuntu5~jh with --log. Maybe I should file a separate bug?

I haven't been able to come up with a minimal test case, but it turns out you don't need to shutdown/reboot to see it, just have an ssh user logged in, run: sudo stop ssh; and then watch init spin with EAGAIN until the ssh user manually exits.

Steve Langasek (vorlon) wrote :

I can also confirm that booting 5~jh with --log works here, with no busy-loop in init.

Jason, it sounds like you have a somewhat different bug. Yes, I think you should file a separate bug report for it.

Steve Langasek (vorlon) wrote :

(And btw, I can reproduce the behavior you describe with ssh, so that should be sufficient to let us pin this down, thanks - but it ought to be tracked as a separate bug report nevertheless.)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.4-0ubuntu5

---------------
upstart (1.4-0ubuntu5) precise; urgency=low

  * Merge of important upstream log fixes to handle scenario attempts
    are made to exec(3) directly non-existent commands. (LP: #922754)
 -- James Hunt <email address hidden> Fri, 03 Feb 2012 14:16:40 +0000

Changed in upstart (Ubuntu):
status: In Progress → Fix Released
James Hunt (jamesodhunt) wrote :

I see that Jason has now raised bug 926468 on the ssh issue.

James Hunt (jamesodhunt) on 2012-03-26
Changed in upstart:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers