'start on starting tty1' -> pre-start exec stop tty1' hangs

Bug #652221 reported by Tom Gall on 2010-09-30
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Linaro
Fix Released
Undecided
Unassigned
upstart (Ubuntu)
Low
Unassigned

Bug Description

linaro-headless image from 9/30, hwpack for omap from 9/30 on Rev A2 beagle XM

System boots, serial cons works but tty cons does not.

Last message to tty is :
[ 12.075653] sd 1:0:0:0: [sdb] Attached SCSI removable disk

however serial cons has several more messages:

[ 12.075653] sd 1:0:0:0: [sdb] Attached SCSI removable disk
[ 12.102233] Registered led device: beagleboard::usr0
[ 12.102386] Registered led device: beagleboard::usr1
[ 12.134399] OMAPFB: pan_display(0)
[ 12.134429] OMAPFB: setcmap
[ 12.134429] OMAPFB: setcmap
[ 12.222412] Registered led device: beagleboard::pmu_stat
[ 12.337219] smsc95xx v1.0.4
[ 12.412200] OMAPFB: pan_display(0)
[ 12.412231] OMAPFB: setcmap
[ 12.412231] OMAPFB: setcmap
[ 12.565185] smsc95xx 1-2.1:1.0: usb0: register 'smsc95xx' at usb-ehci-omap.0-2.1, smsc95xx USB 2.0 Ethernet, a2:ba:f
[ 12.565368] usbcore: registered new interface driver smsc95xx
[ 12.602966] usbcore: registered new interface driver hiddev
[ 12.612457] input: Logitech USB-PS/2 Optical Mouse as /devices/platform/ehci-omap.0/usb1/1-2/1-2.5/1-2.5.1/1-2.5.1:2
[ 12.613159] generic-usb 0003:046D:C00E.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] o0
[ 12.613739] usbcore: registered new interface driver usbhid
[ 12.613769] usbhid: USB HID core driver
[ 12.894165] input: Apple, Inc Apple Keyboard as /devices/platform/ehci-omap.0/usb1/1-2/1-2.5/1-2.5.2/1-2.5.2:1.0/in3
[ 12.933807] apple 0003:05AC:0220.0002: input,hidraw1: USB HID v1.11 Keyboard [Apple, Inc Apple Keyboard] on usb-ehc0
[ 12.967651] input: Apple, Inc Apple Keyboard as /devices/platform/ehci-omap.0/usb1/1-2/1-2.5/1-2.5.2/1-2.5.2:1.1/in4
[ 12.995483] apple 0003:05AC:0220.0003: input,hidraw2: USB HID v1.11 Device [Apple, Inc Apple Keyboard] on usb-ehci-1

Blinking cursor at the bottom of the screen however no input is accepted.

This also occurs on Rev C4 Beagle Board

summary: - Headless on beagle XM, does not boot into tty
+ Headless on beagle, beagle XM, does not boot into tty
Changed in linaro:
status: New → Confirmed
Michael Hudson-Doyle (mwhudson) wrote :

This happens too on my xM.

Setting a root password and running getty -8 38400 tty1 from the serial console allows me to log in on the tty (the keyboard works fine afaict). I get "-bash: no job control in this shell" when I log in like this though.

Michael Hudson-Doyle (mwhudson) wrote :

I think the no job control thing is probably related to how I ran getty. If I run it as "(getty -8 38400 tty1 &)" so it ends up under init in the process tree, it seems much happier.

Michael Hudson-Doyle (mwhudson) wrote :

Also there are getty's running on other consoles e.g. I can press C-A-F2 and log in.

I get the feeling I may not be understanding what is intended here...

On Fri, Oct 01, 2010 at 01:46:58AM -0000, Michael Hudson-Doyle wrote:
> Also there are getty's running on other consoles e.g. I can press C-A-F2
> and log in.

> I get the feeling I may not be understanding what is intended here...

There's supposed to be a custom upstart job, /etc/init/openvt.conf, provided
by the chessy-overlay-minimal package (which should be renamed to
linaro-overlay-minimal, but not done yet).
http://snapshots.linaro.org/10.11-daily/linaro-headless/20100930/0/images/tar/linaro-m-headless-tar-20100930-0.packages
doesn't list this package, but there are apparently some bugs in how the
dpkg list is being done; /etc/init/openvt.conf does appear in
<http://snapshots.linaro.org/10.11-daily/linaro-headless/20100930/0/images/tar/linaro-m-headless-tar-20100930-0.list>.

What does 'status openvt' show on this system?

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

It seems to be present, but possibly not starting for some reason? I don't speak upstart well enough to understand the following:

root@localhost:~# status openvt
openvt start/pre-start, process 495
root@localhost:~# dpkg -l chessy-overlay-minimal
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii chessy-overlay 1011.2 Chessy Overlay for Linaro 10.11 headless/min
root@localhost:~# cat /etc/init/openvt.conf
#
# Author: Steve Langasek <email address hidden>
# License: see debian/copyright
#

start on starting tty1

respawn
pre-start exec stop tty1
exec /bin/openvt -e -c 1 -- /bin/auto-root-login

Michael Hudson-Doyle (mwhudson) wrote :

Hmm, again with the utter lack of upstart chops, I think this might be some kind of race between the openvt and tty1 jobs. Maybe the openvt job is being started and so 'stop tty1' is being called before the tty1 job fully starts up, or something. If I disable the tty1 job and run 'start openvt' it seems to work -- although I've been fooling around for a while, I may have made some other significant changes. I can't find the runes to make the openvt job start on startup though.

On Mon, Oct 04, 2010 at 03:39:14AM -0000, Michael Hudson-Doyle wrote:
> It seems to be present, but possibly not starting for some reason? I
> don't speak upstart well enough to understand the following:

> root@localhost:~# status openvt
> openvt start/pre-start, process 495

If this is reproducible, next time you hit this can you please see what the
PID listed as 'openvt start/pre-start' is doing? This suggests that we've
hung trying to run 'exec stop tty1', which should be a trivial operation.
Getting the 'ps' output, strace output, and a backtrace would help.

On Mon, Oct 04, 2010 at 04:13:32AM -0000, Michael Hudson-Doyle wrote:
> Hmm, again with the utter lack of upstart chops, I think this might be
> some kind of race between the openvt and tty1 jobs. Maybe the openvt
> job is being started and so 'stop tty1' is being called before the tty1
> job fully starts up, or something.

The intended behavior here is:

 - the start conditions for the tty1 job are met.
 - upstart sees that openvt is 'start on starting tty1' and processes it
   first.
 - the pre-start script runs and sets the tty1 job back to 'stopped' state
   with no further processing
 - the openvt exec line runs, and the openvt job is marked as
   'start/running'

Starting the openvt job manually does the same thing, except that 'exec stop
tty1' is a no-op if tty1 is already stopped.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

On Mon, 04 Oct 2010 06:43:24 -0000, Steve Langasek <email address hidden> wrote:
> On Mon, Oct 04, 2010 at 03:39:14AM -0000, Michael Hudson-Doyle wrote:
> > It seems to be present, but possibly not starting for some reason? I
> > don't speak upstart well enough to understand the following:
>
> > root@localhost:~# status openvt
> > openvt start/pre-start, process 495
>
> If this is reproducible, next time you hit this can you please see what the
> PID listed as 'openvt start/pre-start' is doing?

root@localhost:~# status openvt
openvt start/pre-start, process 479
root@localhost:~# ps 479
  PID TTY STAT TIME COMMAND
  479 ? Ss 0:00 stop tty1

> This suggests that we've
> hung trying to run 'exec stop tty1',

Yeah, looks like it.

> which should be a trivial operation. Getting the 'ps' output, strace
> output,

root@localhost:~# strace -p 479
Process 479 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>

(that doesn't look very illuminating!)

> and a backtrace would help.

Not really:

(gdb) bt
#0 0x4019c896 in ?? () from /lib/libc.so.6
#1 0x4021de70 in poll () from /lib/libc.so.6
#2 0x00000006 in ?? ()
#3 0x00000006 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

> On Mon, Oct 04, 2010 at 04:13:32AM -0000, Michael Hudson-Doyle wrote:
> > Hmm, again with the utter lack of upstart chops, I think this might be
> > some kind of race between the openvt and tty1 jobs. Maybe the openvt
> > job is being started and so 'stop tty1' is being called before the tty1
> > job fully starts up, or something.
>
> The intended behavior here is:
>
> - the start conditions for the tty1 job are met.
> - upstart sees that openvt is 'start on starting tty1' and processes it
> first.
> - the pre-start script runs and sets the tty1 job back to 'stopped' state
> with no further processing

This appears to be where we're failing.

> - the openvt exec line runs, and the openvt job is marked as
> 'start/running'
>
> Starting the openvt job manually does the same thing, except that 'exec stop
> tty1' is a no-op if tty1 is already stopped.

This seems a little strange:

root@localhost:~# status tty1
tty1 stop/starting

I'm attaching the syslog that resulted from adding --verbose to the bootargs.

Cheers,
mwh

tags: added: beaglexm linaro omap3
Torez Smith (lnxtorez) wrote :

fyi...a similar problem is seen on the OMAP3 - IGEPv2 board but the C-A-F2 does not temporarily resolve the problem.

If I change openvt.conf like so:

--- init/openvt.conf 2010-10-08 04:00:41.000000000 +0000
+++ /etc/init/openvt.conf 2010-10-08 04:08:27.000000000 +0000
@@ -3,8 +3,8 @@
 # License: see debian/copyright
 #

-start on starting tty1
-
-respawn
-pre-start exec stop tty1
+#start on starting tty1
+console output
+#respawn
+#pre-start exec stop tty1
 exec /bin/openvt -e -c 1 -- /bin/auto-root-login

so it doesn't try to start on boot, then boot, shut down tty1 by hand (which has started up fine, btw), then:

root@localhost:~# start openvt
openvt start/running, process 695
root@localhost:~# openvt: vt 1 is in use; command aborted
        use `openvt -f' to force.

If I add -f to the openvt options, then start openvt appears to work fine.

Changing in like this:

--- init/openvt.conf 2010-10-08 04:00:41.000000000 +0000
+++ /etc/init/openvt.conf 2010-10-08 04:01:58.000000000 +0000
@@ -3,8 +3,8 @@
 # License: see debian/copyright
 #

-start on starting tty1
-
-respawn
+start on started tty1
+console output
+#respawn
 pre-start exec stop tty1
-exec /bin/openvt -e -c 1 -- /bin/auto-root-login
+exec /bin/openvt -f -e -c 1 -- /bin/auto-root-login

gets it to start up as appears to be desired. But when I reboot, I see on the monitor, though not on the console, a message:

uart_close: bad serial port count; tty->count is 1, port->count is 2

So maybe there's some bug where vts don't get let go of as they should?

In any case, I don't think the respawn will work as intended, because the pre-start stop of tty1 won't work more than once.

Torez Smith (lnxtorez) wrote :

Hey Tom and Michael,
think we chatted on the irc a bit about this bug and wanted to state in here that the problem is also seen on igep board.

Executing "exec /bin/openvt -f -c 1 -- /bin/auto-root-login" certainly helps but think we need a longer term solution.

Steve Langasek (vorlon) on 2010-10-14
tags: removed: beaglexm
Steve Langasek (vorlon) wrote :

Opening a task on upstart for this bug; I have no explanation for why the 'stop tty1' is hanging, so that definitely looks like an upstart bug that should be looked at. If this is really so unreliable, we can certainly work around it in Linaro for release ('start on started tty1' is one option).

As for respawn: the 'stop tty1' should be guarded with a '|| true', but should not otherwise need to be changed.

so

  pre-start script
     stop tty1 || true
  end script

Dave Martin (dave-martin-arm) wrote :

> openvt: vt 1 is in use; command aborted
> use `openvt -f' to force.

The weird thing about this is that getty does no check on VT_GETSTATE, so it will happily let run multiple gettys on the same tty.

... os openvt looks a bit overzealous. tty1 only counts as "used" because it is assigned to the console inside the kernel ... which is probably a bad reason for refusing to run something on a vt. So this may a bizarre special case which happens if you try to use openvt with tty1 - nothing process appears to have tty1 open if you look with ls -l /proc/*/fd/ | grep 'tty\|console'

Re the upstart problems, it seems bizarre to start getty in tty1 and then kill it and run something else. I expect that upstart jobs of like the below weren't really part of the original thinking...

start on started x
exec stop x

Wouldn't it make more sense to have a linaro modification to the upstart package, which removes of modified the tty1 job?

Cheers
---Dave

Dave Martin (dave-martin-arm) wrote :

Interesting to note that kbd/src/deallocvt.c contains this:

} else if (num == 1) {
        fprintf(stderr,
                _("%s: VT 1 is the console and cannot be deallocated\n"),
                progname);
        exit(1);
}

This is decided in userspace without asking the kernel, suggesting that someone, somewhere believed tty1 really is a special case.

Dave Martin (dave-martin-arm) wrote :

(Also, please excuse the illiteracy of the previous two posts ...)

Paul Larson (pwlars) wrote :

Confirmed this still happens on RC candidate image from 20101101
# status tty1
tty1 stop/starting

# status openvt
openvt start/pre-start, process 475

Also, I noticed that if I try to manually run '/bin/openvt -e -c 1 -- /bin/auto-root-login', I get the following output on the tty1 console:
ioctl TIOCSCTTY: Operation not permitted

On Fri, Oct 22, 2010 at 01:21:01PM -0000, Dave Martin wrote:
> Re the upstart problems, it seems bizarre to start getty in tty1 and
> then kill it and run something else. I expect that upstart jobs of like
> the below weren't really part of the original thinking...

> start on started x
> exec stop x

They certainly are part of the original thinking; in fact it's a common use
case for an upstart job to stop *itself* in a pre-start script. We've just
run into a bug in a corner case, perhaps because no other packages are
stopping other jobs in pre-start scripts up to now.

> Wouldn't it make more sense to have a linaro modification to the upstart
> package, which removes of modified the tty1 job?

That means having to maintain a forked version of the upstart package; it
looks like this script works reliably if called 'start on started tty1'
instead of 'start on starting tty1' which is what's there now, so I think
that's what we should go with pending an upstart fix.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

didnt we have "on started tty1" initially anyway? ++ on changing this to "started" for release

On Mon, Nov 08, 2010 at 01:24:34PM -0000, Alexander Sack wrote:
> didnt we have "on started tty1" initially anyway? ++ on changing this to
> "started" for release

No, it was 'start on starting' initially, and this *did* work for me - I
don't know when it broke or why.

The reason to prefer 'start on starting' is to avoid letting the getty start
up and output any login prompt to the tty when we're going to turn around
right afterwards and spawn a shell. So if we figure out the upstart bug in
the future, I think we do want to restore this.

With the Nov 08 build #2, there continues to be no cons on the tty on my Rev C4 beagle. Serial cons continues to be fine.

Tom Gall (tom-gall) wrote :

On my beagle XM, there is a cons on the tty as well as the serial as you'd expect. BUT and I guess I should raise this as a different bug, the cursor is one line down from where it should be (and where test shows up when typed) further the cursor is not solid but stripped where solid white row of pixels, black row of pixels, solid white row ... etc

On Tue, Nov 09, 2010 at 03:49:23AM -0000, Tom Gall wrote:
> With the Nov 08 build #2, there continues to be no cons on the tty on my
> Rev C4 beagle.

Sounds like we might be chasing a couple of different bugs here, then.
Please show the output of 'status tty1' and 'status openvt' on this system.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

root@localhost:~# status openvt
openvt start/running, process 365

root@localhost:~# status tty1
tty1 stop/waiting
root@localhost:~# status tty2
tty2 start/running, process 325
root@localhost:~# status tty3
tty3 start/running, process 326
root@localhost:~# status tty4
tty4 start/running, process 318

On Wed, Nov 10, 2010 at 04:30:11AM -0000, Tom Gall wrote:
> root@localhost:~# status openvt
> openvt start/running, process 365

Well, that's what we want to see. Can you get a screenshot (or
transcription) of VT1 in this case?

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Can someone confirm whether this is still present on the natty images we are building now?

Please see:

http://snapshots.linaro.org/11.05-daily

Tom Gall (tom-gall) wrote :

Still fails on natty headless dated 12-01 on beagle Rev C4 using omap3 hwpack again dated 1201

Tom Gall (tom-gall) wrote :

Still fails on natty headless, 12-06 snapshot on Beagle Rev C4, using omap3 hwpack dated 12-06 as well.

Tom Gall (tom-gall) wrote :

status openvt
openvt start/running, process 408

root@localhost:~# status tty1
tty1 stop/waiting
root@localhost:~# status tty2
tty2 start/running, process 360
root@localhost:~# status tty3
tty3 start/running, process 362
root@localhost:~# status tty4
tty4 start/running, process 354

if I flip over to tty2 (ctrl-option-f2), I see the login prompt! Then if I flip BACK to tty1 (ctrl-option-F1), it's at the command line prompt!

after which

root@localhost:~# status tty1
tty1 stop/waiting

Steve Langasek (vorlon) wrote :

Tom,

What exactly do you see on tty1 at the end of boot?

Tom Gall (tom-gall) wrote :

Here's a picture : http://yfrog.com/fyfnnj

Tom Gall (tom-gall) wrote :

From the pictured state in #32 using the 12-16 headless if on a fresh boot, you Ctrl-Alt-F1 it does bring up tty1 on the beagle C4. You won't see tty1 (or any of the other ttys) until you do the ctrl-alt-Fx magic.

Steve Langasek (vorlon) wrote :

This is definitely a different bug than the one that was originally reported, then. I'm marking the Linaro task as resolved, so we don't confuse the issue given that the original bug indicated an upstart race condition that still hasn't been addressed. Tom, please open a new bug report against Linaro for the outstanding Beagle XM issue you're seeing.

Changed in linaro:
status: Confirmed → Fix Released
Jamie Bennett (jamiebennett) wrote :

Tested on Linaro 11.05 Alpha-2 headless with hwpack from 2011-02-01, console correctly appears on the screen with a login prompt. Garbled text on the serial console but thats another bug.

Steve Langasek (vorlon) on 2011-02-12
summary: - Headless on beagle, beagle XM, does not boot into tty
+ 'start on starting tty1' -> pre-start exec stop tty1' hangs
Changed in upstart (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon) on 2011-02-12
Changed in upstart (Ubuntu):
importance: Undecided → Low
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments