plymouth loses output to /dev/console (such as ci-info: messages)

Bug #1235231 reported by Robie Basak on 2013-10-04
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Plymouth
Invalid
Medium
plymouth (Ubuntu)
High
Unassigned

Bug Description

During early boot, plymouth captures writes to /dev/console. As I understand it, it is supposed to re-play that content later. Whether or not it intends to is not the issue. In our cloud-images (and in other cases) users may write data to /dev/console that needs to be seen on the other side (ie a serial logger or 'get-console-ouptput' api call).

See the attached 'my-logger.conf' for an upstart job that simply writes data early in boot to /dev/console. Sometimes data "lost" is available in /var/log/boot.log, but sometimes it doesn't even seem to make it there. bug 682831 has an attachment 'myhack-init' that does a similar thing but starts outputting by replacing /sbin/init and then calling /sbin/init itself.

Related bugs:
 * bug 682831: lost console output early in boot
 * bug 1303815: cloud image boot produces unnecessary output

== Original bug report ==
On precise cloud images, I get a console output line like this:

ci-info: eth0 : 1 192.168.122.204 255.255.255.0 52:54:00:7f:d8:ca^M

On saucy cloud images, this is missing.

Robie Basak (racb) wrote :

Sample console output from precise.

Robie Basak (racb) wrote :

Sample console output from saucy.

Robie Basak (racb) wrote :

User data used on my sample precise output.

Robie Basak (racb) wrote :

User data used on my sample saucy output.

Robie Basak (racb) wrote :

From the saucy instance (failure case):

ubuntu@test-saucy:~$ cat /etc/cloud/build.info
build_name: server
serial: 20130925
ubuntu@test-saucy:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Saucy Salamander (development branch)
Release: 13.10
Codename: saucy
ubuntu@test-saucy:~$ dpkg-query -W cloud-init
cloud-init 0.7.3~bzr871-0ubuntu1

From the precise instance (success case):

$ cat /etc/cloud/build.info
build_name: server
serial: 20130909
ubuntu@test:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.3 LTS
Release: 12.04
Codename: precise
ubuntu@test:~$ dpkg-query -W cloud-init
cloud-init 0.6.3-0ubuntu1.6

Robie Basak (racb) wrote :

Steps to reproduce (guessing, untested but it should be something like this):

1. Start on Saucy with a KVM capable machine and kernel.
2. apt-get install uvtool
3. uvt-simplestreams-libvirt sync arch=amd64 release=saucy
4. uvt-kvm create -d test release=saucy
5. Observe /var/log/libvirt/qemu/test.log

Scott Moser (smoser) wrote :

Robie provided a bunch of useful info for debugging.

I'll attach alpha3.log, beta1.log and beta2.log.
alpha3 has 'ci-info' output
beta1 and beta2 do not.

Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :
Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed

I just booted a m1.small image on EC2 and 'euca-get-console-output' of each of alpha1, beta1 and beta2 and the ci-info is present in all of them.

summary: - ci-info: eth0 missing from latest saucy images
+ ci-info: eth0 not reliable in latest saucy images
Scott Moser (smoser) wrote :

Heres a simple reproduce that shows the issue.
it seems racy condition like.

rel="saucy"
dimg="$rel-server-cloudimg-amd64-disk1.img"
dbase_url="http://cloud-images.ubuntu.com/$rel/current/"

mylogger_url="https://launchpadlibrarian.net/152522734/my-logger.conf"

## get files and prereqs
sudo apt-get install cloud-image-utils

wget "${dbase_url}$dimg" -O $dimg.dist
wget "$mylogger_url" -O my-logger.conf

cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF

# create a 'seed.img' for cloud-init local datasource
cloud-localds seed.img user-data

# create a 'disk1.img' backed by the downloaded disk
qemu-img create -f qcow2 -b $dimg.dist disk1.img

# patch it with 'my-logger.conf'
sudo mount-image-callback -v disk1.img -- \
   sh -c 'cp my-logger.conf $MOUNTPOINT/etc/init/my-logger.conf'

# boot in kvm with log to 'serial.log'
kvm -net nic -net user,hostfwd=tcp::2222-:22 \
   -drive file=disk1.img,if=virtio -drive file=seed.img,if=virtio \
   -serial file:serial.log -curses

# after the system boots, you can ssh to localhost on port 2222 and
# log in with 'ubuntu' and 'passw0rd'
# you'll see that /run/my.log has output in it.

Scott Moser (smoser) wrote :

I just put this on upstart. I assume its probably really plymouth related.
but between alpha3 beta1, upstart changed to 1.10 from 1.09.

plymouth did not change.

Note, that even on saucy, with the my-logger.conf job in place, some output is lost. (ie, you do not see all 'X=' values).

James Hunt (jamesodhunt) wrote :

Upstart certainly changed, but none of the changes affect console handling. Could the kernel change be the culprit maybe?

Scott Moser (smoser) wrote :

So, some more information..
this bug is "fixed" by doing:
  for x in /etc/init/plymouth*; do sudo mv $x $x.disabled; done

after doing so 100% of /dev/console output makes it to the console (whoohoo!).

that lays this squarely on plymouth.
There is some conversation with slangasek, smoser and utlemming at
http://irclogs.ubuntu.com/2013/10/04/%23ubuntu-devel.html#t18:07 .

Steve thinks its not entirely unreasonable to just disable plymouth via .override files in cloud-images. I think he would suggest that that be done at the package level and be handled by cloud-init. That wasn't clear to me though.

Given the above, this is clearly a plymouth bug, its supposed to replay output that it captured, but it is losing some.

affects: upstart (Ubuntu) → plymouth (Ubuntu)
Changed in plymouth (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) on 2013-10-07
description: updated
summary: - ci-info: eth0 not reliable in latest saucy images
+ plymouth loses output to /dev/console (such as ci-info: messages)
Scott Moser (smoser) on 2013-10-07
description: updated

On Mon, Oct 07, 2013 at 01:05:44PM -0000, Scott Moser wrote:
> Steve thinks its not entirely unreasonable to just disable plymouth via
> .override files in cloud-images. I think he would suggest that that be
> done at the package level and be handled by cloud-init. That wasn't
> clear to me though.

Yes, that is my recommendation.

From a follow-on discussion with Ben:

04-10-2013 12:03:12 <slangasek> so I think adding 'manual' .overrides for all of /etc/init/plymouth{,-log,-splash,-upstart-bridge}.conf would DTRT

Scott Moser (smoser) wrote :

Steve,

Do you think this can be fixed in plymouth correctly?

If not, and you still would recommend .override files which of the following solutions would you think is best (or another you suggest):

A.) 'disabler' package built by 'plymouth' source package (sugg name: plymouth-disabler)
B.) 'disabler' package built by 'cloud-init' source package (sugg name: cloud-init-plymouth-disabler)

Then, which mechanism for disabling would you think is the best solution:
1.) cloud-init installation of .override files via dpkg-divert in postinstall script
2.) cloud-init-plymouth-disabler package built by cloud-init

Personally, it seems to me that 'A' is better than 'B', but I completely defer to your judgement on 1 or 2.

I'd like to get this in as soon as possible in trusty.

Steve Langasek (vorlon) wrote :

I think the cloud-init *binary package* should add these .override files as conffiles. While the issue of messages going missing may affect systems other than cloud images, it's precisely the cloud images where a) plymouth adds no value, and b) the impact of messages going missing from the console is highest.

And dpkg-divert is not the right tool for this. The .override files should be created in the postinst, and removed in postrm (when called with $1 == remove || $1 == purge); but this should absolutely not involve dpkg-divert.

Scott Moser (smoser) wrote :

Steve,
  Why would you suggest cloud-init as a [source] package for this?
  The fact that plymouth is broken is completely unrelated to cloud-init or ubuntu use as a guest at all.

  Wouldn't it be better to put this package closer to plymouth, so it is more likely to be updated if ever needed?

Scott Moser (smoser) wrote :

I put a preliminary "plymouth-disabler" package at lp:~smoser/+junk/plymouth-disabler
It simply lays down .override files.

one bit of wierdness is that you then have to --purge to remove the changes that its installation caused.

Ie,if you install, and then remove without --purge the .override files are still present and plymouth is still disabled.

Scott Moser (smoser) wrote :

irc conversation response to comment 21.

11/21/13 14:24:54 <smoser> so would you NAK a proposal to put 'plymouth-disabler' into 'plymouth' packaging ?
11/21/13 14:25:15 <smoser> i'd like to get that in quick and be done with it. and it really has nothing to do with cloud-init, but everythign to do with plymouth.
11/21/13 14:28:26 <slangasek> I think it's wrong to put it there, because it's about plymouth not being a proper fit for the cloud images, so why shouldn't it be in a package maintained by the cloud team
11/21/13 14:28:58 <slangasek> I see your point that it doesn't really fit cloud-init - maybe a separate source package that's seeded directly in the cloud images?
11/21/13 14:29:09 <smoser> its about fixing a bug in plymouth
11/21/13 14:29:15 <smoser> a dataloss bug
11
/21/13 14:29:35 <smoser> i'm fine to make a separate sourc epakcage
11/21/13 14:29:38 <smoser> and seed it into cloud images
11/21/13 14:30:00 <smoser> but i think that means maintenance that isn't obvious should a plymouth upstart job be added.
11/21/13 14:30:05 <smoser> thats why i wanted it in plymouth.
11/21/13 14:30:27 <slangasek> right, but there shouldn't be any more plymouth jobs added
11/21/13 14:30:36 <slangasek> so the risk there is minmial
11/21/13 14:30:38 <slangasek> minimal too
11/21/13 14:31:46 <smoser> alright then. new source package :-(.

During early boot, plymouth captures writes to /dev/console. As I understand it, it is supposed to re-play that content later. Whether or not it intends to is not the issue. In Ubuntu cloud-images (and in other cases) users may write data to /dev/console that needs to be seen on the other side (ie a serial logger or 'get-console-ouptput' EC2 api call).

This bug is easily reproducible on a local kvm, a cloud such as EC2 or a local lxc container. In my opinion, it should be considered "data loss". The issue is that a user cannot reliably write data to /dev/console.

See the attached 'my-logger.conf' for an upstart job that simply writes data early in boot to /dev/console. Sometimes data "lost" is available in /var/log/boot.log, but sometimes it doesn't even seem to make it there. bug 682831 (http://pad.lv/682831) has an attachment 'myhack-init' that does a similar thing but starts outputting by replacing /sbin/init and then calling /sbin/init itself.

I'd really appreciate it if someone was able to take a look at this. I'm willing to help reproduce it and can even give access to some cloud-instances that can reproduce if necessary.

Scott Moser (smoser) wrote :
Changed in plymouth:
importance: Unknown → Medium
status: Unknown → Confirmed

Created attachment 89759
/sbin/init wrapper that will demonstrate the issue

See attachment here, which will reproduce this bug.

Scott Moser (smoser) wrote :

Theres a plea for review/accept at http://irclogs.ubuntu.com/2013/11/25/%23ubuntu-devel.html#t14:18 that is met with 2 questions:
 a.) why isn't this in plymouth package (because slangasek requested it not be)
 b.) why isn't this in cloud-init package (because its not at all related to cloud-init)

Rbasak mentioned that disabling plymouth does have fallout, for example lightdm is 'start on started plymouth'. Other upstart jobs on my system that mention plymouth are:

$ grep -r plymouth /etc/init -l | grep -v "/plymouth"
/etc/init/console-font.conf
/etc/init/lightdm.conf
/etc/init/failsafe.conf
/etc/init/setvtrgb.conf
/etc/init/mountall.conf
/etc/init/friendly-recovery.conf

lightdm and failsafe are the ones that seem most worrysome there.

Does the version of plymouth you're using have add_consoles_from_file() in main.c ? If not, pulling in the commits that add that function and then subsquently fix that function may fix this issue.

We used to infer serial consoles from the kernel command line. These days we read them directly from /sys/class/tty/console/active

Ray,
 thanks for the response.
 The version of plymouth we have is '0.8.8-0ubuntu8'. I realize that doesn't mean much, but you can check it out explicitly with 'bzr branch lp:ubuntu/trusty/plymouth'

 It does have the 'add_consoles_from_file' method.

 I'll poke read some more there, try from trunk and see what i can't find out.

  thanks again for the response.

Dimitri John Ledkov (xnox) wrote :

On a regular trusty laptop with an SSD i did 6 boots (3 reboots and 3 cold boots), and compared the boot.logs.
My expectation is to have no less than 6 instances of each unique boot messages, that is not the case:
      1 * Starting Bridge socket events into upstart
      1 * Starting modem connection manager
      1 * Starting Restore Sound Card State
      1 * Starting System V initialisation compatibility
      2 * Starting bluetooth daemon
      2 * Starting configure network device
      2 * Starting system logging daemon
      3 * Starting save kernel messages
      3 * Stopping save kernel messages
      4 * Starting Bridge file events into upstart
      4 * Starting save udev log and update rules
      4 * Starting SystemD login management service
      4 * Stopping save udev log and update rules
      5 * Starting ACPI daemon
      5 * Starting enable remaining boot-time encrypted block devices
      5 * Starting Reload cups, upon starting avahi-daemon to make sure remote queues are populated
      5 * Starting Reload cups, upon starting avahi-daemon to make sure remote queues are populated
      5 * Stopping cold plug devices
      5 * Stopping log initial device creation

bootlogs attached.

I'm not sure if this is the same or different problem.

Dimitri John Ledkov (xnox) wrote :
Scott Moser (smoser) wrote :

plymouth-disabler has been accepted into the archive.
That could be used in the cloud images to reach the desired affect, but there is some fallout.

I'd like to investigate some more actually getting the bug fixed in plymouth and doing without plymouth-disabler. If that should occur, we can remove plymouth-disabler from archive.

Changed in plymouth-disabler (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Scott Moser (smoser) wrote :

Using a build from current lp:ubuntu/plymouth that has steve's fix for bug 1160079 inside it, I no longer am losing data written to the console!

I've tested this in a trusty kvm local image.
disk-patched.img had the new debs installed
disk-unpatched.img did not.
both images had https://bugs.freedesktop.org/attachment.cgi?id=89759 run on them (so the logging init-wrapper was in place).

$ cat go-test.txt
LOG="out.log"
rm -f "$LOG"
i=0
while i=$(($i+1)) && [ $i -le 10 ]; do
  echo "---- $(date) $i ----" >> "$LOG"
  for r in patched unpatched; do
    qemu-img create -f qcow2 -b disk1-$r.img disk1.img
    slog=$(printf "serial-%s-%02d.log" "$r" "$i")
    echo "---- $(date) $i $r ----" >> "$LOG"
    kvm -net nic -net user,hostfwd=tcp::2222-:22 \
       -drive file=disk1.img,if=virtio -drive file=seed.img,if=virtio \
       -serial "file:$slog" -vga none -nographic
  done
done

echo "--- $(date) FINISHED ---" >> "$LOG"

$ cat user-data
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
power_state:
 mode: poweroff
 message: "Bye Bye"

$ for f in serial-*.log; do echo $f: $(grep "XX:" $f | wc -l); done
serial-patched-01.log: 103
serial-patched-02.log: 103
serial-patched-03.log: 103
serial-patched-04.log: 103
serial-patched-05.log: 103
serial-patched-06.log: 103
serial-patched-07.log: 103
serial-patched-08.log: 103
serial-patched-09.log: 103
serial-patched-10.log: 103
serial-unpatched-01.log: 80
serial-unpatched-02.log: 82
serial-unpatched-03.log: 81
serial-unpatched-04.log: 80
serial-unpatched-05.log: 79
serial-unpatched-06.log: 84
serial-unpatched-07.log: 80
serial-unpatched-08.log: 83
serial-unpatched-09.log: 82
serial-unpatched-10.log: 82

Scott Moser (smoser) wrote :

(oh yeah, above, the '103' was actually correct, as I modified the MAX in that script to be 100).

Hello Robie, or anyone else affected,

Accepted plymouth into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/plymouth/0.8.8-0ubuntu16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package plymouth - 0.8.8-0ubuntu16

---------------
plymouth (0.8.8-0ubuntu16) trusty; urgency=medium

  [ Steve Langasek ]
  * Fix debian/patches/tty1-after-boot.patch: when we have multiple
    non-default console options on the kernel commandline,
    local_console_terminal will not be set, so we should avoid the resulting
    assert. LP: #1160079, LP: #1235231.

  [ Dimitri John Ledkov ] LP: #1292458
  * debian/patches/use-upstart-private-socket.patch: instead of using
    system dbus, use private upstart socket in the plymouth-upstart-bridge.
  * debian/plymouth.plymouth-upstart-bridge.upstart: start
    plymouth-upstart-bridge on startup, and thus gather a more complete
    boot.log.
  * debian/patches/dont-bail-on-dummy-terms.patch: don't bail
    plymouth-upstart-bridge if TERM is not set and/or running on a dummy
    terminal, as we can run on those. This makes console-log useful in
    lxc-containers and actually contain messages from upstart jobs
    starting and stopping.
  * debian/patches/no-print-empty-description.patch: do not store
    description when dbus returns empty string, and thus stop printing
    empty "Starting... done" messages. LP: #1185560.
  * debian/plymouth.plymouth-upstart-bridge.upstart: update stop on
    conditions to stop when plymouth-shutdown finishes.
  * Make ubuntu-logo theme support scale factor, provide one plymouth
    theme with scale 1 and another with scale 2, until scale factor can be
    dynamically detected.
  * Mark plymouth-upstart-bridge to respawn.
 -- Dimitri John Ledkov <email address hidden> Thu, 03 Apr 2014 16:37:29 +0100

Changed in plymouth (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser) wrote :

so with this fix being added, we now see some annoying adaditional output, I've filed bug 1303815 to track that.

no longer affects: cloud-init (Ubuntu)
description: updated
no longer affects: plymouth-disabler (Ubuntu)

Marking this as RESOLVED / INVALID.
The bug was actually in downstream ubuntu package.

Changed in plymouth:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.