No login screen when booting Cosmic due to plymouth racing with the rest of the boot process and not releasing DRM devices in time

Bug #1795637 reported by Will Cooke on 2018-10-02
This bug affects 3 people
Affects Status Importance Assigned to Milestone
gdm3 (Ubuntu)
Status tracked in Cosmic
Cosmic
High
Unassigned
plymouth (Ubuntu)
Status tracked in Cosmic
Cosmic
High
Unassigned

Bug Description

Booting a fresh install of Cosmic from the ISO generated on 1 Oct 2018 doesn't present a login screen. Switching to tty2 and then back to tty1 makes the login screen appear.

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: gdm3 3.30.1-1ubuntu1
ProcVersionSignature: Ubuntu 4.18.0-8.9-generic 4.18.7
Uname: Linux 4.18.0-8-generic x86_64
ApportVersion: 2.20.10-0ubuntu11
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Tue Oct 2 13:23:42 2018
InstallationDate: Installed on 2018-10-02 (0 days ago)
InstallationMedia: Ubuntu 18.10 "Cosmic Cuttlefish" - Beta amd64 (20181001)
SourcePackage: gdm3
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile..etc.gdm3.custom.conf: 2018-10-02T13:15:45.550495

Will Cooke (willcooke) wrote :

Updating to cosmic proposed seems to have fixed it.
I will retest a few times and mark this as invalid if it works reliably.

Will Cooke (willcooke) wrote :

And it's come back again. New log attached.
I will log upstream as requested.

Will Cooke (willcooke) wrote :

Here's the log from when I switch back to tty2 and everything kicks into life.

Will Cooke (willcooke) wrote :

And now it's working again....
Here's the log from the last boot where it worked ok.

Daniel van Vugt (vanvugt) wrote :

Sounds like bug 1794280 (?)

Daniel van Vugt (vanvugt) wrote :

Also sounds like closed bug 1786872 and maybe bug 1786883.

Will Cooke (willcooke) wrote :

Attached a full log on the upstream bug, and here too.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gdm3 (Ubuntu):
status: New → Confirmed
Daniel van Vugt (vanvugt) wrote :

Confirmed on a fresh install last night. Oddly, this bug does not occur on the cosmic machines I've had installed for a long time already.

I still wonder if this needs to be coordinated with jibel and merged with bug 1794280?

Daniel van Vugt (vanvugt) wrote :

Never mind. Let's wait and see how the upstream conversation goes first:

https://gitlab.gnome.org/GNOME/gdm/issues/428

Changed in gdm3 (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
tags: added: black-screen
tags: added: rls-cc-incoming
Will Cooke (willcooke) wrote :

A hacky workaround is to add:

ExecStartPre=/bin/sleep 5

to the [Service] section of the gdm systemd unit.

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1795637

tags: added: iso-testing
Will Cooke (willcooke) on 2018-10-09
tags: added: rls-cc-tracking
removed: rls-cc-incoming
Iain Lane (laney) wrote :

I think the dupe indicated in comment #12 is probably correct.

I didn't manage to make this happen when not using plymouth (removing "splash" from the kernel command line). I also installed bionic-release's version of plymouth, to eliminate whether or not the patches applied there are bad. The bug still happens with that version, too.

I'll attach some logs. When the bug happens, we get EPERM when calling drmModeObjectSetProperty() and other things:

Oct 09 17:40:46 marshmallow gnome-shell[764]: Failed to apply DRM plane transform 0: Permission denied

and of course GDM can't start up.

It's some kind of race, since apparently (I didn't actually confirm this myself, but others have said so) switching VTs or restarting GDM manually makes it work. The first one isn't that surprising, since GDM is dynamically starting & stopping its greeter when the active VT changes. Also adding a sleep apparently works around the problem - maybe also not surprising if we're waiting for something to relinquish the DRM device.
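To tell this failure apart from other black-screen boots, the journal can be searched for the permission error quoted above. This is only a minimal sketch: it assumes the journal for the affected boot has been saved as text (for example with `journalctl -b > boot.log`), and the helper name `drm_permission_errors` is made up for illustration.

```python
import re

# Matches journal lines such as the gnome-shell message quoted above:
# a gnome-shell entry that mentions DRM and ends in "Permission denied".
DRM_EPERM = re.compile(r"gnome-shell\[\d+\]: .*DRM.*Permission denied")


def drm_permission_errors(journal_lines):
    """Return the journal lines where gnome-shell reports a DRM permission error."""
    return [line for line in journal_lines if DRM_EPERM.search(line)]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python3 check_drm.py boot.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for hit in drm_permission_errors(fh):
                print(hit.rstrip())
```

No matching lines does not prove the boot was fine; it only suggests that a failing boot is probably not this particular race.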

Iain Lane (laney) wrote :
Daniel van Vugt (vanvugt) wrote :

> if we're waiting for something to relinquish the DRM device

AFAIK the only "something" would be Plymouth. So good thinking...

Iain Lane (laney) wrote :

I think it's like this (line numbers added)

bad:

line 423: [ply-boot-server.c:388] print_connection_process_identity:connection is from pid 677 (/bin/plymouth deactivate) with parent pid 668 (/usr/sbin/gdm3)
line 498: [./plugin.c:649] activate:taking master and scanning out

good:

line 402: [./plugin.c:649] activate:taking master and scanning out
line 573: [ply-boot-server.c:388] print_connection_process_identity:connection is from pid 711 (/bin/plymouth deactivate) with parent pid 665 (/usr/sbin/gdm3)
line 583: [./plugin.c:679] deactivate:dropping master

In the bad case, we don't see "dropping master" because boot_splash is NULL, because we were already deactivated.
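The ordering argument above can be checked mechanically on a plymouth debug log. This is a rough sketch that assumes the log contains the three messages quoted above; the function name `classify_plymouth_log` and the verdict strings are invented for illustration and are not part of plymouth or gdm.

```python
import re

# Substrings taken from the plymouth debug output quoted in this comment.
DEACTIVATE_REQUEST = re.compile(
    r"connection is from pid \d+ \(/bin/plymouth deactivate\)"
)
TAKING_MASTER = "activate:taking master"
DROPPING_MASTER = "deactivate:dropping master"


def classify_plymouth_log(lines):
    """Classify a plymouth debug log as 'bad', 'good' or 'unknown'.

    'bad'     : a deactivate request was seen, but plymouth still holds
                DRM master at the end of the log.
    'good'    : a deactivate request was seen and DRM master is not held
                at the end of the log.
    'unknown' : no deactivate request appears in the log at all.
    """
    holds_master = False
    saw_deactivate = False
    for line in lines:
        if DEACTIVATE_REQUEST.search(line):
            saw_deactivate = True
        elif TAKING_MASTER in line:
            holds_master = True
        elif DROPPING_MASTER in line:
            holds_master = False
    if not saw_deactivate:
        return "unknown"
    return "bad" if holds_master else "good"
```

Fed the two excerpts above, the "bad" sequence ends with master still held (the activate arrives after the deactivate request), while the "good" sequence ends with master dropped.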

I asked upstream about the attached patch, will see what they say and in the meantime it's on https://launchpad.net/~laney/+archive/ubuntu/plymouth if somebody would care to test.

Changed in plymouth (Ubuntu Cosmic):
status: New → Triaged
importance: Undecided → High
tags: added: patch
Iain Lane (laney) wrote :

I've uploaded a pkg to cosmic unapproved which contains https://gitlab.freedesktop.org/plymouth/plymouth/commit/85d843af843589ce8538a59e5cb665b8253e380d amongst a few other cherry picks that are worthwhile to have in cosmic.

--- details unrelated to this bug (boot race with plymouth & DRM) ---

I know this isn't the end of the story with regard to boot races (there's a "Could not find primary drm kms device" thing that Trevinho is looking into), but it gets us closer.

Possibly we should use a new bug for that other problem so this one doesn't get confusing with multiple fixes for a broad symptom.

(If so, please file it, rls nominate it, attach the journal logs that show the bug happening so it's clear it's about that problem and no other and assign to Trevinho)

summary: - No login screen when booting Cosmic
+ No login screen when booting Cosmic due to plymouth racing with the rest
+ of the boot process and not releasing DRM devices in time
Changed in gdm3 (Ubuntu Cosmic):
status: Triaged → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package plymouth - 0.9.3-1ubuntu10

---------------
plymouth (0.9.3-1ubuntu10) cosmic; urgency=medium

  * Grab some commits from upstream:
    + 0009-renderer-support-reactivating-renderer-without-closi.patch,
      0010-main-Only-activate-renderers-if-the-splash-uses-pixe.patch:
      Fix (or at least improve the situation regarding!) toggling of renderers
      on and off, e.g. when pressing escape multiple times to switch between
      text and graphics.
    + 0011-drm-Remove-unnecessary-reset_scan_out_buffer_if_need.patch:
      See the patch for a detailed description, but it avoids some cases of
      renderers becoming active unnecessarily, which is related to...
    + 0013-device-manager-don-t-watch-for-udev-events-when-deac.patch:
      Don't process udev events after we've been deactivated. These can cause
      renderers to activate, which might make them claim DRM master and never
      release it, causing wayland / Xorg to fail. (LP: #1795637)
  * git_ensure_tty_closed_0a662723.patch:
    + Re-cherry-picked, as it had been rebased before. The filename is now a
      lie, since I took the commit instead of the merge, but kept as-is for
      a more sane diff.

 -- Iain Lane <email address hidden> Wed, 10 Oct 2018 20:40:30 +0100

Changed in plymouth (Ubuntu Cosmic):
status: Triaged → Fix Released
Christopher Patti (feoh) wrote :

Hi there! I'm looking for suggestions on further testing I can do to disambiguate the situation I'm seeing from this bug.

I reported https://bugs.launchpad.net/ubuntu/+source/gdm3/+bug/1796614 which was marked as a duplicate of this bug.

However, I now have this package installed (output of apt policy below), and I am still seeing a blank screen with a blinking cursor at initial boot, but I can flip to a different VT, log in and run startx, and everything functions normally.

Am I

A) misunderstanding what I need to do in order to test this fix, or
B) missing some other subtlety that makes my expectation that this bug should go away invalid, or
C) in fact observing a different bug altogether that isn't a duplicate of this one?

Any hints anyone can throw my way would be very much appreciated. Thanks in advance!

Will Cooke (willcooke) wrote :

Hi Christopher. Yes, we think this bug is fixed, but the "only getting a flashing cursor" problem is a different bug. We're still working on it right now. Most of the work is happening on the upstream bug https://gitlab.gnome.org/GNOME/gdm/issues/428#note_338918 and its associated changes https://gitlab.gnome.org/GNOME/gdm/merge_requests/37

Daniel van Vugt (vanvugt) wrote :

I thought this was the "only getting a flashing cursor" bug. Maybe we need to reopen this?

Daniel van Vugt (vanvugt) wrote :

If this bug remains closed (which is the prerogative of the original reporter) then please put future discussion in bug 1796614 instead.

Iain Lane (laney) wrote :

Almost anything which breaks the boot can have that symptom. I would suggest not having a meta-bug otherwise it'll be reopened and re-closed every time. Using that other one should be just fine.
