lightdm tries (and fails) to start too early?

Bug #969489 reported by Serge Hallyn on 2012-03-30
This bug affects 295 people
Affects            Status   Importance   Assigned to   Milestone
upstart                     Undecided    Unassigned
lightdm (Ubuntu)            High         Unassigned

Bug Description

Sometimes lightdm comes up fine. Other times it appears to fail. x-0.log shows it tried to load 'nv' (rather than 'nvidia' which is what actually exists). When I then log into console and 'start lightdm', it comes up fine (on :1).

I've not had it come up OK today, so I don't know whether it starts on :0 when it does come up fine. I don't know whether lightdm.conf needs a change to its 'start on' clause, or whether this is just a case of a bad shipped config file (for module 'nv').

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: lightdm 1.1.9-0ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-20.33-generic 3.2.12
Uname: Linux 3.2.0-20-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.0-0ubuntu1
Architecture: amd64
Date: Fri Mar 30 13:40:27 2012
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Alpha amd64 (20120323)
ProcEnviron:
 TERM=xterm
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: lightdm
UpgradeStatus: No upgrade log present (probably fresh install)

Serge Hallyn (serge-hallyn) wrote :

In case I wasn't clear, here is the window that pops up about half the time when I boot.

When this comes up, I hop to a tty, log in, and do 'start lightdm'. Then I have the bad X on :0 (with log files showing it tried to modprobe 'nv'), and the good X which I just started on :1.

When I don't get this popup, then lightdm/X start just fine and are running on :0.
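To see which display numbers currently have an X server socket, something like the following could be run from the tty. It is only a sketch: `list_x_displays` is a made-up helper name, and it assumes the conventional socket directory /tmp/.X11-unix. A stale socket left behind by a crashed server would still be listed.

```shell
# List X display numbers that have a socket file in the given directory.
# Default directory is /tmp/.X11-unix, where X servers conventionally
# create one socket named X<n> per display (assumption, not from this report).
list_x_displays() {
    dir="${1:-/tmp/.X11-unix}"
    for sock in "$dir"/X*; do
        [ -e "$sock" ] || continue          # glob matched nothing
        printf ':%s\n' "${sock##*/X}"       # keep only the display number
    done
}
```

Calling `list_x_displays` after the manual 'start lightdm' would be expected to show both :0 and :1 in the failure case described here.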

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lightdm (Ubuntu):
status: New → Confirmed
Miklos Juhasz (mjuhasz) wrote :

I see this with Intel Sandy Bridge as well (i7). Either the low graphics warning or just blank screen. Manually starting lightdm always helps.

Miklos Juhasz (mjuhasz) wrote :

In the /etc/init/lightdm.conf file adding some delay before executing lightdm solves the problem for me:

    sleep 10
    exec lightdm

This way I can also see the Ubuntu boot screen with the progress bar, which I never saw before. Previously the plain background color just flashed quickly a couple of times (I have an SSD drive); in the lucky cases the login screen then showed up, and in the other cases I had to start lightdm manually.

Miklos Juhasz (mjuhasz) wrote :

Quoting Miklos Juhasz (<email address hidden>):
> In the /etc/init/lightdm.conf file adding some delay before executing
> lightdm solves the problem for me:
>
> sleep 10

unacceptable on a 5 second boot :)

> exec lightdm
>
> This way I can also see the Ubuntu boot screen with the progress bar
> which I never saw before. It was just the plain background color
> flashing a couple of times quickly (I have an SSD drive) and after that

Right, I have an SSD drive as well. That's why I thought perhaps lightdm
needs another required event to ensure the nvidia kernel module has
finished loading.

Miklos Juhasz (mjuhasz) wrote :

I get this with fglrx and intel driver as well. It seems that there's some race condition in the upstart scripts and the graphic card is not ready yet.

My desktop machine became virtually unbootable with a fast SSD, so a 10s delay (actually 5s is enough) was the only way to make it work. I can press Ctrl+Alt+F1, log in and start lightdm by hand, but other family members can't, so they have to wait a few more seconds.

Bryce Harrington (bryce) wrote :

Dupe of bug #982889 for Intel.

For nvidia/fglrx, we've had numerous reports (the issue seems more widespread with binary drivers).

Comment #5 is just a workaround, but that's essentially what's needed - pause X startup until the kernel driver is ready. Question is how to best do that...
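One way to "pause until ready" without a fixed sleep is a bounded poll. The sketch below is not a proposed fix, just an illustration: `wait_for_path` is a hypothetical helper, and which path signals "driver ready" is an assumption that depends on the driver (the DRM node /dev/dri/card0 appears in the udev log quoted later in this report, but binary drivers may not create it).

```shell
# Poll until a path exists or the timeout (in whole seconds) expires.
# Returns 0 if the path exists, 1 on timeout.
wait_for_path() {
    path="$1"
    timeout="${2:-10}"
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In a job's script section this would sit before 'exec lightdm', for example `wait_for_path /dev/dri/card0 10 || true`. On a fast boot it returns as soon as the node appears, so it costs less than an unconditional 'sleep 10', but it remains a workaround and not an event-based solution.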

Clint Byrum (clint-fewbar) wrote :

Here is the start on:

start on ((filesystem
           and runlevel [!06]
           and started dbus
           and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
                or stopped udev-fallback-graphics))
          or runlevel PREVLEVEL=S)

I suspect drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1 or stopped udev-fallback-graphics is emitting too soon.

here is udev-fallback-graphics:

start on (startup and
   (graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1
           or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1
           or stopped udevtrigger or container))

That bare 'card0' on the drm-device-added event is quite puzzling. Is it possible that it is simply matching as 'drm-device-added' and a second, non primary graphics chip is causing the issue? That might explain it. The timeline for these events is pretty complex so it will likely take some deep thought.

Nothing in the upstart man pages suggests this syntax, event, bareword, variable, is valid or what it might mean. It seems to have been cargo culted to KDM as well, and come from gdm in lucid originally.

Can somebody who is affected try removing 'card0' and see if that fixes the problem?

I suspect this might be related to bug #936552 which also has to do with lightdm seemingly starting too soon.

Clint Byrum (clint-fewbar) wrote :

Hm the card0 thing doesn't seem to make a difference in my trackpad bug.

However, adding 'and stopped udevtrigger' *does* solve that problem. So can somebody who is affected try adding that? That is, make /etc/init/lightdm.conf's start on:

start on ((filesystem
           and runlevel [!06]
           and started dbus
           and stopped udevtrigger
           and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
                or stopped udev-fallback-graphics))
          or runlevel PREVLEVEL=S)

My theory is that the udev drm-device-added event is not being seen because the kernel has already sent it by the time we get to this point, since the boot speed is so fast.

James Hunt (jamesodhunt) wrote :

For 12.10, we really should add a 'primary-graphics-card' event alias or abstract job that abstracts the udev complexities since that confusing condition ("drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1 or stopped udev-fallback-graphics") is also semi-duplicated in other jobs:

- plymouth-splash.conf
- udev-fallback-graphics.conf

I think the "definitive" condition that becomes true when the first graphics device becomes available is:

(graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1
    or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1
    or stopped udev-fallback-graphics)

I agree that 'card0' seems bogus - looks like it's trying to match DEVPATH but that will never match since there is no wildcard in the expression. It should be something like the following (note the asterisk after the equals!!):

    drm-device-added DEVPATH=*card0 PRIMARY_DEVICE_FOR_DISPLAY=1

Using named variables ('name=value' as opposed to positional ones where only a *value* is given) as I've done above is *much* safer in this scenario since to use positional variables, you need to know the exact content of the corresponding udev message that the upstart-udev-bridge converts into an Upstart event.

File /lib/udev/rules.d/78-graphics-card.rules is what is modifying the udev message to add PRIMARY_DEVICE_FOR_DISPLAY=1 so I'm wondering if we can simplify the logic to be simply:

    drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1

... since udev will tag both KMS and non-KMS devices correctly.

Miklos Juhasz (mjuhasz) wrote :

> However, adding 'and stopped udevtrigger' *does* solve that problem.

Based on 10 restarts I can confirm that. Either it does solve the problem or it just delays lightdm with the right amount of time.

Serge Hallyn (serge-hallyn) wrote :

So far, adding 'and stopped udevtrigger' seems to be working for me as well.

Serge Hallyn (serge-hallyn) wrote :

Any chance of getting that into precise's lightdm yet?

Sebastien Bacher (seb128) wrote :

uploads for precise are pretty frozen, that could be in the first SRU though, could somebody come with a merge request with the change?

Lincoln Smith (lincoln-smith) wrote :

I'm getting the same symptom - more often than not getting dropped to the low-graphics alert during startup - and the change to /etc/init/lightdm.conf listed above has *not* made a difference. This behaviour is consistent across several different machines. From googling around, this bug seems fairly prevalent and is graphics vendor/chipset agnostic, which supports your reasoning above. However, as noted, the proposed change hasn't helped me.

Looking through logs in /var/log/lightdm the only thing that stands out is in the greeter logs e.g. for a failed start:
No protocol specified
No protocol specified
...
(unity-greeter:2340): Gtk-WARNING **: cannot open display: :0
No protocol specified
...

I'm happy to try out other ideas for fixes, as this is a show stopper for a desktop environment refresh under way at work.

James Hunt (jamesodhunt) wrote :

This looks like it might be a duplicate of bug 615549.

I've done some digging and although I'm not sure this will fully resolve the issue observed, I have found a bug in lightdm.conf which could cause the behaviour seen...

As Clint intimates in #11, the current 'start on' condition...

    drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1

... is *NOT* valid. What this condition is asking Upstart to do is match on an event whose first _positional_ environment variable *exactly matches* the string 'card0' and which also has an environment variable called 'PRIMARY_DEVICE_FOR_DISPLAY' with value '1'.

However, if you look at /var/log/udev:

$ awk 'BEGIN{RS="";ORS="\n\n"}; /UDEV *\[/ && /PRIMARY_DEVICE_FOR_DISPLAY=1/ { print; }' /var/log/udev
UDEV [34.559987] add /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)
ACTION=add
DEVNAME=/dev/dri/card0
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0
DEVTYPE=drm_minor
MAJOR=226
MINOR=0
PRIMARY_DEVICE_FOR_DISPLAY=1
SEQNUM=2231
SUBSYSTEM=drm
TAGS=:udev-acl:
UDEV_LOG=3
USEC_INITIALIZED=34559916
$

... you'll see that there are no variables that exactly match 'card0'! The fix is as simple as adding an asterisk to perform a wild-card match:

  drm-device-added *card0 PRIMARY_DEVICE_FOR_DISPLAY=1

However, my preference would be to make our intent explicit and say:

  drm-device-added DEVPATH=*card0 PRIMARY_DEVICE_FOR_DISPLAY=1

This would protect against a change to the way the upstart-udev-bridge handles udev environment variable ordering and is much easier to understand.
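The difference between the exact match and the wildcard match can be illustrated with ordinary shell globbing. This is only an analogy, assuming the matching behaves like a whole-string glob as described above; `glob_match` is a made-up helper and does not call into Upstart.

```shell
# Return 0 if VALUE matches the glob PATTERN as a whole string.
glob_match() {
    # $1 = pattern, $2 = value
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# DEVPATH copied from the /var/log/udev extract above.
devpath='/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0'

glob_match 'card0'  "$devpath" && echo "card0 matches"  || echo "card0 does not match"
glob_match '*card0' "$devpath" && echo "*card0 matches" || echo "*card0 does not match"
# prints:
#   card0 does not match
#   *card0 matches
```

The bare 'card0' only matches a value that is exactly 'card0', which is why the existing condition can never be satisfied by this event.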

@serge - could you add the missing asterisk and report back?

James Hunt (jamesodhunt) wrote :

Addendum to #19:

Currently the 'card0' matching is being attempted on DEVPATH. This isn't obvious from the /var/log/udev extract shown, but the upstart-udev-bridge will in this scenario create the event with the following environment variable ordering:

1) DEVPATH
2) DEVNAME
3) SUBSYSTEM
4) ACTION
5) <<the rest>>

So if in doubt, prefer name=value pairs to positional variables.

James Hunt (jamesodhunt) wrote :

Please could those affected attach /var/log/udev?

Lincoln Smith (lincoln-smith) wrote :

Changing 'card0' to 'DEVPATH=*card0' seems to have fixed the problem for me, at least across a half dozen reboots.

Udev log attached.

meijer.o (meijer-o) wrote :

Same issue 01-06-12 fully updated system (gnome remix based on ubuntu)
Hardware HP ProBook Sandybridge I3
Intel 3000
Intel SSD
Boottime 8 sec.

sometimes (one out of ten) the xserver freezes at login

adding 'and stopped udevtrigger' to lightdm.conf fixes the issue.

 lightdm.conf looks now like this:

start on ((filesystem
           and runlevel [!06]
           and started dbus
           and stopped udevtrigger #this line I added
           and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
                or stopped udev-fallback-graphics))
          or runlevel PREVLEVEL=S)

Best regards,

Otto Meijer

Lincoln Smith (lincoln-smith) wrote :

Just an update that I still get this very infrequently - maybe once a week across many reboots of several machines - post applying the 'DEVPATH=*card0' fix.

Same here. DEVPATH=*card0 and still having the issue.

James Doherty (j-doherty) wrote :

I'm using 12.10 and am seeing the same log messages as Lincoln reported in #18. I have used both Intel and Nvidia chipsets and the issue seems to occur in both.

A couple of things I have tried:

* Removing 'card0' from the PRIMARY_DEVICE_FOR_DISPLAY line
* Adding 'and stopped udevtrigger' as above
* Adding 'sleep 10' before lightdm is executed

So far only adding a sleep has helped, although it is not 100% effective.

I've had a look at /var/log/udev and there are no PRIMARY_DEVICE_FOR_DISPLAY lines, so am I right in thinking that once udev-fallback-graphics stops, lightdm will try to start? To me, it seems like something is still not ready at the time that LightDM runs.
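A quick way to check the same thing on another machine is to count the tagged lines in the udev log. This is a sketch with a hypothetical helper name; the log path is the one used earlier in this report. With the stock 'start on' condition quoted above, a count of 0 would mean the 'drm-device-added ... PRIMARY_DEVICE_FOR_DISPLAY=1' branch can never fire, leaving 'stopped udev-fallback-graphics' as the only way to satisfy that part of the condition.

```shell
# Count lines in the udev log that tag a device as the primary display.
# Prints the count; like grep -c, the exit status is 1 when the count is 0.
count_primary_display_events() {
    log="${1:-/var/log/udev}"
    grep -c '^PRIMARY_DEVICE_FOR_DISPLAY=1$' "$log"
}
```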

Gavin Graham (gavingraham) wrote :

All,

There is a lot more heat in #1066410 but not as much progress as I've seen here. I've suggested that #1066410 is marked as a duplicate of this ticket and those people come here and mark it as affecting them also.

Gavin Graham (gavingraham) wrote :

I've tried adding "and stopped udevtrigger" and "sleep x" and both work more often than not but it is still unreliable.

powerarmour (powerarmour) wrote :

Same issue with me on Ubuntu 12.10 x64 with the following hardware :-

Intel Core i3-3225 (w/HD 4000)
8GB Corsair Vengeance DDR3-1866
ASRock H77M-ITX motherboard
Corsair F40 40GB SSD

On most cold boots now LightDM is failing to start, and will just display the standard display reconfigure message (which obviously doesn't actually do anything other than boot you to a terminal)

I have to start LightDM manually 'sudo lightdm' and once it's initialised everything is fine again, and I can even reboot the system and LightDM will usually start (occasionally it will still fail on a reboot).

Gavin Graham (gavingraham) wrote :

Actually, manually running "sudo lightdm" does not always work for me either. I'm starting to wonder whether this is fixable by modifying the init scripts or whether the problem is deeper in LightDM.

Steve Langasek (vorlon) wrote :

On Sun, Dec 02, 2012 at 06:28:09PM -0000, Gavin Graham wrote:
> Actually, manually running "sudo lightdm" does no always work for me
> either.

Then you probably need to file a separate bug report.

I haven't tried "and stopped udevtrigger" and "sleep x" yet; that's next.

Instead of 'sudo lightdm stop' and 'sudo lightdm start', I've had consistently good luck (so far) with 'sudo service lightdm --full-restart'.

RoyK (roysk) wrote :

I see this on a desktop (HP Compaq 8000 Elite (AU247AV)) with rotating rust, not SSD, so I doubt it's related to an SSD's speed.

RoyK (roysk) wrote :

Just reinstalled my desktop (see comment above) with 12.10, and it came up as normal, X flashing etc. However, after updating packages, X won't come up. Starting 'sudo lightdm' manually works. Again, this is with rotating rust, a Seagate ST3320418AS.

roy

Gavin Graham (gavingraham) wrote :

In the /etc/init/lightdm.conf I've changed

exec lightdm

to

exec lightdm -d

as a way to catch some extra debugging information in the lightdm log file and lo, now it never fails.
Hmmm, I wonder if writing debug information has slowed things down just enough to avoid a race condition inside the lightdm binary.

Daniel (internalkernel) wrote :

This seems to happen every time I boot my computer... I'm running a fresh install of 12.10 on an i7 core, primary SSD, Nvidia drivers from x-swat and dual monitors with Twinview enabled.

What I've tried so far with no success:

- rename to *card0
  : seemed to have no effect as there is no card0 in my udev.log
  : also no PRIMARY_DEVICE_FOR_DISPLAY in there either; it's almost as if udev is not finding my graphics card properly
- added "and stopped udevtrigger"
- added a sleep 2

My last attempt was to add '-d' to exec lightdm, I was hoping to get more info on what was going on... instead it seems to work properly now.

On a side note, after much experimentation... I have a much higher chance of a correct boot when my external monitors are not attached. I've only seen it fail a handful of times whereas with external monitors attached it's nearly 100%.

Logging in and manually starting lightdm service doesn't help, it just fails immediately again.

Steve Langasek (vorlon) wrote :

> Nvidia drivers from x-swat and Dual monitors with Twinview enabled.

You are using the binary nvidia drivers, so this is not your bug. Yes, you will have neither card0 nor PRIMARY_DEVICE_FOR_DISPLAY, because the binary nvidia drivers don't provide the kernel drm driver interface. In your case, lightdm should *only* be starting after 'stopped udev-fallback-graphics', which in turn only happens after 'stopped udevtrigger'. You should probably file a new bug report for your system and attach /var/log/udev.log.

Gavin Graham (gavingraham) wrote :

@Daniel (internalkernel)

Just confirming so it isn't lost in the midst, you changed

'exec lightdm'
to
'exec lightdm -d'

and now you are getting Lightdm performing correctly each time?

Daniel (internalkernel) wrote :

That is correct... so far so good

exec lightdm -d

On Sun 16 Dec 2012 02:12:08 PM EST, Gavin Graham wrote:
> @Daniel (internalkernel)
>
> Just confirming so it isn't lost in the midst, you changed
>
> 'exec lightdm'
> to
> 'exec lightdm -d'
>
> and now you are getting Lightdm performing correctly each time?
>

dino99 (9d9) on 2013-07-21
tags: added: i386 saucy
Changed in lightdm (Ubuntu):
assignee: nobody → Dalmer (dalmer-vieira)
Dean Montgomery (dmonty) wrote :

With our particular setup of school labs. 2-3 out of 30 stations would always fail - the display manager would not start.

From the kernel boot options I removed splash and quiet, and noticed that the events preceding the start of lightdm took quite a bit less time on the failed machines.

When I turned on more debugging features, all the machines boot-up fine with no errors! So my hack/solution:
* kernel boot options remove: splash quiet
* kernel boot options add: verbose init=/sbin/init --verbose
* /etc/default/rcS
     VERBOSE=yes
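For the first step, the option string can be filtered mechanically. A rough sketch follows; `strip_quiet_splash` is a hypothetical helper. On Ubuntu the default kernel options normally live in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and take effect after running update-grub, but that location is my assumption about the setup and is not stated in this comment.

```shell
# Print the given kernel option string with the words "quiet" and
# "splash" removed; all other options are kept in their original order.
strip_quiet_splash() {
    out=""
    for word in $1; do
        case "$word" in
            quiet|splash) ;;                     # drop these two options
            *) out="${out:+$out }$word" ;;       # keep everything else
        esac
    done
    printf '%s\n' "$out"
}
```

For example, `strip_quiet_splash "quiet splash nomodeset"` prints `nomodeset`.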

My guess is that dumping the verbose text out to the screen creates just enough delay to fix the race condition.

Stefan H. (stefan-h) wrote :

Same problem with my X230 since I installed a SSD drive. Previously ran the same device with a normal drive without these issues.

Lenovo X230
Ubuntu 12.10/13.04 64bit
Samsung SSD 840
i5-3320M
using the processors HD4000 graphics.

Definitely checks the boxes for a race issue. Often I can boot just fine on the first try. On other occasions even multiple reboots don't get me to login. Switching to a console and starting lightdm always works.

As this issue has gone unfixed for so long and - at least judging from the information in this report - there is no progress made towards a solution it seems about time a workaround is developed and distributed. This issue is annoying for advanced users. Everyone else will think their computer is broken. Imho at this point even the most horrible band-aid is preferable to the current state of affairs. This issue now spans 12.04 LTS, 12.10, 13.04 and soon 13.10.

This happens sometimes on my machine:
Ubuntu 12.04 64bit
Intel SSD 335
Nvidia GTS 650 with NVIDIA drivers.
lightdm version 1.2.3-0ubuntu2.3

Tracked it to a race issue - if NVRM module is loaded before lightdm starts, all is OK. (see logs in Post Scriptum)

My current configuration in /etc/init/lightdm.conf:

start on ((filesystem
           and runlevel [!06]
           and started dbus
           and plymouth-ready)
          or runlevel PREVLEVEL=S)

Post Scriptum: Logs clearly showing that lightdm is started before the NVRM module is loaded and dies a horrible death:
Sep 13 14:20:07 Pendragon kernel: [ 5.944960] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 14:20:07 Pendragon kernel: [ 6.099801] init: lightdm main process (1127) terminated with status 1
Sep 13 14:20:07 Pendragon kernel: [ 6.509805] HDMI status: Codec=0 Pin=5 Presence_Detect=0 ELD_Valid=0
Sep 13 14:20:07 Pendragon kernel: [ 6.533772] HDMI status: Codec=1 Pin=5 Presence_Detect=0 ELD_Valid=0
Sep 13 14:20:07 Pendragon kernel: [ 6.557710] HDMI status: Codec=2 Pin=5 Presence_Detect=0 ELD_Valid=0
Sep 13 14:20:08 Pendragon kernel: [ 6.581717] HDMI status: Codec=3 Pin=5 Presence_Detect=0 ELD_Valid=0
Sep 13 14:20:08 Pendragon kernel: [ 6.581814] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input10
Sep 13 14:20:08 Pendragon kernel: [ 6.581890] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input11
Sep 13 14:20:08 Pendragon kernel: [ 6.581954] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input12
Sep 13 14:20:08 Pendragon kernel: [ 6.582180] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input13
Sep 13 14:20:08 Pendragon kernel: [ 6.583236] nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Sep 13 14:20:08 Pendragon kernel: [ 6.583243] nvidia 0000:01:00.0: setting latency timer to 64
Sep 13 14:20:08 Pendragon kernel: [ 6.583247] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
Sep 13 14:20:08 Pendragon kernel: [ 6.583341] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 319.32 Wed Jun 19 15:51:20 PDT 2013
Sep 13 14:20:08 Pendragon kernel: [ 6.794290] init: udev-fallback-graphics main process (1380) terminated with status 1
Sep 13 14:20:08 Pendragon kernel: [ 6.951100] init: plymouth-stop pre-start process (1400) terminated with status 1
Sep 13 14:20:08 Pendragon kernel: [ 7.533643] cfg80211: Found new beacon on frequency: 2472 MHz (Ch 13) on phy0
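The ordering in a log like this can be checked mechanically by pulling out the bracketed kernel timestamps. A minimal sketch, assuming syslog lines in the format shown above; `kernel_ts` and `ts_before` are made-up helper names and the match strings are taken from this log.

```shell
# Print the kernel timestamp (seconds since boot) of the first line in
# the log file that matches the given regular expression.
kernel_ts() {
    # $1 = regex, $2 = log file
    awk -v pat="$1" '
        $0 ~ pat && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, RSTART + 1, RLENGTH - 2)   # strip the brackets
            gsub(/ /, "", ts)                          # strip padding spaces
            print ts
            exit
        }' "$2"
}

# Return 0 if timestamp $1 is numerically earlier than timestamp $2.
ts_before() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
}
```

With the lines above, `kernel_ts 'lightdm main process' file` gives 6.099801 and `kernel_ts 'NVRM: loading' file` gives 6.583341, so `ts_before` confirms that lightdm exited before the module announced itself.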

Changed in lightdm (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High

Adding some lines to /etc/init/lightdm.conf fixed the problem on my PC (Ubuntu 12.04):

start on ((filesystem
           and runlevel [!06]
           and started dbus
           and plymouth-ready
           and (graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1 #added
           or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1 #added
           or stopped udevtrigger or container)) #added
          or runlevel PREVLEVEL=S)

Giuseppe Fierro (gspe) wrote :

I've been experiencing this bug since 12.10 on all my systems, amd and nvidia.

Attached the lightdm.log from a failed startup.

I think it's time to fix this bug.

plum7500 (plum01) wrote :

I am running kernel 3.2.0-54 in Ubuntu 12.04. After the last update, lightdm does not start so no gui. I tried editing the config for lightdm by adding some delay and the respawn but it did not work. Can anybody here tell me what is wrong by looking at the log?

plum7500, in the log you just reported, the relevant lines are these:

[+0.03s] DEBUG: Registering seat with bus path /org/freedesktop/DisplayManager/Seat0
[+1.62s] DEBUG: Process 1470 exited with return value 127
[+1.62s] DEBUG: X server stopped

This means the X server was unable to register a seat for the LightDM display manager, due to a malfunction in the latter.
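As a side note on reading "exited with return value 127": in a POSIX shell, 127 is the status reported when a command cannot be found. Whether that is what happened to process 1470 cannot be told from these three log lines, so treat the following only as a reminder of the convention; `exit_status_of` is a hypothetical helper.

```shell
# Run a command line in a child shell, discard its output,
# and print the exit status it finished with.
exit_status_of() {
    sh -c "$1" >/dev/null 2>&1
    printf '%s\n' "$?"
}

exit_status_of 'definitely-not-a-real-command-xyz'   # prints 127
exit_status_of 'true'                                # prints 0
```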

Changed in upstart:
status: New → Confirmed
plum7500 (plum01) wrote :

thanks for looking into this bug. I tried switching the display manager to gdm but pc kept rebooting. How would I restore old version of lightdm?

plum7500 (plum01) wrote :

I managed to downgrade lightdm to version 1.2.1-0buntu1-amd64, but I still do not get a GUI when I boot up.

Id2ndR (id2ndr) wrote :

I got better result using default lightdm greeter instead of ubuntu's one. Maybe this can help others.
sudo apt-get purge lightdm
sudo apt-get install ubuntu-desktop
sudo apt-get install lightdm-gtk-greeter
If lightdm-gtk-greeter is installed, you will not be able to start either ubuntu-greeter or the default gtk greeter. After the installation of the gtk-greeter, lightdm will default to this one.

Id2ndR (id2ndr) wrote :

By the way I encountered this bug on an old laptop with ATI Mobility Radeon X1300 using radeon module on 13.10 (/ on btrfs fs, /home on ext4 fs ; using SSD drive).

Chris (cccsayer) wrote :

I might have some news for this bug:

First is, if i remove "quiet splash" from boot, the bug disappears as well.

More importantly though, if I have the blank login screen, and I connect an external monitor, the screen shows up.
If I go to a tty (Alt-Ctrl-F1) and back to X (Alt-Ctrl-F7), the screen is blank again.
Then, if I pull out the external monitor, the login screen shows up again.

Summed up
1. toggling external screen connection brings login screen up
2. switching to text and back to X kills it

It takes 1 to 2 seconds to bring the screen up after the cable is connected/disconnected.

I am on toshiba portege Z930-14L
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)

By the way, I noticed (very rarely) that when logged in, the "screen refresh" may "lag"; for example, when typing a command in the shell, the last character of the command would show up only after pressing enter. I believe this was not particular to the shell, but since it happened only once or twice on this new laptop I cannot tell for sure.

Can I do anything to further debug this?

BR, Chris

Chris (cccsayer) wrote :

I should also mention that after the screen locks and blanks while unattended, it often comes back only on the external monitor while leaving the laptop monitor blank. Pulling the monitor cable brings it back on the main screen.

John Calla (jwcalla1) wrote :

I recommend that Canonical moves over to using GDM or another display manager if they can't get this fixed for LightDM.

If they're looking to make a good impression, resolving this critical issue should be at the absolute top of the list (and back-ported).

323232 (323232) wrote :

Same problem with 13.10 on a Samsung Series 5 notebook with an SSD.
Set sleep to 5 (with shorter delays the bug still persists, at a lower frequency...)

Sollmeeya (gniknivag) wrote :

Same problem on a fresh 13.04 install. I have been trying to solve this bug for ages; the only workaround that has worked for me is to use GDM, which is not ideal. I have now tried #39, since that setup is similar to mine (nvidia graphics and SSD), and will report back with findings. At least we can work around it, but for some reason when I enabled GDM a lot of programs, like Ubuntu One and the terminal from Unity, were removed.

Not sure if those two are somehow related.

Bryan (k1cd) wrote :

In 13.10 on an Aspire laptop with an HDMI attached 1920x1280 monitor the background on one or the other comes up blank depending upon how I set the monitor positions. When side by side with the laptop to the right, it is the laptop screen that comes up blank. Gkrellm is supposed to come up on the top right hand corner of the laptop screen but instead it is in the top left of the external monitor. -- Behavior not evident in previous versions.

I find I can fix the background situation by changing the background. Going to a terminal session and doing a restart lightdm also works. I'll try the init script suggestions provided here to see what happens.

323232 (323232) wrote :

Despite the sleep command (up to 12 at this moment) before exec lightdm in /etc/init/lightdm.conf,
it still occasionally starts up with a black screen and blinking cursor.

Also tried the lightdm-gtk-greeter and GDM. The same error remains!


I have a fresh Ubuntu 12.04.3 LTS installation and at first everything worked fine. 2 days and approximately 10-15 reboots after the initial installation I started experiencing this "blank screen, blinking cursor in upper left corner" problem with increasing frequency. After another 2-3 days and a number of reboots I could only get this blank screen.

I followed the fix / workaround suggested in comment #95 and so far I have not experienced the problem once, i.e. no problems in roughly a week and another 15-20 reboots.

For those working with the issue, below is a list of my HW.

H/W path Device Class Description
======================================================
                            system MS-7823 (To be filled by O.E.M.)
/0 bus Z87M-G43 (MS-7823)
/0/0 memory 64KiB BIOS
/0/3d processor Intel(R) Core(TM) i5-4430 CPU @ 3.00GHz
/0/3d/3e memory 1MiB L2 cache
/0/3d/3f memory 256KiB L1 cache
/0/3d/40 memory 6MiB L3 cache
/0/41 memory 8GiB System Memory
/0/41/0 memory 8GiB DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
/0/41/1 memory DIMM [empty]
/0/41/2 memory DIMM [empty]
/0/41/3 memory DIMM [empty]
/0/100 bridge 4th Gen Core Processor DRAM Controller
/0/100/2 display Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
/0/100/3 multimedia Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
/0/100/14 bus 8 Series/C220 Series Chipset Family USB xHCI
/0/100/16 communication 8 Series/C220 Series Chipset Family MEI Controller #1
/0/100/1a bus 8 Series/C220 Series Chipset Family USB EHCI #2
/0/100/1b multimedia 8 Series/C220 Series Chipset High Definition Audio Controller
/0/100/1c bridge 8 Series/C220 Series Chipset Family PCI Express Root Port #1
/0/100/1c.4 bridge 8 Series/C220 Series Chipset Family PCI Express Root Port #5
/0/100/1c.4/0 eth0 network RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
/0/100/1d bus 8 Series/C220 Series Chipset Family USB EHCI #1
/0/100/1f bridge Z87 Express LPC Controller
/0/100/1f.2 storage 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
/0/100/1f.3 bus 8 Series/C220 Series Chipset Family SMBus Controller
/0/1 scsi0 storage
/0/1/0.0.0 /dev/sda disk 240GB KINGSTON SV300S3
/0/1/0.0.0/1 /dev/sda1 volume 100MiB Windows NTFS volume
/0/1/0.0.0/2 /dev/sda2 volume 112GiB Windows NTFS volume
/0/1/0.0.0/3 /dev/sda3 volume 111GiB Extended partition
/0/1/0.0.0/3/5 /dev/sda5 volume 103GiB Linux filesy...


323232 (323232) wrote :

Also tried the workaround #95;

<<quote>>
On the kernel i turned removed: splash & quiet and noticed that the events proceeding starting lightdm were quite a bit shorter for the failed machines.

When I turned on more debugging features, all the machines boot-up fine with no errors! So my hack/solution:
* kernel boot options remove: splash quiet
* kernel boot options add: verbose init=/sbin/init --verbose
* /etc/default/rcS
     VERBOSE=yes
<<unquote>>
Seems to work. After more than one week, no more black screens!
Thanx

323232 (323232) wrote :

Commented too early. The workaround #95 did not resolve the black screen issues (although the frequency seems to be lower).

Now something has changed since I took more disks into use. As the system is dual-boot, I mounted the Windows system disk (SSD) read-only and another disk as shared read-write for both systems to use.

What happens now, and actually started happening a couple of days after the changes described above in #113, is that booting stops with an error message on the screen about mountall failing. After pressing Ctrl-Alt-Del the boot continues and I get an almost normally behaving system. I have to mount the two above-mentioned filesystems by hand, but luckily my /home directory from a non-SSD disk has been mounted successfully and I can log in.

At this point, I do not have the faintest idea about what is going on.

323232 (323232) wrote :

It seems that the bug occurs more often after an update and reboot (with new profiling), although it also occurs, less often, without any updates.

VladimirCZ (vlabla) wrote :

The same thing 323232 described happened to me - I had to replace lightdm with gdm.

Rocko (rockorequin) wrote :

This is still an issue in ubuntu 14.04.

D3m0n1q_733rz (cryo-rebirth) wrote :

We need to apply a fix in the startup which performs a while statement to check if dependencies are not yet loaded. This will mitigate against problems with varying sleep times which may or may not work for users.
Alternatively, having a list of dependencies to run inline may be a better option.
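
A minimal sketch of such a wait loop, written as a shell function that could be called from a job's pre-start script. The function name wait_for_path, the device node /dev/dri/card0 and the 10-second timeout are assumptions for illustration only, not part of any shipped lightdm configuration; the right thing to wait for depends on the graphics driver.

```shell
#!/bin/sh
# Hypothetical helper: poll until a path exists or a timeout expires.
# Usage: wait_for_path PATH TIMEOUT_SECONDS
# Returns 0 as soon as PATH exists, 1 if it is still missing after the timeout.
wait_for_path() {
    path=$1
    timeout=$2
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example call (assumed device node, adjust for your driver):
#   wait_for_path /dev/dri/card0 10 || echo "graphics device not ready" >&2
```

Unlike a fixed sleep, the loop returns as soon as the path appears, and the timeout keeps a missing device from blocking the boot forever.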

D3m0n1q_733rz (cryo-rebirth) wrote :

Seven months have passed with no fix in place. I have removed the assigned user so that another may step up in their place to resolve the issue.

Changed in lightdm (Ubuntu):
assignee: Dalmer Vieira (dalmer-vieira) → nobody
walec51 (me-adamwalczak) wrote :

Yep. It's a bug with the lightdm setup. Replacing it with gdm solves the problem.

Clint Byrum (clint-fewbar) wrote :

Excerpts from walec51's message of 2014-03-01 23:53:04 UTC:
> Yep. It's a bug with the lightdm setup. Replacing it with gdm solves the
> problem
>

Thanks for the data, that should help a bit.

However, all that proves is that gdm does not have the symptom. It does
not indicate that "the problem" is lightdm's setup. It may just be that
gdm starts up a few seconds slower and thus loses the race where lightdm
wins it.

D3m0n1q_733rz (cryo-rebirth) wrote :

I concur!
Though I think the real problem is that there is a race in the first place instead of a march, which is what I'm hoping we can fix.

Something has changed again. The scenario described in #116 has disappeared: for some reason that I do not know, I am no longer experiencing any of the problems described there. Needless to say, I haven't changed anything; I have just used the computer.

D3m0n1q_733rz (cryo-rebirth) wrote :

Relax, the bug is only sporadic and the slowdown on boot may be overcome with inode updates and defrags which may or may not occur automatically with the BTRFS file system. While it may happen with one system, it may not with another. While it may happen at one time with a system, it may not at another. It's based on how quickly boot occurs and whether or not lightdm loads before or after its dependencies which may or may not occur depending on how optimized the filesystem is as well as other factors.
Using the computer will add, remove, and optimize files in the filesystem as time goes on. This scenario of having lightdm load properly was expected and is part of the bug in question.
To simplify, lightdm is winning the race condition instead of losing. However, much like Global Thermonuclear War, the only way to truly win every time is not to play. Thus, the bug still exists.

I've been running without issues since September 2013 with the following configuration in /etc/init/lightdm.conf.

[...]
start on ((filesystem
           and runlevel [!06]
           and started dbus
           and plymouth-ready
           and stopped udevtrigger) # <- one line fix
          or runlevel PREVLEVEL=S)
[...]

My configuration is:
Ubuntu 12.04 64bit
Intel SSD 335
Nvidia GTS 650 with NVIDIA drivers.
lightdm version 1.2.3-0ubuntu2.4

Additionally, dependencies on the following were not helping to solve the problem:
graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1 - this event is not fired on my machine
graphics-device-added - this event is fired too early, possibly on vesafb load; lightdm crashes and the fallback mode kicks in.
drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1 - is not fired on my machine

avius (avi142) wrote :

This bug affects me also. I have not tried Ciprian's one line fix. My logs are attached.

avius (avi142) wrote :

I have applied Ciprian's "fix" from #127 to no avail.

Here are my new logs, with the change applied. I didn't spot any difference.

Avius: When lightdm fails to start, do you sometimes get X started in "safe mode", as per comment #2 from Serge?

Your problem seems different, as your logs show that lightdm can successfully start and connect to the X server, and something happens during your login. Relevant extract from your logs below:
[+3.06s] DEBUG: Seat: Display server ready, starting session authentication
[...]
[+5.80s] DEBUG: Session pid=1330: Started with service 'lightdm', username 'avius'
[+5.80s] DEBUG: Session pid=1032: Greeter start authentication for avius
[+5.80s] DEBUG: Session pid=1330: Sending SIGTERM
[+5.80s] DEBUG: Seat: Setting XDG_SEAT=seat0
[+5.80s] DEBUG: Session: Setting XDG_VTNR=7
[+5.80s] DEBUG: Session pid=1331: Started with service 'lightdm', username 'avius'
[+5.80s] DEBUG: Got signal 15 from process 991
[+5.80s] DEBUG: Caught Terminated signal, shutting down

Could your problem be related to authentication? Is your system set to autologin?
Have you tried logging into the console and running 'start lightdm'?
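
To find out which process sent the SIGTERM seen in the extract above, something like the following sketch might help. The function name sigterm_senders is made up for illustration, and /var/log/lightdm/lightdm.log is assumed to be the log location; the pattern matches the "Got signal 15 from process" line exactly as it appears in the extract.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process that lightdm logged
# as the sender of signal 15 (SIGTERM), one PID per line.
# Usage: sigterm_senders LOGFILE
sigterm_senders() {
    sed -n 's/.*Got signal 15 from process \([0-9][0-9]*\).*/\1/p' "$1"
}

# Example call (assumed log path):
#   sigterm_senders /var/log/lightdm/lightdm.log
```

If the sender is still running, "ps -p PID -o comm=" shows its command name.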

As per comment #97, lightdm (which in turn starts X) can be started before the display kernel modules are fully loaded (creating a race). The problem is most apparent to people running Ubuntu from an SSD. On my PC, the "one line fix" delays the lightdm start sufficiently for the kernel display modules to be fully loaded. It's really working on my system:

ls -ltr /var/log/Xorg*
-rw-r--r-- 1 root root 69099 Sep 13 2013 /var/log/Xorg.failsafe.log.old <-penultimate start of X into failsafe mode before performing the workaround
-rw-r--r-- 1 root root 69514 Mar 8 14:17 /var/log/Xorg.failsafe.log <- last start of X into failsafe mode while getting this down to a one liner in March
-rw-r--r-- 1 root root 31975 Mar 28 19:11 /var/log/Xorg.0.log.old
-rw-r--r-- 1 root root 30869 Mar 30 10:58 /var/log/Xorg.0.log
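
For anyone who wants to check the race from a console after a failed boot, here is a rough sketch that tests whether a kernel module is listed as "Live" in /proc/modules. The function name module_is_live is hypothetical, and the module name to pass (nvidia, i915, ...) depends on the hardware. Note that a module being listed as Live does not by itself prove the display device is fully initialized.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named kernel module is in state "Live".
# Usage: module_is_live NAME [MODULES_FILE]
# MODULES_FILE defaults to /proc/modules, whose fields are:
#   name size refcount dependencies state address
module_is_live() {
    awk -v name="$1" '
        $1 == name && $5 == "Live" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/modules}"
}

# Example call (assumed module name):
#   module_is_live nvidia && echo "nvidia module is loaded"
```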

There are other workarounds from other people in this thread, which seem to have the same effect - for example someone suggests in comment #114 to make the startup verbose, which seems to solve the problem for that person.

avius (avi142) wrote :

When X fails to start, I don't get the safe mode screen; I just have a blinking cursor in the top left corner. This occurs during random bootups, and without any interaction; the failure is before the username/password textbox is presented. I'm not sure what the "something happening during login" could be. The system is not set to autologin.

During those bootups that fail, I typically log in to a tty and run "sudo service lightdm restart". If I run "sudo service lightdm start", then I get a message about how lightdm is already running.

Any ideas what my issue is? Are you sure that it is not the same race condition biting me?

avius (avi142) wrote :

I forgot to mention: sometimes it takes two or three "sudo service lightdm restart" attempts before the system proceeds normally. In other words, it will boot up, go to the blinking cursor, I'll restart lightdm, and it will go to the blinking cursor again. Sometimes if I'm fast, I can catch it starting to load the lightdm greeter screen for a split second before it crashes.

Yale Huang (yale-huang) wrote :

avius, I see the same behavior with "sudo service lightdm restart" on an HP N54L. I'm not sure which parameter matters: the number of times I tried to restart, or the time elapsed since booting.

Yale Huang (yale-huang) wrote :

It's me, again.

The N54L works now, but I don't remember what I changed.

But today I failed to add SSD-based EnhanceIO to several iSCSI-based Ubuntu desktops (Intel G530 and G1610). They hung with a blank screen and a mouse cursor in the middle. The kernel hung; I could not ping the machines. But when I booted in text mode and then ran "sudo service lightdm start", it worked most of the time and failed several times. It seems that lightdm or X fails if STARTED TOO SOON. I tried all the workarounds above, but none of them works.

Displaying first 40 and last 40 comments. View all 134 comments or add a comment.