lightdm suddenly choked on multiseat config in a way that survives reboots and even purging it

Bug #2033569 reported by Christian Pernegger
Affects: lightdm (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a bog-standard loginctl multiseat setup, using lightdm because of #2033323. Except for the lack of session locking, it worked beautifully across multiple reboots. Until it didn't.
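(For context, "bog-standard" here means nothing fancier than attaching the second seat's hardware with loginctl; the device path below is illustrative, not the actual one from this box:)

# Attach the second GPU to seat1 (example path only),
# then check what logind thinks the seat has.
loginctl attach seat1 /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/drm/card1
loginctl seat-status seat1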

The box woke from suspend and, to be fair, was acting strangely even then (Steam suddenly tried to launch using the iGPU), so suspend may well be broken on this box; but if so, that's a separate issue. Anyway, I rebooted: no lightdm greeter to be seen, and the screen on seat1 was black. I did not have easy access to seat0 at the time.

The journal has, looping:
systemd[1]: Failed to start Detect the available GPUs and deal with any system changes.
systemd[1]: lightdm.service: Start request repeated too quickly.
systemd[1]: lightdm.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Light Display Manager.

The first line is from gpu-manager.service, whose log contains
Vendor/Device Id: 1002:164e
BusID "PCI:110@0:0:0"
Is boot vga? no
Error: can't access /sys/bus/pci/devices/0000:6e:00.0/driver
The device is not bound to any driver.
Vendor/Device Id: 1002:73ff
BusID "PCI:3@0:0:0"
Is boot vga? yes
Error : Failed to open /dev/dri
Error : Failed to open /dev/dri
Error : Failed to open /dev/dri
Error : Failed to open /dev/dri

The x-0 log has:
vesa: Ignoring device with a bound kernel driver
vesa: Ignoring device with a bound kernel driver
(EE)
Fatal server error:
(EE) no screens found(EE)

Scary.

The thing is, both the iGPU and the dGPU are claimed by amdgpu; their "driver" symlinks are accessible just fine. Switch back to gdm via dpkg-reconfigure and the box boots up fine again. It's just lightdm that's hosed.

I tried purging lightdm and lightdm-gtk-greeter, along with /var/lib/lightdm, and reinstalling the packages, but no dice. What does work is starting lightdm.service manually over ssh: it takes about 1-4 tries for both gpu-manager and lightdm to successfully launch and bring up both greeters. Reboot, and it fails again.
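For reference, "starting it manually" was roughly the following; the reset-failed is only needed if systemd's start-rate limit is still tripped from the "Start request repeated too quickly" loop:

# Clear the tripped start-rate limit (if any), then retry.
sudo systemctl reset-failed lightdm.service
sudo systemctl start lightdm.service
systemctl --no-pager status gpu-manager.service lightdm.service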

Some kind of race condition due to too-lax timing and/or dependencies in the lightdm service file? Did something unrelated change the order and/or speed at which systemd executes the service files, i.e. it worked by accident before and now it doesn't?

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: lightdm 1.30.0-0ubuntu5
Uname: Linux 6.4.12-060412-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
Date: Thu Aug 31 03:16:56 2023
InstallationDate: Installed on 2023-08-25 (5 days ago)
InstallationMedia: Ubuntu 22.04.3 LTS "Jammy Jellyfish" - Release amd64 (20230807.2)
SourcePackage: lightdm
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Christian Pernegger (fallenguru) wrote :

gpu-manager.service is probably a red herring, or a separate bug. I (sometimes?) get "Error: can't access /sys/bus/pci/devices/0000:6e:00.0/driver \ The device is not bound to any driver." when booting with gdm as well; yet gpu-manager.service doesn't fail, and gdm comes up normally.

Revision history for this message
Christian Pernegger (fallenguru) wrote :

Found a solution.

There were two separate issues:

FIRST ISSUE

Apparently systemd likes to start up lightdm early, so early that on a reasonably fast system the GPU driver won't be ready in time ...

(I thought this was what systemd's dependency handling was for, but never mind.)

This seems to be a long-standing issue; it's neither multiseat- nor GPU-driver-specific. And considering this is just a budget AMD box, albeit a new one, it's going to bite more and more people in the future.

My hypothesis as to why the first few boots with lightdm worked is that I was still in the middle of setting up the box, i.e. a lot changed from one boot to the next. Either one of those changes triggered the timing issue, or "clean" boots are faster per se.
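One way to sanity-check that (an assumed diagnostic approach, not something proven from these logs) is to compare what lightdm.service actually waits for at boot with when the amdgpu devices came up:

# What lightdm.service is ordered after ...
systemd-analyze critical-chain lightdm.service
# ... versus when the kernel registered the DRM devices.
journalctl -b -k | grep -i drm
journalctl -b -u gpu-manager.service -u lightdm.service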

The workarounds detailed in the Arch Wiki (https://wiki.archlinux.org/title/LightDM#LightDM_does_not_appear_or_monitor_only_displays_TTY_output) both work, to wit:

1) Add

[LightDM]
logind-check-graphical=true

to /etc/lightdm/lightdm.conf

- OR -

2) Enable early KMS by adding your GPU driver module to /etc/initramfs-tools/modules and running update-initramfs -k all -u (sketched below).
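On this box the relevant module is amdgpu, so the second workaround comes down to something like this (module name per this report; substitute your own driver):

# Load the GPU driver from the initramfs (early KMS).
echo amdgpu | sudo tee -a /etc/initramfs-tools/modules
sudo update-initramfs -k all -u

With the first workaround in place instead, lightdm should hold off until logind flags the seat as graphical, which can be checked with:

loginctl show-seat seat0 -p CanGraphical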

SECOND ISSUE

At some point while trying to fix this I deleted /var/lib/lightdm as well, which is lightdm's home directory. Since purging lightdm does not remove the lightdm user, reinstalling the package will not (re)create this directory. Unfortunately the symptoms are the same as above: lightdm goes into a restart loop, then gives up.
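Recreating the directory by hand should address that; a minimal sketch, assuming it should be owned by the stock lightdm user and group (I haven't verified the exact ownership the package expects):

# Recreate lightdm's home directory, owned by the service user.
sudo install -d -o lightdm -g lightdm /var/lib/lightdm
sudo systemctl restart lightdm.service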

Now, the second issue probably isn't a bug (though the postinst script could at least print a warning if the home directory isn't present and writable), but the first one is, IMHO. A check along the lines of the sketch below is all I mean.
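Something like this hypothetical fragment (illustrative only, not taken from the actual maintainer script):

# Hypothetical postinst check: warn if lightdm's home directory is
# missing or not writable by the lightdm user.
if ! su -s /bin/sh -c 'test -d /var/lib/lightdm && test -w /var/lib/lightdm' lightdm; then
    echo "warning: /var/lib/lightdm is missing or not writable by the lightdm user" >&2
fi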
