GDM3 autologin may be racy on GCP, resulting in an inconsistent state of the Wayland setup on seat0

Bug #2062534 reported by Pirouette Cacahuète
This bug affects 2 people
Affects            | Status    | Importance | Assigned to | Milestone
gdm3 (Ubuntu)      | Confirmed | Undecided  | Unassigned  |
linux-gcp (Ubuntu) | Confirmed | Undecided  | Unassigned  |

Bug Description

This might not really be a linux-gcp bug, but it follows up on the work in bug 2039732, and so far I have not been able to reproduce it locally.

The setup is an up-to-date Ubuntu 22.04 on GCP n2-standard instances with no GPU attached, thus relying on vkms. I have replicated a similar setup locally, but on a KVM host.

We rely on generic-worker (https://github.com/taskcluster/taskcluster/tree/main/workers/generic-worker#readme) to run CI tasks. In particular, generic-worker:
 - creates a new task_XXX user
 - configures that user for autologin in the gdm3 config
 - probes for the existence of the GNOME Wayland session before launching the task
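For reference, GDM autologin on Ubuntu is configured in the [daemon] section of /etc/gdm3/custom.conf. The sketch below is hypothetical and is not generic-worker's actual code: the function name write_gdm_autologin_conf and the choice to overwrite the whole file are assumptions. It takes the target path as an argument so it can be tried against a scratch file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not generic-worker's actual code: write a gdm3 config
# that logs the given user in automatically. It overwrites the target file.
write_gdm_autologin_conf() {
  local conf_path="$1" user="$2"
  if [ -z "$conf_path" ] || [ -z "$user" ]; then
    echo "usage: write_gdm_autologin_conf CONF_PATH USER" >&2
    return 2
  fi
  # WaylandEnable=true only requests Wayland; GDM may still fall back to X11.
  cat > "$conf_path" << EOF
[daemon]
AutomaticLoginEnable=true
AutomaticLogin=${user}
WaylandEnable=true
EOF
}
```

Run as root, e.g. write_gdm_autologin_conf /etc/gdm3/custom.conf task_XXX. Because it replaces the file, any other settings already in custom.conf would be lost.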

We relied on the wl-clipboard package being installed to verify the state of Wayland.

On top of that setup, here is the issue.

We issue a Taskcluster (TC) task with the following payload:
> export WAYLAND_DISPLAY=wayland-0
> export XDG_RUNTIME_DIR=/run/user/$(id -u)
> wl-paste -l -p
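Since the healthy and broken cases differ only in the message wl-paste prints ("No selection" when the seat is usable), the payload could be wrapped in a polling loop that keys off that text. This is a hypothetical sketch: the function names, the 30-attempt default and the one-second interval are assumptions, and it only helps if the seat eventually becomes ready, which this report does not establish.

```shell
#!/usr/bin/env bash
# Hypothetical readiness check built around the payload above (not part of
# generic-worker). Assumption: "No selection" means the Wayland seat is usable,
# while "This seat has no keyboard" means it is not ready yet.

# Map wl-paste's message to a coarse state: ready, no-keyboard or unknown.
classify_wl_paste_output() {
  case "$1" in
    *"No selection"*) echo ready ;;
    *"This seat has no keyboard"*) echo no-keyboard ;;
    *) echo unknown ;;
  esac
}

# Run wl-paste once per second until the seat looks ready or attempts run out.
wait_for_wayland_seat() {
  local attempts="${1:-30}" i output
  for (( i = 1; i <= attempts; i++ )); do
    # Messages may go to stderr and the exit status may be non-zero even in
    # the "ready" case, so merge the streams and judge by the text only.
    output="$(WAYLAND_DISPLAY="${WAYLAND_DISPLAY:-wayland-0}" \
      XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}" \
      wl-paste -l -p 2>&1)" || true
    if [ "$(classify_wl_paste_output "$output")" = ready ]; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

A task would call wait_for_wayland_seat 60 before its real work and fail early if it returns non-zero.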

We expect that payload to report "No selection", but on GCP instances we almost always end up with "This seat has no keyboard". There were also cases where the session was not Wayland at all but X11. I think this points to something around the availability of /dev/dri/card0, but forcing the gdm3 service to wait for it to appear, and adding extra waiting time after card0 was present, still did not fix the problem.
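The report does not show how gdm3 was made to wait for /dev/dri/card0. One hypothetical way is a small polling helper like the one below (the function name wait_for_path and the default timeout are assumptions). Per the report, this kind of wait did not resolve the issue, so it only illustrates the experiment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a path exists, up to a timeout in seconds.
# Returns 0 once the path exists, 1 if the timeout is reached first.
wait_for_path() {
  local path="$1" timeout="${2:-30}" waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

To gate gdm3 on it, the function would have to be saved as a standalone script and referenced from an ExecStartPre= line in a gdm.service drop-in, calling it as wait_for_path /dev/dri/card0 60. Keep the timeout below the unit's start timeout, since ExecStartPre= time counts toward it.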

We enabled both gdm3 and mutter debugging but never found a good lead on why the session was not yet ready.

At some point, our user's seat0 session was shown as inactive while the active one belonged to gdm, so we suspected this was the cause, but neither forcing our session to be active nor terminating the gdm session unblocked us.
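The session checks described above map onto logind's loginctl commands (list-sessions, show-session, activate). The sketch below is hypothetical and not the exact procedure used in this report: the function names and the LOGINCTL override are assumptions, and the override exists only so the logic can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around loginctl; LOGINCTL can point at a stub for testing.
LOGINCTL="${LOGINCTL:-loginctl}"

# Print the ID of the given user's session on seat0; fail if there is none.
find_seat0_session_for_user() {
  local user="$1" id
  # The first column of "list-sessions --no-legend" is the session ID.
  for id in $("$LOGINCTL" list-sessions --no-legend | awk '{print $1}'); do
    if [ "$("$LOGINCTL" show-session "$id" -p Name --value)" = "$user" ] &&
       [ "$("$LOGINCTL" show-session "$id" -p Seat --value)" = seat0 ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# Ask logind to make the user's seat0 session the active one.
activate_user_session() {
  local id
  id="$(find_seat0_session_for_user "$1")" || return 1
  "$LOGINCTL" activate "$id"
}
```

For example, activate_user_session task_XXX would try to bring the autologin session to the foreground; the gdm greeter session could similarly be ended with loginctl terminate-session <ID>.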

We also suspected that the desktop was locking itself, so we disabled screen locking with the following, but it did not help much:
> cat > /etc/dconf/profile/user << EOF
> user-db:user
> system-db:local
> EOF
>
> mkdir -p /etc/dconf/db/local.d/
> # dconf user settings
> cat > /etc/dconf/db/local.d/00-tc-gnome-settings << EOF
> # /org/gnome/desktop/session/idle-delay
> [org/gnome/desktop/session]
> idle-delay=uint32 0
> # /org/gnome/desktop/lockdown/disable-lock-screen
> [org/gnome/desktop/lockdown]
> disable-lock-screen=true
> EOF
>
> sudo dconf update

In the end, the only viable and reliable fix that lasted (verified over hundreds of runs now) was to add a "/bin/sleep 30" call to the gdm3 startup:
> mkdir -p /etc/systemd/system/gdm.service.d/
> cat > /etc/systemd/system/gdm.service.d/gdm-wait.conf << EOF
> [Unit]
> Description=Extra 30s wait
> [Service]
> ExecStartPre=/bin/sleep 30
> EOF

Tags: jammy
Pirouette Cacahuète (lissyx) wrote:

Current kernel is: Linux gecko-t-t-linux-2204-wayland-exp-a2-mld-sg4dt5wcf-acfajsmq 6.5.0-1017-gcp #17~22.04.1-Ubuntu SMP Thu Mar 14 20:30:38 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gdm3 (Ubuntu):
status: New → Confirmed
Changed in linux-gcp (Ubuntu):
status: New → Confirmed
tags: added: jammy