Installing or removing apps through snap-store launches another gdm session

Bug #2020641 reported by Ghadi Rahme
22
This bug affects 2 people
Affects Status Importance Assigned to Milestone
gdm3 (Ubuntu)
Fix Released
High
Ghadi Rahme
Jammy
Fix Released
High
Ghadi Rahme
Kinetic
Won't Fix
Low
Unassigned
Lunar
Fix Released
Low
Ghadi Rahme
Mantic
Fix Released
High
Ghadi Rahme

Bug Description

gdm3 version: 42.0-1ubuntu7.22.04.2
Ubuntu version: Ubuntu 22.04.2 LTS

[Description]
Installing or removing snap packages through NICE DCV on AWS will cause the user to be kicked out of their session. The issue happens on machines running nvidia GPUs (passed through to a VM) with the GRID driver installed as well as the normal nvidia driver.

[Steps to reproduce]
Simply install or remove any snap through the snap store will trigger the issue, for example:
$ snap install skype

also running any of the following commands will also trigger the issue:
$ snap connect skype:opengl :opengl
$ snap disconnect skype:opengl :opengl
$ snap connect skype:camera :camera
$ snap disconnect skype:camera :camera

After further investigation I was able to pin down the issue to udev and could reproduce the issue by running the following command:
$ sudo udevadm trigger --parent-match=/sys/devices/pci0000:00/0000:00:1e.0/drm/card0

where "/sys/devices/pci0000:00/0000:00:1e.0/drm/card0" corresponds to the nvidia GPU of my instance.

A more generic way of triggering the issue would be running:
$ sudo udevadm trigger

[Solution]

I have investigated the issue and discovered that it lies within GDM3 in the "udev_is_settled" function (daemon/gdm-local-display-factory.c).
In the case where the udev is settled the line "g_clear_signal_handler (&factory->uevent_handler_id, factory->gudev_client);" at the end of the function is triggered however this is not the case when the function returns early and will lead to the user being logged out. In its current implementation there are three different return points before "g_clear_signal_handler" is executed where the udev devices would already have settled.

I have written a patch that fixes this issue by making sure the function "g_clear_signal_handler" is executed in all cases for which the udev is settled.

[possible regressions]

Excessive clearing of the signal handler might cause a case where it is not being re-initialized the next time it is needed.

[Base test plan]

This test plan is to make sure the issue listed in the description section is fixed.

1. Create a new nvidia accelerated VM instance on AWS with NICE DCV and the GRID driver (or normal driver) installed. Make sure the GPU is being fully passed through to the instance (do not slice the GPU).

2. Access the instance using the NICE DCV client and login.

3. Install any snap package (E.g $ snap install skype).

4. Verify there are no new sessions being launched with the user being gdm.
$ sudo loginctl
Also make sure that you don't get logged out of your own session.

5. Repeat step 4 while running the following command as well:
$ sudo udevadm trigger

6. If the current session stays active and no new sessions spawn while performing step 3 or 5, the test succeeded.

[Extra tests]

This test plan is a sanity check to make sure no regressions have been created.

1. Login to the session

2. Run a few applications (E.g: Nautilus, gedit, terminal, firefox, skype, libreoffice, mpv, vlc...) and use some on the built in features (E.g: the launcher, search...)

3. Switch sessions without terminating the current one.

4. Login as a separate user and repeat step 2

5. Go back to the first session and make sure everything behaves as expected

6. Switch to init 3:
$ init 3
and go back to init 5 afterwards:
$ init 5

7. If no issues were found, the test succeeded.

Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :
Changed in gdm3 (Ubuntu):
importance: Undecided → High
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "gdm3-logout-patch.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thanks for the patch. Please propose it upstream at:
https://gitlab.gnome.org/GNOME/gdm/-/merge_requests

Changed in gdm3 (Ubuntu):
status: New → In Progress
Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :
Changed in gdm3 (Ubuntu Jammy):
importance: Undecided → High
assignee: nobody → Ghadi Rahme (ghadi-rahme)
status: New → In Progress
Changed in gdm3 (Ubuntu Lunar):
importance: Undecided → High
assignee: nobody → Ghadi Rahme (ghadi-rahme)
status: New → In Progress
Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :
Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :
Changed in gdm3 (Ubuntu Mantic):
milestone: ubuntu-22.04.2 → mantic-updates
Changed in gdm3 (Ubuntu Jammy):
milestone: none → ubuntu-22.04.3
Changed in gdm3 (Ubuntu Lunar):
milestone: none → lunar-updates
Changed in gdm3 (Ubuntu Mantic):
status: In Progress → Fix Committed
Changed in gdm3 (Ubuntu Lunar):
status: In Progress → Fix Committed
Revision history for this message
Sebastien Bacher (seb128) wrote :

Thanks for the work there. I've sponsored the patch as it is currently even if the details are still being discussed upstream since there seems to be agreement that the change is right and only how it's structured is discussed at this point. You should probably try to nag the SRU team today for a review if you need to get things moving

Changed in gdm3 (Ubuntu Jammy):
status: In Progress → Fix Committed
description: updated
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Marking Kinetic as Won't Fix (release close to EOL, as mentioned in an email thread).

Changed in gdm3 (Ubuntu Kinetic):
status: New → Won't Fix
Revision history for this message
Andreas Hasenack (ahasenack) wrote (last edit ):

Please include a "normal" test for gdm3 in the test case. Something like testing that logins keep working after applying this update, a switch between "init 3" and "init 5", and that user switching works (while logged in as user A, with open apps, login as user B, launch an app, and come back to user A and you still have your stuff running).

And if the upstream MP changes significantly, please update the patch here as well.

Please take the following into consideration for the next time. It's not blocking this upload, but helps with the SRU processing and allows us to process more SRUs per day:
- use the same patch filename in all affected releases: it speeds up considerably the review of the diff, as git can show they are identical, or mostly identical. But with different filenames, and content (see below), not even git can do it
- try to use the same patch content-wise, where possible of course (most of the time the affected versions will be different). But the DEP3 headers at least can be very similar, if not identical
- complete DEP3 headers: one patch has an ubuntu bug reference but no upstream reference, while the other one has an upstream reference, but no ubuntu bug. And the code change is the same, but with a different diff.
- avoid including unrelated changes in the upload, even if they are no-ops. For example, the lunar upload is changing the Uploaders field in d/control. It makes us wonder if that was on purpose, if it was a mistake, perhaps we ask the uploader if they want to keep it or reupload, etc. Unecessary back-and-forth.

tags: added: verification-needed verification-needed-lunar
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Ghadi, or anyone else affected,

Accepted gdm3 into lunar-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/gdm3/44.0-1ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-lunar to verification-done-lunar. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-lunar. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed-jammy
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Hello Ghadi, or anyone else affected,

Accepted gdm3 into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/gdm3/42.0-1ubuntu7.22.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm3 - 44.1-1ubuntu2

---------------
gdm3 (44.1-1ubuntu2) mantic; urgency=medium

  * debian/patches/ubuntu/gitlab_clearing_signal.patch:
    - clear signal handlers after udev settle to avoid unexpected session
      logouts, thanks Ghadi Rahme (lp: #2020641)

 -- Sebastien Bacher <email address hidden> Thu, 01 Jun 2023 12:37:25 +0200

Changed in gdm3 (Ubuntu Mantic):
status: Fix Committed → Fix Released
Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Hi,

I've been able to verify that the package in -proposed fixes the issue for Jammy:

Before applying the patch we have:

https://pastebin.ubuntu.com/p/KphZJjZRsb/

when attempting a "sudo snap disconnect skype:opengl :opengl" we can see the gdm session being launched:

https://pastebin.ubuntu.com/p/hSk3xRWxnz/

After installing gdm3 from proposed:

https://pastebin.ubuntu.com/p/MGcQmf6gzs/

Attempted the same "snap disconnect skype:opengl :opengl": and I no longer see another gdm session being launched:

https://pastebin.ubuntu.com/p/f4qWHtrzxN/

Also tested with "udevadm trigger" (which is ultimately called by snap connect/disconnect and triggers the issue) and I can't reproduce the problem anymore:

https://pastebin.ubuntu.com/p/4zdmPd5bSH/

Marking verification-done-jammy. I'll verify lunar tomorrow.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (gdm3/42.0-1ubuntu7.22.04.3)

All autopkgtests for the newly accepted gdm3 (42.0-1ubuntu7.22.04.3) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/249.11-0ubuntu3.9 (amd64, arm64, ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#gdm3

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (gdm3/44.0-1ubuntu2)

All autopkgtests for the newly accepted gdm3 (44.0-1ubuntu2) for lunar have finished running.
The following regressions have been reported in tests triggered by the package:

gtg/0.6-2 (ppc64el)
systemd/252.5-2ubuntu3 (amd64, arm64, ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/lunar/update_excuses.html#gdm3

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

> Marking verification-done-jammy. I'll verify lunar tomorrow.

Please note my request in comment #9 about a few extra test cases.

tags: added: verification-needed-jammy
removed: verification-done-jammy
Revision history for this message
Sebastien Bacher (seb128) wrote :

Sorry about the difference-between-series that's my fault, as said on IRC I started doing the cherrypicking after being pinged via email and midway noticed there was patch and sponsoring request here, I didn't want to redo the uploads for the series already pushed but also didn't want to ignore the contributor so I sponsored for lunar the version of the bug...

Revision history for this message
Sebastien Bacher (seb128) wrote :

oh and the debian/control change is a feature of gnome-pkg-tools which regenerate debian/control from debian/control.in and common in desktop packages

Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Hi Andreas,

I'm sorry, I missed your request for the additional tests. Just did them for Jammy. Adding here:

- Logged in as ubuntu user and launched Firefox, LibreOffice and gedit:

https://pastebin.ubuntu.com/p/H3WVmVkgJh/

- Switched user to fabiomirmar and opened Thunderbird, skype and gnome-calculator:

https://pastebin.ubuntu.com/p/By3q2kf8Nq/

- Switched back to the ubuntu user and I see its apps are still running, and so are the processes for the fabiomirmar user:

https://pastebin.ubuntu.com/p/DxMtJ2MVnN/

Also tested the "init 3" / "init 5" tests you mentioned. After the tests mentioned above, I switched to multi-user.target:

https://pastebin.ubuntu.com/p/DXPjRYsfQC/

Then went back to graphical.target:

https://pastebin.ubuntu.com/p/3ghpd2VtVZ/

Marking verification-done-jammy. Let me know if I missed anything.

tags: added: verification-done-jammy
removed: verification-needed-jammy
description: updated
Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

I'm not sure how can we verify lunar. There's no DCV available for non-LTS releases, and there's no known way of reproducing this issue without DCV.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The steps to reproduce at the top of this page appear to be independent of DCV(?). And even if they weren't, then simply verifying the patch is present would suffice.

We don't want to skip lunar since it's theoretically a generic bug that could occur without DCV.

Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Daniel,

Thanks for chiming in.

I've tried reproducing this bug outside of DCV, but no lucky.

What I tried last is to launch a baremetal instance on AWS (g4dn.metal) with the same Nvidia GPUs as the instance type where we are consistently able to reproduce this problem, and then I:

1. Installed KVM
2. Created a Jammy VM (just to see if I would be able to reproduce it, and then I would do the same for Lunar)
3. Added an Nvidia GPU to the VM with PCI passthrough
4. Installed the nvidia driver
5. Setup desktop with Xorg (and gdm3 with Xorg) the same way we've been doing with the AWS VM
6. Tried to reproduce the problem by installing skype snap or running 'sudo udevadm trigger'

However, I'm unable to reproduce this problem, even in a Jammy VM (when not using DCV).

You mentioned "then simply verifying the patch is present would suffice", how would I do that? Just showing tha gdm3 from -proposed is enough?

If so, I did setup a Lunar VM the same way (also with the same Nvidia GPU) and did the basic tests:

One thing to notice is that "apt install gdm3" wouldn't upgrade the package to the one in -proposed:

https://pastebin.ubuntu.com/p/FDdFfcPHDd/

So, I had to specify the version:

https://pastebin.ubuntu.com/p/Kq7Zj9wxH2/

Not sure if this is a problem.

Anyway, after that I rebooted the VM and did the basic tests:

Packages installed from -proposed:

https://pastebin.ubuntu.com/p/ZTxYTSQD3v/

- Logged in as ubuntu user and launched Firefox, LibreOffice and gedit:

https://pastebin.ubuntu.com/p/xBJXTG6NgP/

- Switched user to fabiomirmar and opened Thunderbird, skype and gnome-calculator:

https://pastebin.ubuntu.com/p/RSCxqCGcrH/

- Switched back to the ubuntu user and I see its apps are still running:

https://pastebin.ubuntu.com/p/j4kBKF3bXp/

- Once again, switched back to fabiomirmar, and the processes are also running:

https://pastebin.ubuntu.com/p/q9TJFJhQgp/

Also tested the "init 3" / "init 5" tests you mentioned. After the tests mentioned above, I switched to multi-user.target:

https://pastebin.ubuntu.com/p/VkrkdwXcbt/

Then went back to graphical.target:

https://pastebin.ubuntu.com/p/7X8VyydPGz/

Another feedback that I have to share, is that although it's not consistent, on Lunar sometimes the session (gnome?) crashes when you are logging back in after switching user. i.e.: you login as ubuntu, open one or more apps (such as firefox), hit "switch user" and try to login again. Most of the times, it will work and firefox will be running, but sometimes the session crashes and you are brought back to gdm and have to login again. After logging back in, Firefox is no longer there. The logs during the crash are available here:

https://pastebin.ubuntu.com/p/bfRQnrhFNG/

However, I don't think this is related to the GDM patch from Ghadi, as I'm able to reproduce the very same problem with gdm from lunar/main from a fresh installation:

https://pastebin.ubuntu.com/p/bSH7mwVTtB/

I think this might need its own launchpad bug, but leaving my comment with the details here, so that others can share their feedback. Also, not marking this as verification-done for lunar, but I think it's done :)

Regards,
Fabio Martins

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I have confirmed the existence of ubuntu/flush-signal-handler-on-udev-settle.patch in https://launchpad.net/ubuntu/+source/gdm3/44.0-1ubuntu2 which is the best we can do for lunar.

tags: added: verification-done-lunar
removed: verification-needed-lunar
Revision history for this message
Daniel van Vugt (vanvugt) wrote (last edit ):

On second thoughts, patching lunar is unlikely to be useful. So long as the fix is in mantic for future releases, that's enough.

tags: added: verification-needed-lunar
removed: verification-done-lunar
Changed in gdm3 (Ubuntu Lunar):
status: Fix Committed → Won't Fix
importance: High → Low
Changed in gdm3 (Ubuntu Kinetic):
importance: Undecided → Low
Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Daniel,

For when you have a chance, we've discussed this bug a bit in #ubuntu-release in Libera.Chat. The SRU team would like to hear your opinion on SRU'ing this fix for lunar, and your comments above on mentioning that it's unlikely to be useful vs. what you mentioned previously of this being a general bug.

I would appreciate it if you can chime in and add your comments, so we can unblock this SRU.

Regards,
Fabio Martins

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I was just being overcautious and a bit nervous approving a change to a stable release that didn't seem to be necessary. Sorry that I suddenly started to contradict my previous comments. If more of the team want to release it to lunar for consistency then sure...

Changed in gdm3 (Ubuntu Lunar):
status: Won't Fix → Fix Committed
tags: added: verification-done-lunar
removed: verification-needed-lunar
tags: added: verification-done
removed: verification-needed
Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :

Hello Everyone,

It seems that the regression that appeared in the autopkgtest for Jammy has been resolved and now the test succeeds on all architectures.

However the test for Lunar fails due to an issue related to the autopkgtest script. The test in question that is failing is the "testcase_nvme_basic" test and the issue seems to be related to the test itself since the logs report a segfault occurring in the "test-functions" script. QEMU is also reporting multiple warnings about the use of a deprecated variable name being used by the test. The logs of the error can be found here: https://pastebin.canonical.com/p/cTc2RQN84m/

Also this error is occurring very early on in the boot phase (before the kernel is even up and running) so I cannot see the GDM patch being the reason or the cause of the issue.

I would love for someone to chime in and confirm if my analysis is correct or not.

Regards,
Ghadi

Revision history for this message
Andreas Hasenack (ahasenack) wrote (last edit ):

Thanks for the lunar tests in comment #22, and also for confirming that the problem you saw happened without this update. I think a separate bug for that is appropriate, if you could please file one.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm3 - 44.0-1ubuntu2

---------------
gdm3 (44.0-1ubuntu2) lunar; urgency=medium

  * Add d/p/ubuntu/flush-signal-handler-on-udev-settle.patch:
    Clear signal handler after udev settle to avoid sudden
    logout (LP: #2020641)

 -- Ghadi Elie Rahme <email address hidden> Thu, 01 Jun 2023 12:39:51 +0200

Changed in gdm3 (Ubuntu Lunar):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for gdm3 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm3 - 42.0-1ubuntu7.22.04.3

---------------
gdm3 (42.0-1ubuntu7.22.04.3) jammy; urgency=medium

  * debian/patches/ubuntu/gitlab_clearing_signal.patch:
    - clear signal handlers after udev settle to avoid unexpected session
      logouts, thanks Ghadi Rahme (lp: #2020641)

 -- Sebastien Bacher <email address hidden> Thu, 01 Jun 2023 12:37:58 +0200

Changed in gdm3 (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :

Hey everyone,

Today the changes were merged upstream after I did some slight modifications to the patch last week. The change is mainly esthetic and is not related to any possible regressions mentioned by upstream.

https://gitlab.gnome.org/GNOME/gdm/-/merge_requests/210

Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

For completeness: re: the Lunar issue that I mentioned in comment #22 and that Andreas suggested me to file a separate bug (on comment #28):

I tested a freshly installed Lunar VM, and also did 'apt upgrade' everything in my previous Lunar VM (where I was originally hitting the problem I mentioned) and I'm no longer seeing an issue on switch user, so I'm going to hold on filing another bug. I believe that problem is already resolved with newer Xorg version.

I'll file a different bug in case I hit the problem again in the future.

Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :

Below are the updated patches for each release. The changes are now the same as upstream and I renamed the patch for jammy and mantic to be more indicative of what the patches do (as well as to be the same as lunar).
If anything needs changing please let me know!

Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :
Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :
Revision history for this message
Sebastien Bacher (seb128) wrote :

@Ghadi, the fix was SRUed right? Is there any need to iterate over that fix again? I though we would just get the upstream version by updating gdm in mantic and let other releases as they are...

Revision history for this message
Ghadi Rahme (ghadi-rahme) wrote :

Hi Sebastien,

Yes the fix was SRUed but I was under the impression that if the patch changes upstream it would need to be updated here as well, but I guess it isn't the case after all. Sorry for the confusion.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.