broken ~<user>/snap/<snap>/<version> permissions: owned root:root not <user>:<group>

Bug #1992324 reported by James Dingwall
Affects: snapd (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

This is an issue I have encountered on two different systems (both 22.04), one with the 'teams' snap and a second with the 'slack' snap.

$ pwd
/home/user/snap/slack
$ ls -la
total 27
drwxr-xr-x 5 user @user 6 Oct 9 21:34 .
drwx------ 11 user @user 11 Oct 9 21:32 ..
drwxr-xr-x 4 user @user 6 Sep 23 07:45 65
drwx------ 4 root root 5 Oct 9 20:43 66
drwxr-xr-x 3 user @user 3 Mar 16 2022 common
lrwxrwxrwx 1 user @user 2 Oct 9 21:26 current -> 66

In this case the snap fails to start with permission errors. My guess is that the migration of user data failed part way through, presumably in a code path which doesn't handle interruption and isn't capable of being resumed. There is approximately 1.2T of free space (df -h) for /home/user, so I don't think this is a free-space problem.

$ sudo du -sm 65 66
1115 65
406 66

My workaround is to point the 'current' link at the old version and then start the snap:

$ rm -f current; sudo rm -rf 66; ln -s 65 current; snap
Importing existing Slack profile from /home/user/.config/Slack to /home/user/snap/slack/66/.config
Import done in 831.232 s

$ apt-cache policy snapd
snapd:
  Installed: 2.56.2+22.04ubuntu1
  Candidate: 2.56.2+22.04ubuntu1
  Version table:
 *** 2.56.2+22.04ubuntu1 500
        500 http://gb.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.55.3+22.04 500
        500 http://gb.archive.ubuntu.com/ubuntu jammy/main amd64 Packages

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy

Revision history for this message
Alberto Mardegan (mardy) wrote:

Hi James, do you actually have a /home/user/.config/Slack folder on your system?

The workaround you used seems to have copied your old data from there, instead of actually using the data from revision 65 of slack.

What I would try to do:

cd /home/user/snap/slack
rm -f current
ls -l 66 # I guess this does not exist
sudo rm -rf 66 # Just in case
sudo snap refresh --revision=65 slack
sudo snap refresh --stable slack

And see if the problem reproduces again. But before doing that, it would be nice if you could attach the output of

    sudo journalctl -u snapd

so that we can see if there are still traces of the error that snapd ran into when upgrading.

Revision history for this message
James Dingwall (a-james-launchpad) wrote:

This is the snapd.service journal. During this boot I did the 20.04 to 22.04 upgrade, so that could be a contributing factor. I suspect the 'cp' process mentioned is `cp 65 66` (https://github.com/snapcore/snapd/blob/3a9aaabe38e2ac4e45cfa9ca1c0c00522ca1d04f/osutil/cp.go#L186 ?). I'll try rolling back the snap revision while avoiding service restarts/reboots, and that will probably work.
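For context, that code path boils down to spawning cp(1) from Go. A minimal sketch of the idea (simplified names, not the actual snapd code):

```go
package main

import (
	"fmt"
	"os/exec"
)

// copyPreserveAll is a simplified sketch of what snapd's osutil
// "cp preserve all" helper does: spawn cp(1) with -a to copy a
// revision directory recursively, preserving ownership, modes and
// timestamps. If the parent is killed mid-copy, the cp child can
// keep running on its own (see the journal below).
func copyPreserveAll(src, dst string) error {
	out, err := exec.Command("cp", "-a", src, dst).CombinedOutput()
	if err != nil {
		return fmt.Errorf("cannot copy %q to %q: %v (%s)", src, dst, err, out)
	}
	return nil
}

func main() {
	if err := copyPreserveAll("/home/user/snap/slack/65", "/home/user/snap/slack/66"); err != nil {
		fmt.Println(err)
	}
}
```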

```
-- Boot a676a11120cc42a4aaf52d69df4f4286 --
Oct 09 20:14:34 hostname systemd[1]: Starting Snap Daemon...
Oct 09 20:14:34 hostname snapd[9546]: AppArmor status: apparmor is enabled and all features are available
Oct 09 20:14:35 hostname snapd[9546]: AppArmor status: apparmor is enabled and all features are available
Oct 09 20:14:35 hostname snapd[9546]: overlord.go:263: Acquiring state lock file
Oct 09 20:14:35 hostname snapd[9546]: overlord.go:268: Acquired state lock file
Oct 09 20:14:35 hostname snapd[9546]: daemon.go:247: started snapd/2.57.2 (series 16; classic) ubuntu/20.04 (amd64) linux/5.15.0-48-generic.
Oct 09 20:14:35 hostname snapd[9546]: daemon.go:340: adjusting startup timeout by 2m5s (pessimistic estimate of 30s plus 5s per snap)
Oct 09 20:14:35 hostname snapd[9546]: backend.go:135: snapd enabled NFS support, additional implicit network permissions granted
Oct 09 20:14:35 hostname systemd[1]: Started Snap Daemon.
Oct 09 20:24:35 hostname snapd[9546]: devicemgr.go:2000: no NTP sync after 10m0s, trying auto-refresh anyway
Oct 09 20:24:35 hostname snapd[9546]: storehelpers.go:748: cannot refresh: snap has no updates available: "bare", "core", "core18", "core20", "core22", "cups", "discord", "freecad", "gnome-3-28-1804", "gnome-3-34-1804", "gnome-3-38-2004", "gnome-42-2204", "gtk-common-themes", "kde-frameworks-5-91-qt-5-15-3-core20", "micropolis", "skype", "teams", "zoom-client"
Oct 09 20:29:25 hostname snapd[9546]: api_snaps.go:319: Installing snap "firefox" revision unset
Oct 09 20:38:33 hostname snapd[9546]: main.go:155: Exiting on terminated signal.
Oct 09 20:38:33 hostname systemd[1]: Stopping Snap Daemon...
Oct 09 20:40:03 hostname systemd[1]: snapd.service: State 'stop-sigterm' timed out. Killing.
Oct 09 20:40:03 hostname systemd[1]: snapd.service: Killing process 9546 (snapd) with signal SIGKILL.
Oct 09 20:40:03 hostname systemd[1]: snapd.service: Main process exited, code=killed, status=9/KILL
Oct 09 20:40:03 hostname systemd[1]: snapd.service: Failed with result 'timeout'.
Oct 09 20:40:03 hostname systemd[1]: snapd.service: Unit process 141980 (cp) remains running after unit stopped.
Oct 09 20:40:03 hostname systemd[1]: Stopped Snap Daemon.
Oct 09 20:40:03 hostname systemd[1]: snapd.service: Triggering OnFailure= dependencies.
Oct 09 20:40:03 hostname systemd[1]: snapd.service: Found left-over process 141980 (cp) in control group while starting unit. Ignoring.
Oct 09 20:40:03 hostname systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Oct 09 20:40:03 hostname systemd[1]: Starting Snap Daemon...
Oct 09 20:40:03 hostname snapd[600053]: AppArmor status: apparmor is enabled and all features are available
Oct 09 20:40:03 hostname snapd[600053]: AppArmor status: apparmor is enabled a...
```

Revision history for this message
Alberto Mardegan (mardy) wrote:

Thanks James, I think that your analysis is correct. I'll try to reproduce the issue locally and then think of how we can prevent it from happening.

Changed in snapd (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Alberto Mardegan (mardy) wrote:

I'm investigating the issue and I managed to get a better understanding of what happens: the copy operation for the snap data takes a long time, and when systemd wants to restart snapd, the "cp" process is not killed because snapd has "KillMode=process" in its systemd unit file.
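For reference, the relevant stanza of the unit looks roughly like this (an illustrative excerpt, not the full file; check the snapd.service actually shipped on your system):

```
[Service]
# Only the main snapd process is killed on stop; other processes in
# the unit's control group, such as the spawned cp, are left running.
KillMode=process
```

With systemd's default KillMode=control-group, every process remaining in the unit's cgroup would be killed on stop instead.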

Then snapd restarts and spawns a "cp" command again while the previous one is still running, so for some time we have two "cp" processes performing the same recursive copy. I'm not sure how cp is implemented, but if it doesn't use the *at() family of functions (openat, fchownat), then I can imagine that we could get some data corruption.

I think we should not let the "cp" processes outlive snapd, in order to make sure that on the next iteration the copy is restarted without interference.
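One possible way to enforce that on Linux (a sketch of the idea, not necessarily the fix snapd adopted) is to have the kernel deliver SIGKILL to the cp child when its parent dies:

```go
package main

import (
	"os/exec"
	"syscall"
)

// runCpTiedToParent spawns cp(1) so that it cannot outlive the
// calling process: Pdeathsig (PR_SET_PDEATHSIG, Linux only) makes
// the kernel send the child SIGKILL when its parent thread exits.
func runCpTiedToParent(src, dst string) error {
	cmd := exec.Command("cp", "-a", src, dst)
	cmd.SysProcAttr = &syscall.SysProcAttr{Pdeathsig: syscall.SIGKILL}
	return cmd.Run()
}

func main() {
	// Hypothetical paths, for illustration only.
	_ = runCpTiedToParent("/tmp/src", "/tmp/dst")
}
```

An alternative would be changing KillMode so that systemd cleans up the whole control group, cp included, when the unit stops.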

Revision history for this message
James Dingwall (a-james-launchpad) wrote:

I think there is something about the order of operations when user data is copied that looks vulnerable to exploitation via a symlink:

copySnapDataDirectory() starts with a call to trash(), which is designed to move any existing destination path aside. There is a window after it returns for the destination to be replaced with a symlink. After this, osutil.CopyFile() is called, which calls runCpPreserveAll() and results in the cp command being started as the root user, e.g. `cp -av /home/user/snap/teams/7 /home/user/snap/teams/8`, but 8 is now a symlink pointing somewhere else.

Since the user can control the content of the source path, it may be possible to somehow write to an arbitrary location on the filesystem. (Admittedly, so far I haven't been able to construct a destination that demonstrates this.)
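To illustrate, here is a sketch (hypothetical names, not snapd's actual code) of a defensive check that would narrow the window: verify with Lstat, which does not follow symlinks, that nothing was planted at the destination before running cp as root:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// copyRefusingSymlink sketches one mitigation for the race described
// above: after the old destination has been moved aside, refuse to
// copy if anything (such as a planted symlink) now exists at dst.
// This narrows the race window but does not eliminate it, since dst
// could still be swapped between the Lstat and the cp.
func copyRefusingSymlink(src, dst string) error {
	if fi, err := os.Lstat(dst); err == nil {
		return fmt.Errorf("refusing to copy: %q already exists (mode %v)", dst, fi.Mode())
	} else if !os.IsNotExist(err) {
		return err
	}
	return exec.Command("cp", "-a", src, dst).Run()
}

func main() {
	// Hypothetical paths, for illustration only.
	if err := copyRefusingSymlink("/tmp/src", "/tmp/dst"); err != nil {
		fmt.Println(err)
	}
}
```

Even then a small window remains between the check and the copy, which is part of why performing the copy as the unprivileged user, as asked below, looks attractive.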

Would it be safer to copy the user data as the user rather than root?
