'dpkg -r snapd' WIPED /home

Bug #1989019 reported by marsteegh
This bug affects 1 person
Affects          Status   Importance   Assigned to   Milestone
snapd (Ubuntu)   New      Undecided    Unassigned

Bug Description

- My machine has /home on a separate partition.
- I updated to Ubuntu 22.04.
- I don't know exactly how it came to be, but somehow / was mounted to /var/lib/snapd/hostfs and /home was mounted to /var/lib/snapd/hostfs/home. I *think* /var/lib/snapd/hostfs is mounted ro, but /var/lib/snapd/hostfs/home is mounted rw.
- After the update I had trouble using snaps, so I tried uninstalling snapd (the only dependency seemed to be firefox) so I could reinstall it with a clean slate.
- `dpkg -r snapd` printed a lot of warnings about not being able to delete stuff in /var/lib/snapd/hostfs/ because the filesystem was mounted ro. I didn't really think anything of it.
- dpkg did take suspiciously long, so I aborted it. By then it had WIPED ~350GB of data in /var/lib/snapd/hostfs/home, which was an alias for /home.

I lost a *lot* of data. Luckily I have some backups but I think this should be looked at!

Zygmunt Krynicki (zyga) wrote :

Something has gone very wrong. I'm very sorry for the damage caused by the removal script. Do you have any logs you can share? Is there any configuration of your system that you think may help debug the issue?

Normally hostfs is never visible on the system.

As a safety mechanism we should exclude hostfs from removal, to avoid the possibility of catastrophic failure.
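
One way such a safety check could look, as a minimal sketch and not snapd's actual removal script: before deleting a directory tree, read the mount table and refuse to continue if anything is mounted at or below it. The names `mounts_under` and `safe_to_remove` are hypothetical. The sketch parses the text of /proc/self/mountinfo instead of comparing device numbers, because a bind mount of the same filesystem reports the same device number as its parent directory and would slip past a device-number check.

```python
import re


def _unescape(field):
    # mountinfo writes space, tab, newline and backslash as octal escapes
    # such as \040, so decode them before comparing paths.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mounts_under(mountinfo_text, directory):
    """Return the mount points located at or below `directory`."""
    base = directory.rstrip("/")
    prefix = base + "/"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        mount_point = _unescape(fields[4])  # fifth field is the mount point
        if mount_point == base or mount_point.startswith(prefix):
            found.append(mount_point)
    return found


def safe_to_remove(mountinfo_text, directory):
    """True only if nothing is mounted at or below `directory`."""
    return not mounts_under(mountinfo_text, directory)
```

On a live system the text would come from reading /proc/self/mountinfo, and a removal script would stop when `safe_to_remove(text, "/var/lib/snapd/hostfs")` is false.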

Zygmunt Krynicki (zyga) wrote :

Can you share how you mounted /home? Did you use fstab or systemd mount unit or something else? Did you use a non-standard kernel? Do you have logs in journald that you could attach?

marsteegh (marsteegh) wrote :

/home is mounted from /etc/fstab, by UUID, with an order number higher than /. Bog standard stuff. root is on /dev/sdd1 (my ssd) and /home is on /dev/sda1 (an old-fashioned rotating hdd).

I'll see if I can find logs, I'm a bit too tired to dig for them right now. I don't actually think the removal script really is to blame. Though I think it should never try to remove /var/lib/snapd/hostfs !

The real problem is (I think) that though /var/lib/snapd/hostfs is mounted read-only any submounts of / are not read-only. This also disables any sandboxing snap is supposed to provide imho.

I found out that /lib/systemd/system-generators/systemd-fstab-generator is the program responsible for generating the unitfile that causes /var/lib/snapd/hostfs to be mounted. But I can't find how /home gets mounted below that.

I must say I really really long for the time when mounting was just done via fstab and nothing else. All those interconnected systemd and snap parts make it mighty confusing and brittle.

marsteegh (marsteegh) wrote :

thanks for your quick reply, btw.

Zygmunt Krynicki (zyga) wrote :

Normally /var/lib/snapd/hostfs is not mounted in the initial mount namespace. It is only visible from the mount namespace of individual snap processes. Can you run /lib/systemd/system-generators/systemd-fstab-generator and provide the output it generates? Can you also share your /etc/fstab in case there is something special there?

To be clear: if you see /var/lib/snapd/hostfs mounted, then something is extremely wrong.

marsteegh (marsteegh) wrote :

first my fstab.

I'll try to gather the output of /lib/systemd/system-generators/systemd-fstab-generator
it's a bit of a hassle that it just dumps all sorts of files (with names that can easily collide with stuff already there) in /tmp. I'll try to sort it out.

marsteegh (marsteegh) wrote :

output of the generator:

Zygmunt Krynicki (zyga) wrote :

Your fstab contains the line

/ /var/lib/snapd/hostfs none defaults,ro,bind 0 0

This should never have been present there. I strongly recommend that you immediately comment-out this line.

Did you edit your fstab by hand? Do you have any idea how this could have happened?
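
To look for entries like this one, a small script can scan fstab text for bind mounts whose target lies under a given directory. This is only a sketch: `find_bind_entries` is a hypothetical name, and the parser understands just the usual whitespace-separated fields and `#` comments.

```python
def find_bind_entries(fstab_text, under="/var/lib/snapd"):
    """Return (source, target, options) for bind mounts targeting `under`."""
    prefix = under.rstrip("/") + "/"
    hits = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        source, target, _fstype, options = fields[:4]
        opts = options.split(",")
        is_bind = "bind" in opts or "rbind" in opts
        # Appending "/" lets the directory itself match as well as its children.
        if is_bind and (target + "/").startswith(prefix):
            hits.append((source, target, opts))
    return hits


# The first line is the entry quoted above; the second is a made-up
# ordinary entry that should not be reported.
example = """\
/ /var/lib/snapd/hostfs none defaults,ro,bind 0 0
UUID=hypothetical-uuid /home ext4 defaults 0 2
"""
for source, target, opts in find_bind_entries(example):
    print(source, "->", target, opts)
```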

marsteegh (marsteegh) wrote :

I'll try to get myself to dig through the logs.

some more background:

Of course this whole saga starts with a botched upgrade

I tried upgrading a focal system to jammy. The upgrade failed with a 'some files failed to download, aborting upgrade' (paraphrasing). But it failed *after* it already had downloaded and *unpacked/overwritten* a large part of the system.

So this left my system in a limbo state where it would not fully boot. It would get as far as mounting root, and then it would stop on a timeout mounting my home folder.

I dug into that and found out that the systemd unit for the block device failed, because it would try to open a socket connection to init and then after some back & forth never get a reply. (god, this whole system feels so overengineered. I'm sure it's useful for something, but why not just check if the block device is there and mount it already).

I decided it was probably a case of a half-installed new system, with different parts of the systemd machinery being from different versions and misunderstanding each other. I did the default trick for botched upgrades: start from an Ubuntu 22.04 USB stick, mount the hdds, bind-mount /proc etc, chroot into the hdd and run apt dist-upgrade.

This worked and got my system back up to a running state, but it of course did not do the rest of the jammy update executable, so maybe there's the origin of hostfs being visible?

Everything seemed to run well, *except* that any executable from a snap refused to run because 'snap-confine' was running unrestricted (AppArmor error msg. paraphrasing, I could get the exact message but it doesn't really matter for this story).

'apt reinstall snapd' didn't work. Neither did reinstalling apparmor or apparmor profiles.
I then decided to dpkg -r snapd (with the plan of then manually wiping any remains) and reinstall it. This trick has often saved me in the past and it seemed feasible, as the only thing depending on snapd (according to dpkg) was firefox. With the disastrous results following.

Zygmunt Krynicki (zyga) wrote :

Do you recall if the hostfs entry in /etc/fstab was present on your system from the start? Do you know when it could have been added there?

marsteegh (marsteegh) wrote :

I don't recall for sure. But before the update I had never consciously seen it.

It's of course always possible I have added it once in a grey past to work around a badly 'snapped' package and forgot about it.

What is strange to me is that /home gets added automatically (and rw at that!). Normally when you do a bind mount, any submounts don't come with it.

marsteegh (marsteegh) wrote :

I think I'd consider that to be the actual bug.

Zygmunt Krynicki (zyga) wrote :

I think what you are experiencing is:

/ is mounted
/ is altered with shared mount event propagation (systemd default)
/ is bind-mounted to /var/lib/snapd/hostfs
/home is mounted

Because systemd uses shared mount propagation by default, the /home mount propagates to /var/lib/snapd/hostfs

Making a read-only bind mount does not affect, in any way, bind mounts that are made there. Propagation acts just like a bind mount.

I did a quick test: I've added this to my fstab:

none /home/tmp tmpfs defaults 0 0
/ /potato none defaults,ro,bind 0 0

Looking at mountinfo I see:

zyga@lambert:~$ cat /proc/self/mountinfo | grep home
95 30 0:37 / /home/tmp rw,relatime shared:46 - tmpfs none rw,inode64
96 92 0:37 / /potato/home/tmp rw,relatime shared:46 - tmpfs none rw,inode64

You can see that the /potato (alias for hostfs) bind-mount has access to a writable copy of the tmpfs at /home/tmp.
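
The same relationship can be read out of mountinfo programmatically. In the sketch below, `shared_peer_groups` is a hypothetical helper that collects the `shared:N` tags from the optional fields; mounts that carry the same tag belong to one peer group and receive each other's mount and unmount events.

```python
from collections import defaultdict


def shared_peer_groups(mountinfo_text):
    """Map peer group ID -> mount points tagged shared:<ID>."""
    groups = defaultdict(list)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        sep = fields.index("-")  # separator that ends the optional fields
        if sep < 6:
            continue  # malformed line
        mount_point = fields[4]
        # Optional fields sit between the mount options and the separator.
        for tag in fields[6:sep]:
            if tag.startswith("shared:"):
                groups[int(tag.split(":", 1)[1])].append(mount_point)
    return dict(groups)


# The two lines from the mountinfo output quoted above.
sample = """\
95 30 0:37 / /home/tmp rw,relatime shared:46 - tmpfs none rw,inode64
96 92 0:37 / /potato/home/tmp rw,relatime shared:46 - tmpfs none rw,inode64
"""
print(shared_peer_groups(sample))
# prints {46: ['/home/tmp', '/potato/home/tmp']}
```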

marsteegh (marsteegh) wrote :

> Because systemd uses shared mount propagation by default
That's definitely the cause. I consider that an extremely dangerous default. If you mount a filesystem ro you expect it to be ro. It's even one of the main use cases of bind mounts, to provide read-only access.

Whoever decided shared mount propagation was a good idea should really rethink that imho.

All in all I rather like the systemd way with unit files. It's definitely more standardized than the bunch of haphazard shell scripts we used to wrangle. But imho there's a bit too much 'automagic' stuff happening everywhere in the system.

marsteegh (marsteegh) wrote :

How that mountpoint ended up in my fstab I have no idea, but I guess it's probably not a result of the update.

Zygmunt Krynicki (zyga) wrote :

As for the shared propagation default, that's a systemd choice and snapd is not altering that. As for your comment on mounting, I think you misunderstand how this works. Sadly, the read-only flag is a bit more confusing, as there's one on the file system and then a separate one on each mount point where that file system shows up (bind mount).

In both cases /home is not read only. In fact, it was not bind mounted by systemd but by the kernel due to the propagation. I agree this is non-obvious and difficult to reason about somehow but those are both external to snapd.
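
The two flags can be seen side by side in a mountinfo line: the sixth field holds the per-mount options and the last field, after the `-` separator, holds the options of the file system itself. A minimal sketch follows; `readonly_flags` is a hypothetical helper, and the second sample line is made up to illustrate a read-only bind mount of a writable file system.

```python
def readonly_flags(mountinfo_line):
    """Report the per-mount and per-filesystem read-only flags of one line."""
    fields = mountinfo_line.split()
    sep = fields.index("-")  # separator between optional fields and fs info
    mount_opts = fields[5].split(",")
    # After the separator: filesystem type, mount source, super block options.
    super_opts = fields[sep + 3].split(",")
    return {
        "mount_point": fields[4],
        "mount_ro": "ro" in mount_opts,
        "superblock_ro": "ro" in super_opts,
    }


# Line taken from the mountinfo output quoted earlier in this thread.
propagated = "96 92 0:37 / /potato/home/tmp rw,relatime shared:46 - tmpfs none rw,inode64"
# Hypothetical line for a read-only bind mount of a writable ext4 root.
ro_bind = "92 30 8:49 / /potato ro,relatime shared:1 - ext4 /dev/sdd1 rw"

for line in (propagated, ro_bind):
    print(readonly_flags(line))
```

For the propagated mount both flags are false, which is why writes (and deletions) go through even though the bind mount above it is read-only.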

I would try to focus on:

1) Determining how your fstab was modified
2) Adding precautions to snapd to detect something of this sort to avoid data loss.

Zygmunt Krynicki (zyga) wrote :

In the absence of further feedback I will focus on 2). Please let us know if you remember whether you modified fstab yourself and (if possible) what made you perform that modification.

marsteegh (marsteegh) wrote :

I'm sorry for the late reaction. Something came up and I was very busy.

I did try to find out how fstab was modified. I did a find + grep through all shell scripts I could find on my machine to see if I found one responsible for that entry, but did not find anything. So maybe I created it myself in a gray past. I can't really recover that info.

Adding a check for mounted folders on /var/lib/snapd can't hurt. Or maybe do as lots of other packages do and only delete what you created below /usr/lib/snap and then only remove /usr/lib/snap if it's empty.
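
Roughly what I mean, as a sketch (the function name `remove_owned` and the idea of a list of owned files are just for illustration, this is not how the snapd package works today): delete only the files the package knows it created, and remove the directory itself only if nothing else is left in it.

```python
import os


def remove_owned(base_dir, owned_relpaths):
    """Delete only the listed files, then remove base_dir if it is empty.

    Returns True if base_dir was removed, False if it was left in place.
    """
    for rel in owned_relpaths:
        path = os.path.join(base_dir, rel)
        try:
            os.remove(path)  # files only; directories are never deleted here
        except FileNotFoundError:
            pass  # already gone, nothing to do
    try:
        os.rmdir(base_dir)  # fails if anything unexpected is still inside
        return True
    except OSError:
        return False
```

Because `os.rmdir` refuses to remove a non-empty directory, anything that was not on the list (such as a mounted /home showing up underneath) stays untouched.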

I think the main 'bug' is the default (and imho unexpected) mount propagation which makes the kernel ignore a 'ro' flag. But as you said that's a systemd problem.

If I can find the energy I'll try to report that with systemd.

Zygmunt Krynicki (zyga) wrote :

As to how /home was mounted rw: I think it's not a bug in either systemd or the kernel. Again, it's how it is documented to work: https://www.kernel.org/doc/html/latest/filesystems/sharedsubtree.html

To get what you intended to work, you'd have to either mount the block device that is responsible for /home read-only, at the file-system level _or_ adjust the propagated /home bind-mount that showed up under /var/lib/snapd/hostfs to be a read-only bind mount. Some of those operations are not atomic.
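
The second option could be sketched as follows. The helper names are hypothetical, and the code only builds the `mount` command lines without running them, since running them needs root and changes the live system. The `remount,bind,ro` option combination is the one described in mount(8) for changing the read-only flag of a single mount point without touching the file system behind it. The non-atomicity mentioned above still applies: there is a window between the mount appearing through propagation and the remount taking effect.

```python
import shlex


def readonly_remount_cmd(mount_point):
    """Build the argv that flips one existing mount point to read-only."""
    return ["mount", "-o", "remount,bind,ro", mount_point]


def commands_for(mount_points):
    """Render one shell-quoted command line per mount point."""
    return [shlex.join(readonly_remount_cmd(mp)) for mp in mount_points]


print("\n".join(commands_for(["/var/lib/snapd/hostfs/home"])))
# prints: mount -o remount,bind,ro /var/lib/snapd/hostfs/home
```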

TL;DR: it's complicated and side effects can bite

I'll look at the snapd purge scripts to see if we can add a safety check when removing that specific directory.
