package update causes all other snap package mounts to fail

Bug #1962258 reported by Brandon Locke
This bug affects 1 person
Affects  Status   Importance  Assigned to  Milestone
snapd    Invalid  Undecided   Unassigned

Bug Description

I am experiencing an issue where, whenever a snap package is updated, all other snap packages lose their mounts and every snap that was not updated becomes broken. The issue is reproducible on multiple servers running CloudLinux. CloudLinux claims it is a snapd issue rather than a CloudLinux issue, although I cannot reproduce it on servers that do not run CloudLinux.

Symptoms:

Before upgrade all mounts show with df -h:

[root@cl ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 9.5M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/vda3 48G 12G 34G 25% /
/dev/loop0 111M 111M 0 100% /var/lib/snapd/snap/core/12725
/dev/loop2 62M 62M 0 100% /var/lib/snapd/snap/core20/1328
/dev/loop3 62M 62M 0 100% /var/lib/snapd/snap/core20/1361
/dev/loop1 44M 44M 0 100% /var/lib/snapd/snap/certbot/1788
tmpfs 384M 0 384M 0% /run/user/0

To manually trigger the issue, I revert the core20 package to a previous revision using "snap revert core20 --revision 1328". I then remove the newer core20 snap with "snap remove core20 --revision 1361". Then I simply run "snap refresh core20" to grab the new revision of core20 again. After doing so, the only mount still mounted is for the new core package:

[root@cl ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 9.5M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/vda3 48G 12G 34G 25% /
tmpfs 384M 0 384M 0% /run/user/0
/dev/loop3 62M 62M 0 100% /var/lib/snapd/snap/core20/1361
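For anyone who wants to repeat the steps above, they can be collected into a small script. This is only a sketch: the function name `reproduce_refresh` and the `RUN` variable are made up for illustration, and the revision numbers are the ones from this server. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps described in this report.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN to an empty string to actually execute them (needs root).
RUN=${RUN-echo}

# reproduce_refresh is a hypothetical helper, not part of snapd.
reproduce_refresh() {
    snap_name=$1
    old_rev=$2
    new_rev=$3
    # 1. Go back to the older revision.
    $RUN snap revert "$snap_name" --revision "$old_rev"
    # 2. Remove the newer revision so a refresh has something to fetch.
    $RUN snap remove "$snap_name" --revision "$new_rev"
    # 3. Refresh, which remounts the new revision and triggers the issue.
    $RUN snap refresh "$snap_name"
}

reproduce_refresh core20 1328 1361
```

Comparing `df -h` before and after a real run should show whether the other snap mounts disappeared.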

/var/log/messages complains around this time of issues with being unable to find core20, core, and certbot revisions:

Feb 23 16:32:03 cl systemd: Reloading.
Feb 23 16:32:03 cl systemd: [/usr/lib/systemd/system/lvestats.service:5] Unknown lvalue 'StartLimitIntervalSec' in section 'Unit'
Feb 23 16:32:03 cl systemd: [/usr/lib/systemd/system/lvestats.service:6] Unknown lvalue 'StartLimitBurst' in section 'Unit'
Feb 23 16:32:03 cl systemd: Configuration file /usr/lib/systemd/system/svscanboot.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Feb 23 16:32:03 cl systemd: Reloading.
Feb 23 16:32:03 cl kernel: Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:52:54:00:1f:01:58:08:00 SRC=67.227.152.181 DST=255.255.255.255 LEN=173 TOS=0x00 PREC=0x00 TTL=128 ID=20313 PROTO=UDP SPT=17500 DPT=17500 LEN=153
Feb 23 16:32:03 cl systemd: [/usr/lib/systemd/system/lvestats.service:5] Unknown lvalue 'StartLimitIntervalSec' in section 'Unit'
Feb 23 16:32:03 cl systemd: [/usr/lib/systemd/system/lvestats.service:6] Unknown lvalue 'StartLimitBurst' in section 'Unit'
Feb 23 16:32:03 cl systemd: Configuration file /usr/lib/systemd/system/svscanboot.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Feb 23 16:32:03 cl systemd: var-lib-snapd-snap.mount: Directory /var/lib/snapd/snap to mount over is not empty, mounting anyway.
Feb 23 16:32:03 cl systemd: Mounting Ensure that the snap directory shares mount events....
Feb 23 16:32:03 cl systemd: Mounted Ensure that the snap directory shares mount events..
Feb 23 16:32:03 cl systemd: Mounting Mount unit for core20, revision 1361...
Feb 23 16:32:03 cl systemd: Mounted Mount unit for core20, revision 1361.
Feb 23 16:32:03 cl snapd: snapmgr.go:327: cannot read snap info of snap "core20" at revision 1328: cannot find installed snap "core20" at revision 1328: missing file /var/lib/snapd/snap/core20/1328/meta/snap.yaml
Feb 23 16:32:03 cl snapd: snapmgr.go:327: cannot read snap info of snap "core20" at revision 1328: cannot find installed snap "core20" at revision 1328: missing file /var/lib/snapd/snap/core20/1328/meta/snap.yaml
Feb 23 16:32:03 cl snapd: snapmgr.go:327: cannot read snap info of snap "core20" at revision 1328: cannot find installed snap "core20" at revision 1328: missing file /var/lib/snapd/snap/core20/1328/meta/snap.yaml
Feb 23 16:32:03 cl snapd: snapmgr.go:327: cannot read snap info of snap "core20" at revision 1328: cannot find installed snap "core20" at revision 1328: missing file /var/lib/snapd/snap/core20/1328/meta/snap.yaml
Feb 23 16:32:04 cl snapd: snapmgr.go:327: cannot read snap info of snap "core20" at revision 1328: cannot find installed snap "core20" at revision 1328: missing file /var/lib/snapd/snap/core20/1328/meta/snap.yaml
Feb 23 16:32:04 cl snapd: link.go:133: cannot update fontconfig cache: cannot get fc-cache-v6 from core: open /var/lib/snapd/snap/core/current/bin/fc-cache-v6: no such file or directory
Feb 23 16:32:04 cl snapd: snapmgr.go:327: cannot read snap info of snap "core20" at revision 1328: cannot find installed snap "core20" at revision 1328: missing file /var/lib/snapd/snap/core20/1328/meta/snap.yaml
Feb 23 16:32:04 cl snapd: snapmgr.go:327: cannot read snap info of snap "certbot" at revision 1788: cannot find installed snap "certbot" at revision 1788: missing file /var/lib/snapd/snap/certbot/1788/meta/snap.yaml
Feb 23 16:32:04 cl snapd: snapmgr.go:327: cannot read snap info of snap "core" at revision 12725: cannot find installed snap "core" at revision 12725: missing file /var/lib/snapd/snap/core/12725/meta/snap.yaml
Feb 23 16:32:04 cl snapd: storehelpers.go:721: cannot refresh snap "core20": snap has no updates available
Feb 23 16:32:04 cl snapd: snapmgr.go:327: cannot read snap info of snap "certbot" at revision 1788: cannot find installed snap "certbot" at revision 1788: missing file /var/lib/snapd/snap/certbot/1788/meta/snap.yaml
Feb 23 16:32:04 cl snapd: snapmgr.go:327: cannot read snap info of snap "core" at revision 12725: cannot find installed snap "core" at revision 12725: missing file /var/lib/snapd/snap/core/12725/meta/snap.yaml

And due to the missing mounts all snaps (except the one that just updated) are broken:

[root@cl ~]# snap list
Name Version Rev Tracking Publisher Notes
certbot - 1788 latest/stable certbot-eff✓ broken
core - 12725 latest/stable canonical✓ broken
core20 20220215 1361 latest/stable canonical✓ base
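The "broken" state lines up with the missing `meta/snap.yaml` files in the log. A rough way to list the affected revisions is to look for revision directories that lack that file. This is a sketch under the assumption that each revision lives in `<snap dir>/<name>/<revision>`, as the paths in the log suggest; `find_unmounted_snaps` is a hypothetical helper name.

```shell
#!/bin/sh
# List snap revision directories that have no meta/snap.yaml, which is the
# file snapd reports as missing when a revision is not mounted.
# find_unmounted_snaps is a hypothetical helper, not part of snapd.
find_unmounted_snaps() {
    snap_dir=${1:-/var/lib/snapd/snap}
    for rev_dir in "$snap_dir"/*/*/; do
        # If the glob matched nothing it stays literal; skip it.
        [ -d "$rev_dir" ] || continue
        rev_dir=${rev_dir%/}
        # Skip the "current" symlink, which only points at a revision.
        [ -L "$rev_dir" ] && continue
        [ -f "$rev_dir/meta/snap.yaml" ] || echo "$rev_dir"
    done
}
```

Running `find_unmounted_snaps` with no argument checks `/var/lib/snapd/snap`, and a different directory can be passed as the first argument.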

Rebooting the system restores all mounts, but that's not a possibility every time this issue occurs. As I mentioned, this issue is reproducible at will, so if there is any other information you need, please let me know.

Maciej Borzecki (maciek-borzecki) wrote :

Please attach the contents of /etc/os-release and the system journal from the time period where this happens (e.g. sudo journalctl --since 2022-02-23 --no-pager).

Changed in snapd:
status: New → Incomplete

Brandon Locke (brandonllocke) wrote :

[root@cl ~]# cat /etc/os-release
NAME="CloudLinux"
VERSION="7.9 (Boris Yegorov)"
ID="cloudlinux"
ID_LIKE="rhel fedora centos"
VERSION_ID="7.9"
PRETTY_NAME="CloudLinux 7.9 (Boris Yegorov)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:cloudlinux:cloudlinux:7.9:GA:server"
HOME_URL="https://www.cloudlinux.com/"
BUG_REPORT_URL="https://www.cloudlinux.com/support"

journalctl output is attached to this comment.

Maciej Borzecki (maciek-borzecki) wrote :

Feb 25 07:55:55 cl.support.iwx.rocks systemd[1]: var-lib-snapd-snap.mount: Directory /var/lib/snapd/snap to mount over is not empty, mounting anyway.

The snapd package does not ship such a mount unit, so it's unclear what it really does, but if it mounts something else over /var/lib/snapd/snap, that would explain why some snaps appear broken.

Changed in snapd:
status: Incomplete → Invalid

Brandon Locke (brandonllocke) wrote :

[root@cl ~]# cat var-lib-snapd-snap.mount
# Ensure that snap mount directory is mounted "shared" so snaps can be refreshed correctly (LP: #1668759).
[Unit]
Description=Ensure that the snap directory shares mount events.
[Mount]
What=/var/lib/snapd/snap
Where=/var/lib/snapd/snap
Type=none
Options=bind,shared
[Install]
WantedBy=local-fs.target

This file was not installed by any of us. It references what appears to be a Launchpad issue, though I cannot find anything here related to that. It is recreated each time an update occurs, so simply removing the file is not an option.

Maciej Borzecki (maciek-borzecki) wrote :

OK, so it's different then. This is a mount unit generated by snap-generator, which detected that / isn't mounted with shared propagation. In this case, it generates a mount unit for systemd, which will become an implicit dependency of all mounts under /var/lib/snapd/snap. AFAIU this should only happen in containers, such as LXD. So each time snapd starts a mount unit of a snap, systemd is expected to ensure that the var-lib-snapd-snap.mount unit was already started. In your case it appears to be started too late?

Also, I'm looking at vanilla CentOS 7:
google:centos-7-64 .../mini/hello# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
google:centos-7-64 .../mini/hello# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

It appears that / is mounted with shared propagation:
google:centos-7-64 .../mini/hello# findmnt -o +PROPAGATION /
TARGET SOURCE FSTYPE OPTIONS PROPAGATION
/ /dev/sda2 xfs rw,relatime,seclabel,attr2,inode64,noquota shared

So it isn't clear why it's not the same on CloudLinux. What is the version of systemd? Does CloudLinux rebuild packages from CentOS/RHEL7 src.rpms? Are they up to date?

Brandon Locke (brandonllocke) wrote :

It certainly seems to be related to that propagation type.

Systemd versions look to be the same. CloudLinux does look to be repackaging upstream packages, but I'm not seeing any that are out of date (though obviously checking them all is a huge task).

The propagation type is different between the server I can reproduce on and the server that I cannot reproduce on:

[root@cl ~]# findmnt -o +PROPAGATION /
TARGET SOURCE FSTYPE OPTIONS PROPAGATION
/ /dev/vda3 ext4 rw,relatime,quota,usrquota,grpquota,data=ordered private

[root@snapd ~]# findmnt -o +PROPAGATION /
TARGET SOURCE FSTYPE OPTIONS PROPAGATION
/ /dev/vda3 ext4 rw,relatime,quota,usrquota,grpquota,data=ordered shared

Also, if I change the propagation type to shared with "mount --make-shared /" and run through the reproduction steps again, the issue does not occur. This lines up with what you mentioned. If snapd doesn't think it can handle the mount appropriately, it's probably going to try to remount it, and that seems to be the point at which everything breaks. I've reached back out to CloudLinux to ask whether this propagation type is something they are setting themselves.
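To compare servers without reading findmnt output by eye, the propagation type can also be read from /proc/self/mountinfo, where a shared mount carries a `shared:N` optional field. The following is a minimal sketch; `mount_propagation` is a hypothetical helper name, and it simplifies combined states (a mount that is both shared and slave is reported as just "shared").

```shell
#!/bin/sh
# Report the propagation type of a mount point by reading a mountinfo-format
# file (default: /proc/self/mountinfo). mount_propagation is a hypothetical
# helper written for this thread, not part of util-linux or snapd.
mount_propagation() {
    target=$1
    info=${2:-/proc/self/mountinfo}
    awk -v target="$target" '
        $5 == target {
            prop = "private"
            # Optional fields sit between field 6 and the "-" separator.
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/) { prop = "shared"; break }
                if ($i ~ /^master:/) { prop = "slave" }
                if ($i == "unbindable") { prop = "unbindable" }
            }
            # Keep the last matching entry, i.e. the topmost mount.
            result = prop
        }
        END { if (result != "") print result; else exit 1 }
    ' "$info"
}
```

Calling `mount_propagation /` should print "private" on the server where the issue reproduces and "shared" on the one where it does not, matching the findmnt output above. The function returns a non-zero status if the mount point is not found.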

Brandon Locke (brandonllocke) wrote :

I've run some more tests and I'm now 99% sure that this is simply CloudLinux applying a setting that breaks snapd. Unfortunately, as I believe a large portion of their developers are in Ukraine, I'm not sure when I may hear back from them regarding any possible mitigation or options on their side. It sounds like snapd and CloudLinux may just be incompatible.

I appreciate your troubleshooting assistance. I don't think we would have been able to trace the reason for this without your inside knowledge. This bug report can be closed (though being marked "Invalid" may already do that?).

Thanks again,
