Etcd fails to mount snap rootfs

Bug #1917666 reported by Sérgio Manso
This bug affects 1 person
Affects: snapd
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Etcd fails to mount the snap rootfs after a power outage on the host machine.

Env:
cs:etcd-546
Ubuntu 20.04
(OpenStack deployment - focal)

Logs:

2021-03-03 17:58:49 DEBUG leader-settings-changed cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_XhbLr6//dev: No such file or directory
2021-03-03 17:58:49 ERROR juju-log ['/snap/bin/etcd.etcdctl', 'cluster-health']
2021-03-03 17:58:49 ERROR juju-log {'ETCDCTL_API': '2', 'ETCDCTL_CA_FILE': '/var/snap/etcd/common/ca.crt', 'ETCDCTL_CERT_FILE': '/var/snap/etcd/common/server.crt', 'ETCDCTL_KEY_FILE': '/var/snap/etcd/common/server.key'}
2021-03-03 17:58:49 ERROR juju-log b''
2021-03-03 17:58:49 ERROR juju-log None
2021-03-03 17:58:49 WARNING juju-log Notice: Unit failed cluster-health check
2021-03-03 17:58:49 DEBUG leader-settings-changed cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_MRc5RU//dev: No such file or directory
2021-03-03 17:58:49 ERROR juju-log ['/snap/bin/etcd.etcdctl', 'member', 'list']
2021-03-03 17:58:49 ERROR juju-log {'ETCDCTL_API': '2', 'ETCDCTL_CA_FILE': '/var/snap/etcd/common/ca.crt', 'ETCDCTL_CERT_FILE': '/var/snap/etcd/common/server.crt', 'ETCDCTL_KEY_FILE': '/var/snap/etcd/common/server.key'}
2021-03-03 17:58:49 ERROR juju-log b''
2021-03-03 17:58:50 ERROR juju-log None
2021-03-03 17:58:50 INFO juju-log Invoking reactive handler: reactive/etcd.py:118:set_app_version
2021-03-03 17:58:50 DEBUG leader-settings-changed cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_BHJ9ej//dev: No such file or directory
2021-03-03 17:58:50 ERROR juju-log Failed to get etcd version:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-etcd-0/charm/reactive/etcd.py", line 858, in etcd_version
    raw_output = check_output(
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/snap/bin/etcd.etcdctl', 'version']' returned non-zero exit status 1.

Workaround:
Reboot the etcd unit.

Crashdump can be found here:
https://private-fileshare.canonical.com/~sergiomanso/juju-crashdump-a41f8d4b-e067-4869-8ef2-be2108e5b50c_etcd.tar.xz

summary: - Etcd fails to mount file system
+ Etcd fails to mount snap rootfs
George Kraft (cynerva) wrote :

Thanks for the report. Yep, I can definitely see in the crashdump that unit etcd/0, located on machine 0/lxd/9, started logging those errors immediately after machine 0 rebooted.

George Kraft (cynerva) wrote :

Adding snapd to the issue. Can y'all shed any light as to what happened here?

The charm was trying to call /snap/bin/etcd.etcdctl, but it failed repeatedly with:

cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_XhbLr6//dev: No such file or directory

Note that this was in an LXD container and the issue started occurring after the host machine rebooted. It was fixed by a reboot of the container.
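
For anyone triaging similar failures, here is a minimal sketch of how a caller could tell this snap sandbox setup failure apart from a genuine etcd error, by looking for the message above in the command's stderr. The helper names `parse_snap_mount_error` and `run_snap_command` are hypothetical and are not part of the etcd charm; the pattern is inferred only from the log lines in this report and may not cover other variants of the message.

```python
import re
import subprocess

# Pattern inferred from the log lines in this report (an assumption, not a
# documented snapd message format).
SNAP_MOUNT_ERROR = re.compile(
    r"cannot perform operation: mount --rbind (\S+) (\S+): (.+)"
)


def parse_snap_mount_error(stderr_text):
    """Return (source, target, reason) if the mount failure is present, else None."""
    match = SNAP_MOUNT_ERROR.search(stderr_text)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3).strip()


def run_snap_command(args):
    """Run a command and keep stderr so the failure can be classified.

    Returns (returncode, stdout, mount_error), where mount_error is None
    unless stderr contained the rbind mount failure.
    """
    result = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    return result.returncode, result.stdout, parse_snap_mount_error(result.stderr)
```

With something like this, `run_snap_command(['/snap/bin/etcd.etcdctl', 'version'])` would report the failed mount source and target instead of only a `CalledProcessError`, which makes it clearer that the snap never started.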

Sérgio Manso (sergiomanso) wrote :

New update: This happened again, but this time no reboot was performed in either the container or the host machine.

Samuele Pedroni (pedronis) wrote :

It seems that /dev is absent or goes missing in some circumstances?
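
One way to test that hypothesis the next time the failure occurs is to record the state of /dev inside the affected container. The sketch below is only illustrative: `describe_dev` is a hypothetical helper, and the checks it performs (directory present, is a mount point, contains `null`) are assumptions about what would be useful to capture, not something snapd itself reports.

```python
import os


def describe_dev(path="/dev"):
    """Collect basic facts about a /dev-like directory for a bug report."""
    return {
        "path": path,
        # False if the directory is missing altogether.
        "is_dir": os.path.isdir(path),
        # False if nothing is mounted at this path.
        "is_mount": os.path.ismount(path),
        # A populated /dev normally contains the null device.
        "has_null": os.path.exists(os.path.join(path, "null")),
    }


if __name__ == "__main__":
    print(describe_dev())
```

Running this in the container while the error is occurring, and attaching the output, would show whether /dev itself or the target under /tmp is the missing path.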

Sérgio Manso (sergiomanso) wrote :

I saw the same behavior in a different application (glance-simplestreams-sync).
I'm attaching a crashdump and removing the Etcd charm from the bug.

no longer affects: charm-etcd
Sergio Cazzolato (sergio-j-cazzolato) wrote :

Could you please confirm whether this can still be reproduced?

Changed in snapd:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for snapd because there has been no activity for 60 days.]

Changed in snapd:
status: Incomplete → Expired