Etcd Charm

etcd shows update-status hook errors after host reboot

Bug #1934108 reported by Przemyslaw Hausman on 2021-06-30

This bug affects 2 people

Affects      Status     Importance  Assigned to  Milestone
Etcd Charm   Confirmed  Undecided   Unassigned
Etcd Snaps   New        Undecided   Unassigned

Bug Description

I rebooted the host machine, and now the etcd unit is stuck in an error state:

juju debug-log:

unit-etcd-1: 09:03:00 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-etcd-1: 09:03:01 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_hHlR10//dev: No such file or directory
unit-etcd-1: 09:03:01 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_sFhD0d//dev: No such file or directory
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_1oHvMf//dev: No such file or directory
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_Rzv1kQ//dev: No such file or directory
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start Traceback (most recent call last):
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/juju/agents/unit-etcd-1/charm/hooks/start", line 22, in <module>
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start main()
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start bus.dispatch(restricted=restricted_mode)
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start _invoke(other_handlers)
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start handler.invoke()
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/juju/agents/unit-etcd-1/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start self._action(*args)
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/var/lib/juju/agents/unit-etcd-1/charm/reactive/etcd.py", line 279, in send_cluster_connection_details
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start db.set_connection_string(connection_string, version=etcdctl.version())
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "lib/etcdctl.py", line 193, in version
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start out = check_output(
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start File "/usr/lib/python3.8/subprocess.py", line 512, in run
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start raise CalledProcessError(retcode, process.args,
unit-etcd-1: 09:03:02 WARNING unit.etcd/1.start subprocess.CalledProcessError: Command '['/snap/bin/etcd.etcdctl', 'version']' returned non-zero exit status 1.
unit-etcd-1: 09:03:02 ERROR juju.worker.uniter.operation hook "start" (via explicit, bespoke hook script) failed: exit status 1
unit-etcd-1: 09:03:02 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-etcd-1: 09:03:29 INFO juju.worker.uniter awaiting error resolution for "start" hook

I have 3 etcd units deployed in total. Only one unit is in error state. Etcd units are deployed in lxd containers.

etcd charm revision: 594


Drew Freiberger (afreiberger) wrote on 2022-01-07:

For further info, I'm seeing this on a couple units of etcd as well.

Running the command manually, you see this error:

root@juju-f98bb9-2-lxd-1:/sys/fs/cgroup/freezer# etcd.etcdctl version
cannot open cgroup hierarchy /sys/fs/cgroup/freezer: No such file or directory

Oddly, the cgroup exists and should be readable, but it may not be available because of snap confinement. My guess is that cgroups got a new plug in upstream snapd, which would explain why the effect appears only after a restart. The issue seems to be the charm's attempt to run the etcdctl version command; etcd itself is running and functioning.

root@juju-f98bb9-2-lxd-1:/sys/fs/cgroup/freezer# find -ls
0 drwxrwxr-x 4 nobody root 0 Jul 20 20:25 .
0 -rw-rw-r-- 1 nobody root 0 Oct 1 21:03 ./cgroup.procs
0 -r--r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./freezer.self_freezing
0 drwxr-xr-x 2 root root 0 Jul 20 20:25 ./snap.etcd
0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/cgroup.procs
0 -r--r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/freezer.self_freezing
0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/tasks
0 -r--r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/freezer.parent_freezing
0 -rw-r--r-- 1 root root 0 Dec 20 00:00 ./snap.etcd/freezer.state
0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/notify_on_release
0 -rw-r--r-- 1 root root 0 Jul 20 20:25 ./snap.etcd/cgroup.clone_children
0 -rw-rw-r-- 1 nobody root 0 Jul 20 20:22 ./tasks
0 -r--r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./freezer.parent_freezing
0 -rw-r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./freezer.state
0 drwxr-xr-x 2 root root 0 Jul 20 20:23 ./snap.lxd
0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/cgroup.procs
0 -r--r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/freezer.self_freezing
0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/tasks
0 -r--r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/freezer.parent_freezing
0 -rw-r--r-- 1 root root 0 Dec 20 00:00 ./snap.lxd/freezer.state
0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/notify_on_release
0 -rw-r--r-- 1 root root 0 Jul 20 20:23 ./snap.lxd/cgroup.clone_children
0 -rw-r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./notify_on_release
0 -rw-r--r-- 1 nobody nogroup 0 Jan 7 23:31 ./cgroup.clone_children
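
The traceback in the description shows the hook failing because `check_output` raises `CalledProcessError` when `/snap/bin/etcd.etcdctl version` exits non-zero. A minimal sketch of a more tolerant version probe follows. It is not the charm's actual code: the function name `etcdctl_version`, the 30-second timeout, and the policy of returning `None` on failure are all assumptions made for illustration.

```python
import subprocess
from typing import Optional, Sequence


def etcdctl_version(
    cmd: Sequence[str] = ("/snap/bin/etcd.etcdctl", "version"),
) -> Optional[str]:
    """Return the raw output of the version command, or None if it fails.

    Hypothetical helper: the real charm may parse the output differently.
    """
    try:
        out = subprocess.check_output(
            cmd, stderr=subprocess.STDOUT, timeout=30
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Assumed policy: report "unknown" instead of letting the
        # exception propagate and fail the whole hook.
        return None
    return out.decode("utf-8", errors="replace").strip()
```

With a probe like this, a caller could skip publishing the version while the snap is unusable instead of leaving the unit in an error state. Whether that trade-off is acceptable for the charm is a design question this sketch does not answer.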

Changed in charm-etcd:
status:	New → Confirmed
summary:	- etcd stuck in error state after host reboot
	+ etcd shows update-status hook errors after host reboot


George Kraft (cynerva) wrote on 2022-01-10:

The cgroup/freezer error is a different bug, being tracked here: https://bugs.launchpad.net/bugs/1933128

Please see the last few comments of that bug for potential workarounds.

I think these are two different bugs. The debug-log output from this bug's description clearly shows the command failing with:

mount --rbind /dev /tmp/snap.rootfs_hHlR10//dev: No such file or directory

Not the freezer cgroup thing.
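
To tell which of the two failures a given unit is hitting, the error text can be matched against the two messages quoted in this report. The sketch below is only an illustration: the function name `classify` and the labels it returns are hypothetical, and the patterns are copied from the messages above, not taken from snapd documentation.

```python
import re
from typing import Optional

# Patterns copied from the two error messages quoted in this bug report.
MOUNT_RBIND = re.compile(
    r"cannot perform operation: mount --rbind /dev \S+: "
    r"No such file or directory"
)
FREEZER_CGROUP = re.compile(
    r"cannot open cgroup hierarchy /sys/fs/cgroup/freezer: "
    r"No such file or directory"
)


def classify(error_text: str) -> Optional[str]:
    """Return a hypothetical label for the failure seen in error_text."""
    if MOUNT_RBIND.search(error_text):
        return "mount-rbind"      # the failure in this bug's description
    if FREEZER_CGROUP.search(error_text):
        return "freezer-cgroup"   # the failure tracked in bug 1933128
    return None
```

For example, feeding it the output of `etcd.etcdctl version` from an affected unit would indicate whether the workarounds in bug 1933128 are likely to apply.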
