"snap set system watchdog.*" settings aren't applied immediately

Bug #1854694 reported by Robert Liu on 2019-12-02
This bug affects 1 person
Affects: snapd
Importance: High
Assigned to: Samuele Pedroni

Bug Description

When setting or updating the hardware watchdog timer through "snap set system watchdog.[runtime-timeout|shutdown-timeout]", the new configuration isn't applied immediately. Users have to reboot the system or run 'systemctl daemon-reexec' manually to apply it.

"watchdog.[runtime-timeout|shutdown-timeout]" map to RuntimeWatchdogSec and ShutdownWatchdogSec in systemd-system.conf (reported as RuntimeWatchdogUSec and ShutdownWatchdogUSec by 'systemctl show'). For some reason, 'systemctl daemon-reload' doesn't reload the values.

Robert Liu (robertliu) on 2019-12-02
tags: added: original-1853401
Paweł Stołowski (stolowski) wrote :

Hi Robert, I'm not entirely familiar with the semantics of the watchdog in systemd, but after a quick scan of the code I can confirm that:
- we regenerate 10-snapd-watchdog.conf from the core watchdog.runtime-timeout and watchdog.shutdown-timeout settings (on any change to either value); however, we don't trigger any restart or reload (we just write the file for systemd).
- on snapd startup we read the WATCHDOG_USEC interval and keep notifying systemd according to it, but since we don't restart, we will not pick up any new value that systemd may give us.
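
To make the first point concrete, here is a minimal sketch of how the two settings could be rendered into a systemd drop-in. It is not snapd's actual code: the function names, the small duration parser (hours, minutes and seconds only) and the choice to omit zero values are assumptions for illustration. Only the [Manager] section and the RuntimeWatchdogSec/ShutdownWatchdogSec keys are standard systemd-system.conf settings.

```python
import re

# Seconds per supported unit; this sketch only handles "h", "m" and "s".
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_timeout(value: str) -> int:
    """Convert a duration such as "1m" or "1h30m" into whole seconds."""
    if not re.fullmatch(r"(\d+[hms])+", value):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = re.findall(r"(\d+)([hms])", value)
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def render_watchdog_dropin(runtime_timeout: str, shutdown_timeout: str) -> str:
    """Render hypothetical drop-in content for the two watchdog settings."""
    lines = ["[Manager]"]
    settings = (
        ("RuntimeWatchdogSec", runtime_timeout),
        ("ShutdownWatchdogSec", shutdown_timeout),
    )
    for key, value in settings:
        seconds = parse_timeout(value)
        if seconds > 0:  # assumption: a zero timeout means "leave unset"
            lines.append(f"{key}={seconds}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_watchdog_dropin("1m", "10m"), end="")
```

Writing this file alone changes nothing at runtime, which is the gap described above: systemd still has to be told to pick the new values up.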

Does this make sense, and does it correctly reflect the problem?

Changed in snapd:
status: New → Triaged
importance: Undecided → High
Changed in snapd:
importance: High → Medium
summary: - Watchdog settings don't be applied immediately
+ Watchdog settings aren't applied immediately

Hi Pawel,
Thanks for the information.
I also don't know whether systemd notifies the watchdog itself. If it does, I think it would be better if snapd could apply the settings immediately.

Robert Liu (robertliu) wrote :

Hi Pawel,

I've checked both systemd and snapd. Here are my findings, but I'm not sure if I missed something.
1. systemd will notify the hardware watchdog if we set "RuntimeWatchdogSec"[1].
2. systemd parses the .service file of snapd and passes the value through the WATCHDOG_USEC environment variable. By default, the value is 5 minutes. systemd will restart snapd if snapd doesn't update the software watchdog within that period.
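
The second point describes the per-service (software) watchdog, which is separate from the hardware watchdog this bug is about. As a rough sketch of that protocol, assuming a service written in Python rather than snapd's own Go code: the service reads WATCHDOG_USEC, derives a ping interval (half the timeout is the commonly recommended value), and sends "WATCHDOG=1" to the socket named in NOTIFY_SOCKET. The function names are hypothetical.

```python
import os
import socket


def watchdog_interval(environ=os.environ):
    """Return the keep-alive interval in seconds, or None if no watchdog is set.

    WATCHDOG_USEC holds the timeout in microseconds; pinging at half the
    timeout leaves a safety margin.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def notify_watchdog(environ=os.environ) -> bool:
    """Send one keep-alive ping to systemd; return False if not supervised."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract socket (leading NUL byte).
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"WATCHDOG=1", path)
    return True


if __name__ == "__main__":
    # With the 5-minute default mentioned above, the interval is 150 seconds.
    print(watchdog_interval({"WATCHDOG_USEC": "300000000"}))
```

Because the environment is read once at startup, a service using this pattern only sees a new WATCHDOG_USEC value after it is restarted.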

So, I think it should be fine to use "systemctl daemon-reexec" to apply the watchdog settings immediately after updating them.

[1] https://git.launchpad.net/ubuntu/+source/systemd/tree/src/core/manager.c?h=ubuntu/xenial-updates#n2019

Changed in snapd:
assignee: nobody → Samuele Pedroni (pedronis)
Michael Vogt (mvo) on 2020-03-12
Changed in snapd:
importance: Medium → High
Samuele Pedroni (pedronis) wrote :

I'm missing the relation between the runtime watchdog and the snapd service watchdog.

Also, why isn't daemon-reload enough? Skimming the code, it seems that maybe it should be.

Robert Liu (robertliu) wrote :

@pedronis,

I've updated the title and description. I think it should be clearer now. The issue is with RuntimeWatchdogUSec and ShutdownWatchdogUSec of systemd-system.conf, not with WatchdogSec of systemd.service.

description: updated
summary: - Watchdog settings aren't applied immediately
+ "snap set system watchdog.*" settings aren't applied immediately
Samuele Pedroni (pedronis) wrote :

I can confirm that, at least with systemd 237, daemon-reload alone doesn't work.

I wonder if that's true for newer systemd versions as well?

Interestingly, the doc says that there should be almost no reason to use daemon-reexec except debugging, but here we are.

In principle we could make the watchdog code in configcore use daemon-reexec with similar code paths/rules as daemon-reload.
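
A minimal sketch of that idea, under stated assumptions: this is not snapd's implementation, and the function name and the injectable `run` parameter are hypothetical. It only illustrates the rule "re-execute systemd when the rendered watchdog configuration actually changed", using the real `systemctl daemon-reexec` command.

```python
import subprocess


def reexec_systemd_if_changed(old_content: str, new_content: str,
                              run=subprocess.run) -> bool:
    """Run `systemctl daemon-reexec` only when the drop-in content changed.

    `run` is injectable so the decision logic can be exercised without
    touching the real system manager.
    """
    if old_content == new_content:
        return False  # nothing to apply, avoid a needless re-execution
    run(["systemctl", "daemon-reexec"], check=True)
    return True


if __name__ == "__main__":
    calls = []

    def fake_run(cmd, check):
        # Record the command instead of executing it.
        calls.append(cmd)

    old = "[Manager]\nRuntimeWatchdogSec=60\n"
    new = "[Manager]\nRuntimeWatchdogSec=120\n"
    print(reexec_systemd_if_changed(old, new, run=fake_run), calls)
```

Skipping the call when nothing changed matters here because re-executing PID 1 is a heavier operation than a plain daemon-reload.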

Changed in snapd:
status: Triaged → In Progress
Samuele Pedroni (pedronis) wrote :

I proposed:

https://github.com/snapcore/snapd/pull/8436

This will need to be tested on a system with a real hardware watchdog to be properly validated. I'll report when it's in edge.

Michael Vogt (mvo) wrote :

The PR has landed now and will be in the "edge" channel for core tomorrow and in the "snapd" snap edge channel in ~1h. Please test it on the relevant hardware. It does work in our spread testing but we want to make sure it also works for the real HW.

Robert Liu (robertliu) wrote :

Verified on an AMD64 platform. The watchdog timer settings are all applied immediately.

[snaps]
$ snap list
Name               Version                     Rev   Tracking       Publisher   Notes
customized-gadget  18-1.1                      1     latest/stable  ******      gadget
customized-kernel  4.15.0-1003                 1     latest/stable  ******      kernel
core18             20200427                    1754  latest/stable  canonical✓  base
snapd              2.45~pre2+git1841.gd17b92f  7695  latest/edge    canonical✓  snapd

[verification]
1. snapd applied the watchdog timer settings carried by the gadget snap properly at first boot
  $ sudo snap get system -d
  {
   "seed": {
    "loaded": true
   },
   "watchdog": {
    "runtime-timeout": "1m",
    "shutdown-timeout": "10m"
   }
  }

2. systemd has the corresponding watchdog timer settings
  $ sudo systemctl show |grep -i watchdog
  RuntimeWatchdogUSec=1min
  ShutdownWatchdogUSec=10min
  ServiceWatchdogs=yes

3. The journal shows that systemd re-executed itself
$ sudo journalctl -b |grep -i reexec -A3
May 08 09:13:04 localhost systemd[1]: Reexecuting.
May 08 09:13:04 localhost kernel: systemd: 65 output lines suppressed due to ratelimiting
May 08 09:13:04 localhost systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
May 08 09:13:04 localhost systemd[1]: Detected architecture x86-64.
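
For anyone repeating step 2 of this verification in a script, the following is a small sketch that extracts the watchdog-related properties from `systemctl show` output. The function name is hypothetical, and the sample text is copied from the output above rather than produced by running the command.

```python
def watchdog_properties(show_output: str) -> dict:
    """Return the key=value pairs whose key mentions "watchdog"."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and "watchdog" in key.lower():
            props[key] = value
    return props


if __name__ == "__main__":
    sample = (
        "RuntimeWatchdogUSec=1min\n"
        "ShutdownWatchdogUSec=10min\n"
        "ServiceWatchdogs=yes\n"
    )
    props = watchdog_properties(sample)
    # Compare against the values configured through "snap set system".
    print(props["RuntimeWatchdogUSec"], props["ShutdownWatchdogUSec"])
```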

Changed in snapd:
status: In Progress → Fix Committed
Zygmunt Krynicki (zyga) wrote :

I'm targeting this, conservatively, to 2.46, in the absence of more accurate information.

Changed in snapd:
milestone: none → 2.46