watchdog daemon going into failed state on reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
watchdog (Debian) | New | Undecided | Unassigned |
watchdog (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
I was testing the watchdog daemon code on the development version of 16.04 today and found that the associated systemd service files have some bugs.
The first of these was a typo in /lib/systemd/
However, I have not found the cause of the second problem: the daemon hits some fault at reboot time, goes into the "failed" state, and is then not started again when the machine boots. Typical related syslog entries are:
Jan 19 16:46:08 ubuntu watchdog[2066]: stopping daemon (5.14)
Jan 19 16:46:08 ubuntu systemd[1]: Stopping watchdog daemon...
Jan 19 16:46:09 ubuntu systemd[1]: watchdog.service: Control process exited, code=exited status=1
Jan 19 16:46:09 ubuntu systemd[1]: Stopped watchdog daemon.
Jan 19 16:46:09 ubuntu systemd[1]: watchdog.service: Unit entered failed state.
Jan 19 16:46:09 ubuntu systemd[1]: watchdog.service: Triggering OnFailure= dependencies.
Jan 19 16:46:09 ubuntu systemd[1]: watchdog.service: Failed to enqueue OnFailure= job: Resource deadlock avoided
Jan 19 16:46:09 ubuntu systemd[1]: watchdog.service: Failed with result 'exit-code'.
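The failure signature in the entries above can be checked for mechanically. The following is only a minimal sketch: the function name `watchdog_failed_in_log` is made up for illustration, and the sample text is two lines copied from this report rather than fresh output.

```shell
# Return success (0) if the log text on stdin shows watchdog.service
# entering the failed state, as in the syslog excerpt from this report.
# The function name is hypothetical, chosen for this example only.
watchdog_failed_in_log() {
    grep -q 'watchdog\.service: Unit entered failed state'
}

# Self-contained demonstration using two lines copied from the report.
sample_log='Jan 19 16:46:09 ubuntu systemd[1]: Stopped watchdog daemon.
Jan 19 16:46:09 ubuntu systemd[1]: watchdog.service: Unit entered failed state.'

if printf '%s\n' "$sample_log" | watchdog_failed_in_log; then
    echo "watchdog.service entered failed state"
fi
```

On an affected machine the same function could be fed the real log, for example `watchdog_failed_in_log < /var/log/syslog`, assuming syslog is written to that path.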
My guess is that this is related to how the watchdog daemon is shut down: normally the wd_keepalive daemon is started afterwards, to prevent a reboot when the hardware module is configured for "no way out" and the timer therefore cannot be stopped.
$ lsb_release -rd
Description: Ubuntu Xenial Xerus (development branch)
Release: 16.04
$ apt-cache policy watchdog
watchdog:
Installed: 5.14-3
Candidate: 5.14-3
Version table:
*** 5.14-3 500
500 http://
100 /var/lib/
I have contacted the watchdog project maintainer with a view to working out a solution to this. This bug report is more of a marker to let folks know that there is an issue here if you plan on having high-availability servers based on Ubuntu 16.04 (well, any systemd-based system really) where watchdog-based fault recovery is an expectation.
affects: | rsyslog (Ubuntu) → watchdog (Ubuntu) |
Just to add: this is only a problem on shutdown / reboot. You can manually stop and start the daemon without problems using:
service watchdog start
service watchdog stop
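For anyone wanting to repeat that manual check in one step, here is a rough sketch using `systemctl` directly instead of `service`. The function name `cycle_watchdog` and the `SYSTEMCTL` override variable are assumptions made for this example, not part of the watchdog package, and running it for real needs root.

```shell
# Command used to talk to systemd. The SYSTEMCTL override is a hypothetical
# knob for this sketch so the function can be dry-run with a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Stop and start watchdog.service by hand, then report whether the unit
# ended up in the failed state.
cycle_watchdog() {
    "$SYSTEMCTL" stop watchdog.service || return 1
    "$SYSTEMCTL" start watchdog.service || return 1
    # "is-failed" exits 0 when the unit is in the failed state.
    if "$SYSTEMCTL" is-failed --quiet watchdog.service; then
        echo "watchdog.service: failed"
        return 1
    fi
    echo "watchdog.service: ok"
}
```

Calling `cycle_watchdog` as root should print `watchdog.service: ok` if the manual stop/start behaves as described above. It does not exercise the shutdown / reboot path where the failure actually occurs.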