inaddy@host:~$ wget https://raw.githubusercontent.com/inaddy/notifymydog/master/notifymydog.c
inaddy@host:~/notifymydog$ gcc -Wall -D_DEBUG=0 -D_SYSLOG=1 notifymydog.c -o notifymydog
inaddy@host:~/notifymydog$ sudo ./notifymydog &
inaddy@host:~$ sudo tail -f /var/log/syslog
Mar 16 17:36:26 inaddygueto WATCHMYDOG[15766]: OK: WATCHDOG UPDATED
Mar 16 17:36:40 inaddygueto WATCHMYDOG[15766]: OK: WATCHDOG UPDATED
Mar 16 17:36:44 inaddygueto WATCHMYDOG[15766]: WARNING: WATCHDOG WAS CLOSED
Mar 16 17:36:49 inaddygueto WATCHMYDOG[15766]: WARNING: WATCHDOG WAS OPENED
So if you ever get a kernel panic on an HP ProLiant DL360 and/or DL380 server for no apparent reason, and the stack trace shows that NMIs were generated, check whether any of your userland programs has opened /dev/watchdog, either on purpose (but without updating it frequently enough) or by accident. In both cases the watchdog hardware is triggered and panics the machine after some time.
Fixing typo from previous comment:
I developed a small tool based on inotify to help users check whether their watchdog is being used.
Instructions on how to run it are here:
https://github.com/inaddy/notifymydog
Small Example:
inaddy@host:~$ wget https://raw.githubusercontent.com/inaddy/notifymydog/master/notifymydog.c
inaddy@host:~/notifymydog$ gcc -Wall -D_DEBUG=0 -D_SYSLOG=1 notifymydog.c -o notifymydog
inaddy@host:~/notifymydog$ sudo ./notifymydog &
inaddy@host:~$ sudo tail -f /var/log/syslog
Mar 16 17:36:26 inaddygueto WATCHMYDOG[15766]: OK: WATCHDOG UPDATED
Mar 16 17:36:40 inaddygueto WATCHMYDOG[15766]: OK: WATCHDOG UPDATED
Mar 16 17:36:44 inaddygueto WATCHMYDOG[15766]: WARNING: WATCHDOG WAS CLOSED
Mar 16 17:36:49 inaddygueto WATCHMYDOG[15766]: WARNING: WATCHDOG WAS OPENED
So if you ever get a kernel panic on an HP ProLiant DL360 and/or DL380 server for no apparent reason, and the stack trace shows that NMIs were generated, check whether any of your userland programs has opened /dev/watchdog, either on purpose (but without updating it frequently enough) or by accident. In both cases the watchdog hardware is triggered and panics the machine after some time.
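For a one-off check without running the tool, something like the sketch below may be enough. It is not part of notifymydog: find_openers is a hypothetical helper that scans /proc for file descriptors pointing at a given path, and it only sees processes that hold the device open at that moment.

```shell
#!/bin/sh
# find_openers PATH: print "PID COMMAND" for each process holding PATH open.
# Hypothetical helper, not part of notifymydog. Run as root to see the
# file descriptors of processes that belong to other users.
find_openers() {
    target=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Each entry is a symlink to the opened file; skip unreadable ones.
        link=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$link" = "$target" ]; then
            pid=${fd#/proc/}
            pid=${pid%%/*}
            printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done | sort -u
}

find_openers /dev/watchdog
```

No output means nothing had the device open when the scan ran. A process that opens and closes it quickly can be missed, which is the case the inotify-based tool covers.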
Workaround:
# echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
# update-initramfs -k all -u
# update-grub
# reboot
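After the reboot it is worth confirming that the module really stayed out. A minimal sketch, assuming the kernel exposes /proc/modules; module_loaded is a hypothetical helper name, and its optional second argument exists only so it can be pointed at a different listing file:

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed in FILE
# (default /proc/modules). Hypothetical helper for illustration.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if module_loaded hpwdt; then
    echo "hpwdt is still loaded"
else
    echo "hpwdt is not loaded"
fi
```

If the module is still listed, something may be loading it explicitly despite the blacklist entry, so check for other references to hpwdt under /etc/modprobe.d and /etc/modules.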