Comment 2 for bug 1432837

Rafael David Tinoco (rafaeldtinoco) wrote :

Fixing typo from previous comment:

I developed a small tool based on inotify to help users check whether their watchdog is being used.

Anyone can find instructions on how to run it here:

https://github.com/inaddy/notifymydog

Small Example:

inaddy@host:~$ wget https://raw.githubusercontent.com/inaddy/notifymydog/master/notifymydog.c
inaddy@host:~/notifymydog$ gcc -Wall -D_DEBUG=0 -D_SYSLOG=1 notifymydog.c -o notifymydog
inaddy@host:~/notifymydog$ sudo ./notifymydog &
inaddy@host:~$ sudo tail -f /var/log/syslog
Mar 16 17:36:26 inaddygueto WATCHMYDOG[15766]: OK: WATCHDOG UPDATED
Mar 16 17:36:40 inaddygueto WATCHMYDOG[15766]: OK: WATCHDOG UPDATED
Mar 16 17:36:44 inaddygueto WATCHMYDOG[15766]: WARNING: WATCHDOG WAS CLOSED
Mar 16 17:36:49 inaddygueto WATCHMYDOG[15766]: WARNING: WATCHDOG WAS OPENED
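
For readers curious how inotify can produce log lines like the ones above, here is a minimal sketch. It is not the code from notifymydog.c: the function names describe_event and watch_path are hypothetical, and the mapping from inotify events to messages is my assumption based on the sample output.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Map an inotify event mask to a message like the ones in the sample log.
 * The mapping is an assumption, not taken from notifymydog.c. */
const char *describe_event(uint32_t mask)
{
    if (mask & IN_MODIFY)
        return "OK: WATCHDOG UPDATED";
    if (mask & IN_OPEN)
        return "WARNING: WATCHDOG WAS OPENED";
    if (mask & (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE))
        return "WARNING: WATCHDOG WAS CLOSED";
    return "UNKNOWN EVENT";
}

/* Watch `path` and print one line per event.
 * Stops after at least `max_events` events; loops forever if max_events <= 0.
 * Returns the number of events seen, or -1 on error. */
int watch_path(const char *path, int max_events)
{
    /* Buffer aligned for struct inotify_event, as inotify(7) recommends. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int seen = 0;

    int fd = inotify_init();
    if (fd < 0) {
        perror("inotify_init");
        return -1;
    }

    /* IN_CLOSE covers both IN_CLOSE_WRITE and IN_CLOSE_NOWRITE. */
    if (inotify_add_watch(fd, path, IN_OPEN | IN_MODIFY | IN_CLOSE) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }

    while (max_events <= 0 || seen < max_events) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0) {
            perror("read");
            close(fd);
            return -1;
        }
        /* One read() may return several variable-length events. */
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            printf("%s\n", describe_event(ev->mask));
            seen++;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }

    close(fd);
    return seen;
}
```

Calling watch_path("/dev/watchdog", 0) from a main() run as root would print a line each time some process opens, writes to, or closes the device. The watcher itself never opens /dev/watchdog, so it should not arm the timer.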

So if you ever get a kernel panic on an HP ProLiant DL360 and/or DL380 server for no apparent reason, and the stack trace shows NMIs being generated, check whether any of your userland programs has opened /dev/watchdog, either on purpose (but without updating it frequently enough) or by accident (causing the watchdog hardware to trigger and panic the machine after some time).

Workaround:

# echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
# update-initramfs -k all -u
# update-grub
# reboot