Slurmd will always fail with PIDFile set on systemd

Bug #1959309 reported by Alexandre Otto Strube
This bug affects 1 person
Affects: slurm-llnl (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I noticed that on Ubuntu, slurmd always shows as failed under systemd, with this message in the log:

"Jan 23 18:03:05 c1-compute-1.wehi.edu.au systemd[1]: Can't open PID file /var/run/slurm/slurmd.pid (yet?) after start: No such file or directory"

According to upstream (https://bugs.schedmd.com/show_bug.cgi?id=8388#c1):

This is happening because we create the PID file slightly after systemd tries to read it. Commands where systemd needs to know the PID (eg systemctl restart slurmd.service) it will re-read the file (which appears to be getting created properly). From a functional standpoint, this error shouldn't have any impact on systemd or slurm.

The solution, according to upstream (https://bugs.schedmd.com/show_bug.cgi?id=8388#c3), is to remove the PIDFile line from slurmd.service:

The quickest workaround you could use is to just comment out "PIDFile=*" line in the unit file and do a daemon-reload. instead of reading the pid file we write out, it will "guess" the main pid (and in my tests does so correctly).
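
For anyone who wants to script the workaround quoted above, here is a minimal sketch in Python. It only transforms the text of a unit file and does not touch the system. The function name comment_out_pidfile and the sample unit contents are hypothetical; only the PID file path is taken from the log message in this report, and the real slurmd.service shipped by the package may look different.

```python
import re

# Matches an active (not already commented) PIDFile= directive,
# allowing leading whitespace and spaces around the equals sign.
_PIDFILE_RE = re.compile(r"^(\s*)(PIDFile\s*=.*)$")


def comment_out_pidfile(unit_text: str) -> str:
    """Return unit_text with every active PIDFile= line commented out.

    Hypothetical helper for illustration; it is not part of Slurm or systemd.
    """
    out = []
    for line in unit_text.splitlines():
        m = _PIDFILE_RE.match(line)
        # Keep the original indentation and put '#' in front of the directive.
        out.append(f"{m.group(1)}#{m.group(2)}" if m else line)
    trailing_newline = "\n" if unit_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Assumed sample unit file, for demonstration only.
SAMPLE_UNIT = """[Unit]
Description=Slurm node daemon

[Service]
Type=forking
PIDFile=/var/run/slurm/slurmd.pid
ExecStart=/usr/sbin/slurmd
"""

if __name__ == "__main__":
    print(comment_out_pidfile(SAMPLE_UNIT), end="")
```

After writing the modified text back to the unit file, a daemon-reload is still needed, as the upstream comment says. Lines that are already commented out are left alone, so running the helper twice is harmless.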
