slurm stores cluster state in /var/run, which is a tmpfs and lost on reboot!
Bug #329225 reported by
Christian Hudon
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
slurm-llnl (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Binary package hint: slurm-llnl
In the default configuration shipped in the slurm-llnl .deb, the config file keys SlurmdSpoolDir and StateSaveLocation point to directories under /var/run/slurm-llnl (the subdirectories slurmd and slurmctl respectively). But /var/run is a really bad location for these directories, as it is a tmpfs whose contents are lost when the power goes down! These directories should be moved under /var/lib/
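To see whether a given configuration is affected, something like the following sketch could be used. It scans config text for the two keys this report is about (spelled StateSaveLocation and SlurmdSpoolDir, as in the changelog quoted in this thread) and reports any whose path lies under /var/run. The function name `volatile_state_dirs`, the simplified `key=value` parsing, the case-insensitive key match and the sample excerpt are all assumptions for illustration; this is not Slurm's own config parser.

```python
from pathlib import PurePosixPath

# The two keys discussed in this report. Everything else here is a
# simplified assumption, not Slurm's real configuration parser.
STATE_KEYS = ("StateSaveLocation", "SlurmdSpoolDir")


def volatile_state_dirs(conf_text, volatile_root="/var/run"):
    """Return {key: path} for state keys whose path lies under volatile_root."""
    root = PurePosixPath(volatile_root)
    flagged = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        for known in STATE_KEYS:
            # Keys are matched case-insensitively here as a convenience.
            if key.lower() == known.lower():
                path = PurePosixPath(value)
                if path == root or root in path.parents:
                    flagged[known] = value
    return flagged


# Hypothetical excerpt using the directory names given in this report.
sample = """
StateSaveLocation=/var/run/slurm-llnl/slurmctl
SlurmdSpoolDir=/var/run/slurm-llnl/slurmd
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
"""
print(volatile_state_dirs(sample))
```

The pid file entry is deliberately not flagged: a pid file is run-time data that is fine to lose on reboot, whereas the two state directories are not.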
Hi,
This is supposed to be fixed as of the version in lucid (2.0.5-1):
* init.d scripts create run-time variable data directories (/var/run)
* slurm-llnl.init.d checks if StateSaveLocation and SlurmdSpoolDir are
under /var/run and links them to the actual location under /var/lib
If that is not the case, please reopen this bug report by changing the status back to New.
Thanks,
Fabrice
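The init script itself is not shown in this thread, so the following is only a guess at the kind of logic the changelog entry describes: create the real directory in a persistent location and make the /var/run path a symlink to it. The function name `link_to_persistent` and the directory layout are hypothetical, and the demonstration runs in a scratch directory rather than touching /var.

```python
import os
import tempfile


def link_to_persistent(volatile_dir, persistent_dir):
    """Make volatile_dir a symlink to persistent_dir, creating both as needed."""
    os.makedirs(persistent_dir, exist_ok=True)
    if os.path.islink(volatile_dir):
        if os.path.realpath(volatile_dir) == os.path.realpath(persistent_dir):
            return  # already linked to the right place
        os.unlink(volatile_dir)  # stale link pointing somewhere else
    elif os.path.exists(volatile_dir):
        # A real directory may hold state; refuse rather than delete it.
        raise FileExistsError(f"{volatile_dir} exists and is not a symlink")
    parent = os.path.dirname(volatile_dir)
    if parent:
        # On a tmpfs the parent directory is gone after every reboot.
        os.makedirs(parent, exist_ok=True)
    os.symlink(persistent_dir, volatile_dir)


# Demonstration with hypothetical paths inside a temporary directory.
with tempfile.TemporaryDirectory() as scratch:
    run_dir = os.path.join(scratch, "run", "slurm-llnl", "slurmd")
    lib_dir = os.path.join(scratch, "lib", "slurm-llnl", "slurmd")
    link_to_persistent(run_dir, lib_dir)
    link_to_persistent(run_dir, lib_dir)  # second call changes nothing
    print(os.path.islink(run_dir))
```

Because the link lives on the tmpfs, it has to be recreated at every boot, which is why this kind of check belongs in the init script rather than in a one-time package install step.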