slurm sbatch command fails

Bug #271518 reported by gs
Affects: slurm-llnl (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: slurm-llnl

Version used is Ubuntu 8.04 (Hardy), architecture amd64.
Source package: https://launchpad.net/ubuntu/+source/slurm-llnl/1.2.20-1

$ sbatch jobscript
leads to the following error (reported in slurm-<jobnumber>.out):

slurmd[hostname]: error: execve(): /var/run/slurm-llnl/slurmd/job00007/script: Permission denied

Using srun,
$ srun jobscript
the job runs fine if the executable bit is set on jobscript.
If the executable bit is not set on jobscript, the job fails with the error

slurmd[hostname]: error: execve(): jobscript: Permission denied

The executable bit of jobscript has no influence on whether sbatch fails. It looks to me as if
/var/run/slurm-llnl/slurmd/job00007/script gets created without the executable bit set, leading to the error.
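
For illustration, here is a minimal standalone C sketch (not SLURM's actual code; the default spool path and the script body are made up for the example) of the spool-and-exec step that sbatch triggers on the node: write the batch script to a spool file, chmod it 0500 as slurmd does, then execve() it. Running it with a path under the real spool directory as its argument shows whether execve() still reports Permission denied there despite the correct mode bits.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        /* example spool path; pass e.g. a file under /var/run/slurm-llnl/slurmd/ */
        const char *spool = (argc > 1) ? argv[1] : "/tmp/spooled-script";
        const char body[] = "#!/bin/sh\necho hello from the spooled script\n";
        char *child_argv[] = { (char *) spool, NULL };
        char *child_envp[] = { NULL };

        int fd = open(spool, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || write(fd, body, sizeof(body) - 1) < 0) {
                perror("write script");
                return 1;
        }
        close(fd);

        /* same permissions slurmd sets: read+execute for the owner only */
        if (chmod(spool, 0500) < 0) {
                perror("chmod");
                return 1;
        }

        execve(spool, child_argv, child_envp);

        /* only reached if execve() failed; EACCES here despite mode 0500
           points at the filesystem, not at the mode bits */
        fprintf(stderr, "execve(%s): %s\n", spool, strerror(errno));
        return 1;
}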

moe jette (jette1) wrote :

The file is always created with read/execute permission for the owner.
Here is an excerpt of the code from
_make_batch_script(batch_job_launch_msg_t *msg, char *path)
in src/slurmd/slurmstepd/mgr.c:

        if (chown(script, (uid_t) msg->uid, (gid_t) -1) < 0) {
                error("chown(%s): %m", path);
                goto error;
        }

        if (chmod(script, 0500) < 0) {
                error("chmod: %m");
        }

Is your slurmd daemon running as user root or as the person running the job?
Are your UID numbers consistent across the cluster?
Take a look in the slurmd log file (its location is reported by
"scontrol show config | grep SlurmdLog").

gs (gs-orst) wrote :

slurmd runs as root. I did not modify the installation, and I used a SLURM config file generated at the SLURM website.
The same setup works for me on Debian and, as I tried recently, on the Ubuntu Intrepid Ibex alpha as well.

Here are some details; this bug is specific to Ubuntu Hardy Heron.

# scontrol show config | grep SlurmdLog
SlurmdLogFile = (null)

even though the log file exists:
# ls /var/run/slurm-llnl/
slurmd slurmd.log slurmd.pid

Here is the slurmd log of a failed job:

[Oct 23 11:43:11] setup for a batch_job
[Oct 23 11:43:11] entering batch_job_create
[Oct 23 11:43:11] [412] Message thread started pid = 21069
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] debug3: _rpc_batch_job: return from _forkexec_slurmstepd
[Oct 23 11:43:11] [412] Entered job_manager for 412.4294967294 pid=21069
[Oct 23 11:43:11] [412] alloc LLLP
[Oct 23 11:43:11] [412] task affinity plugin loaded
[Oct 23 11:43:11] [412] mpi type = (null)
[Oct 23 11:43:11] [412] Entering _setup_normal_io
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] [412] Uncached user/gid: gs/100
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] [412] stdin file name = /dev/null
[Oct 23 11:43:11] [412] stdout file name = /home/gs/calc/tubes/cnt-4.0/tt1/slurm-412.out
[Oct 23 11:43:11] [412] stderr file name = /home/gs/calc/tubes/cnt-4.0/tt1/slurm-412.out
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] [412] Leaving _setup_normal_io
[Oct 23 11:43:11] [412] debug level = 2
[Oct 23 11:43:11] [412] Before call to spank_init()
[Oct 23 11:43:11] [412] spank: opening plugin stack /etc/slurm-llnl/plugstack.conf
[Oct 23 11:43:11] [412] After call to spank_init()
[Oct 23 11:43:11] [412] num tasks on this node = 1
[Oct 23 11:43:11] [412] New fdpair[0] = 12, fdpair[1] = 13
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] [412] Uncached user/gid: gs/100
[Oct 23 11:43:11] [412] eio: handling events for 1 objects
[Oct 23 11:43:11] [412] Called _msg_socket_readable
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_CPU in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_FSIZE in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_DATA in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_STACK in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_CORE in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_RSS in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_NPROC in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_NOFILE in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_MEMLOCK in environment
[Oct 23 11:43:11] [412] Couldn't find SLURM_RLIMIT_AS in environment
[Oct 23 11:43:11] [412] task ...


Christian Hudon (chrish) wrote :

This is a side effect of bug #329225. The SLURM spool dir is under /var/run, which is mounted noexec (and is also lost on reboot, a small detail...). If you want a workaround, you can apply the instructions in that bug report (basically, move /var/run/slurm-llnl to /var/lib).
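
To make the diagnosis concrete, here is a small C sketch (an assumed helper, not part of SLURM; ST_NOEXEC is a GNU extension, hence the _GNU_SOURCE define) that checks whether the filesystem holding the spool directory is mounted noexec. On such a mount execve() returns EACCES regardless of the file's mode bits, which matches the error above even though slurmd chmods the spooled script to 0500.

#define _GNU_SOURCE             /* for ST_NOEXEC in <sys/statvfs.h> */
#include <stdio.h>
#include <sys/statvfs.h>

int main(int argc, char **argv)
{
        const char *dir = (argc > 1) ? argv[1] : "/var/run/slurm-llnl";
        struct statvfs vfs;

        if (statvfs(dir, &vfs) != 0) {
                perror("statvfs");
                return 1;
        }

        if (vfs.f_flag & ST_NOEXEC)
                printf("%s is on a noexec mount; execve() will fail there\n", dir);
        else
                printf("%s allows execution\n", dir);
        return 0;
}

Moving the spool directory to /var/lib, as suggested above, puts it on a filesystem without the noexec option and also survives reboots.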

Fabrice Coutadeur (fabricesp) wrote :

Fixed, as bug #329225 has also been fixed in Lucid, with version 2.0.5.

Changed in slurm-llnl (Ubuntu):
status: New → Fix Released