slurm-llnl cannot execve scripts

Bug #384926 reported by sysadmn
This bug affects 1 person
Affects: slurm-llnl (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Binary package hint: slurm-llnl

Description: Ubuntu 9.04
Release: 9.04
slurm-llnl:
  Installed: 1.3.13-1
  Candidate: 1.3.13-1
  Version table:
 *** 1.3.13-1 0
        500 http://us.archive.ubuntu.com jaunty/universe Packages
        100 /var/lib/dpkg/status

Any attempt to run a job gives the following slurm-nn.out:
slurmd[hostname]: execve(): /var/lib/slurm-llnl/slurmd/job00002/slurm_script: Bad address

It appears slurmctld creates a "job.nn" directory, but slurmstepd looks for a "job000nn" directory:
# strings /usr/sbin/slurmctld | grep /job
/job.%d
/job.%d/script
/job.%d/environment
/job_state.old
/job_state
/job_state.new
/job.%u
# strings /usr/sbin/slurmctld | grep /job
/job.%d
/job.%d/script
/job.%d/environment
/job_state.old
/job_state
/job_state.new
/job.%u

This occurs regardless of the value for SpoolDir in the config file.

sysadmn (paul-joslin) wrote :

Sorry, last command should have been:

# strings /usr/sbin/slurmstepd | grep /job
%s/job%05u
%s/job%05u.%05u

moe jette (jette1) wrote :

This problem is fixed in Slurm version 2.0.4; alternatively, you can apply the
patch below. The problem occurs when the sbatch command reads its input from
stdin: the argv argument passed to the execve() function is then not
NULL-terminated. As a work-around, you can supply the batch script file as an
argument to sbatch rather than on stdin (e.g. "sbatch my.script"). Here's the patch:

Index: src/slurmd/slurmstepd/slurmstepd_job.c
===================================================================
--- src/slurmd/slurmstepd/slurmstepd_job.c (revision 18091)
+++ src/slurmd/slurmstepd/slurmstepd_job.c (revision 18092)
@@ -382,7 +382,7 @@
                /* job script has not yet been written out to disk --
                 * argv will be filled in later by _make_batch_script()
                 */
-               job->argv = (char **) xmalloc(sizeof(char *));
+               job->argv = (char **) xmalloc(2 * sizeof(char *));
        }

        job->task = (slurmd_task_info_t **)

Fabrice Coutadeur (fabricesp) wrote :

Hi,

As lucid has version 2.0.5, I'll close this bug report as fixed. If you feel it's not the case, please reopen it by changing the status back to New.

Thanks,
Fabrice

Changed in slurm-llnl (Ubuntu):
status: New → Fix Released