srun with pmix plugin searches .so file at wrong location
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
slurm-llnl (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
Which version of Ubuntu:
The distribution is the newest daily build of Ubuntu Focal, with all updates as of today.
What I want to accomplish:
Use slurm-wlm in combination with Open MPI in a simple test.
How to reproduce:
1) Set up slurm-wlm on a small single-node test system: copy the example slurm.conf and change the server name accordingly in these lines:
> SlurmctldHost=srv0
and
> NodeName=srv0 State=UNKNOWN
2) Start the slurmctld and slurmd daemons.
3) Create a small MPI hello-world program for the test (test.c):
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);
    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);
    // Finalize the MPI environment.
    MPI_Finalize();
    return 0;
}
4) Compile it: mpicc test.c
5) Running it with mpirun ./a.out works correctly.
6) Try the same with Slurm: srun --mpi=pmix ./a.out
This gives:
> srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
> srun: error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed
> srun: error: cannot create mpi context for mpi/pmix
> srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types
7) Check whether the pmix plugin is really supported: srun --mpi=list
This gives:
> srun: MPI types are...
> srun: pmix_v3
> srun: none
> srun: pmi2
> srun: pmix
> srun: openmpi
8) Get more verbose output of the failing command: strace srun --mpi=pmix ./a.out
Shortened output (it shows that the library is not at the path where Slurm expects it):
openat(AT_FDCWD, "/usr/lib/
poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2, revents=POLLOUT}])
fstat(2, {st_mode=
write(2, "srun: error: (null) [0] /mpi_pmi"..., 100srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
) = 100
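To reproduce the load failure outside of srun, a small C probe can call dlopen() directly. This is a sketch under assumptions: it assumes the plugin loads the library by file name through dlopen(), which the openat() call above suggests but does not prove, and probe_library is a hypothetical helper name. Because the directory in the strace line is cut off, the probe takes whatever name or full path the caller passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to dlopen() a shared library and report the result.
 * Returns 1 if the library could be loaded, 0 otherwise.
 * A bare name is resolved via the dynamic linker's search path;
 * a name containing '/' is opened as a path. */
int probe_library(const char *name)
{
    assert(name != NULL);
    void *handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        /* dlerror() describes why the load failed (e.g. file not found) */
        fprintf(stderr, "dlopen(%s) failed: %s\n", name, dlerror());
        return 0;
    }
    printf("dlopen(%s) succeeded\n", name);
    dlclose(handle);
    return 1;
}
```

Calling probe_library() once with the unversioned libpmix.so path and once with libpmix.so.2 (full paths, unless the library directory is on the linker's search path) should, with only libpmix2 installed, show the same split as above: the unversioned name fails and the versioned one loads. On glibc older than 2.34, link with -ldl.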
9) Check the actual contents of the library directory: ls -lh /usr/lib/
total 2,6M
lrwxrwxrwx 1 root root 29 Okt 19 19:57 libmca_
-rw-r--r-- 1 root root 59K Okt 19 19:57 libmca_
lrwxrwxrwx 1 root root 16 Okt 19 19:57 libpmi2.so.1 -> libpmi2.so.1.0.0
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi2.so.1.0.0
lrwxrwxrwx 1 root root 15 Okt 19 19:57 libpmi.so.1 -> libpmi.so.1.0.1
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi.so.1.0.1
lrwxrwxrwx 1 root root 17 Okt 19 19:57 libpmix.so.2 -> libpmix.so.2.2.24
-rw-r--r-- 1 root root 847K Okt 19 19:57 libpmix.so.2.2.24
drwxr-xr-x 3 root root 4,0K Feb 11 21:43 pmix
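The same check can be done programmatically. The sketch below lists the entries of a directory whose names start with a given prefix, such as "libpmix.so", which shows at a glance that only versioned files exist. list_matching is a hypothetical helper, and the caller must supply the directory because the path in the command above is truncated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every entry in dir whose name starts with prefix.
 * Returns the number of matches, or -1 if the directory cannot be opened. */
int list_matching(const char *dir, const char *prefix)
{
    assert(dir != NULL && prefix != NULL);
    DIR *d = opendir(dir);
    if (d == NULL) {
        perror(dir);
        return -1;
    }
    size_t plen = strlen(prefix);
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        /* prefix match, so "libpmix.so" also catches "libpmix.so.2" etc. */
        if (strncmp(ent->d_name, prefix, plen) == 0) {
            printf("%s/%s\n", dir, ent->d_name);
            count++;
        }
    }
    closedir(d);
    return count;
}
```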
10) The library is actually there, but its file name does not match exactly:
Slurm looks for "libpmix.so", but the installed file is named "libpmix.so.2".
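This mismatch follows the usual shared-library naming scheme: the runtime package ships the versioned file (libpmix.so.2.2.24) and its soname link (libpmix.so.2), while in the usual Debian layout the plain unversioned .so link belongs to the -dev package, which may be why it is missing here. A minimal sketch that derives that unversioned name from a versioned file name, assuming the name contains a ".so" component followed only by optional version numbers; unversioned_name is a hypothetical helper, not part of Slurm or PMIx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn "libpmix.so.2.2.24" (or "libpmix.so.2") into
 * "libpmix.so". Writes the result into out (outlen bytes).
 * Returns 0 on success, -1 if no ".so" component is found or out is too small. */
int unversioned_name(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);
    const char *so = NULL;
    const char *p = name;
    /* find a ".so" that ends the name or is followed by a version dot */
    while ((p = strstr(p, ".so")) != NULL) {
        if (p[3] == '\0' || p[3] == '.') {
            so = p;
            break;
        }
        p++;
    }
    if (so == NULL)
        return -1;
    size_t len = (size_t)(so - name) + 3; /* base name plus ".so" */
    if (len + 1 > outlen)
        return -1;
    memcpy(out, name, len);
    out[len] = '\0';
    return 0;
}
```

For example, "libpmi2.so.1.0.0" from the listing above maps to "libpmi2.so" in the same way.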
11) Create a symlink with the expected name:
ln -s /usr/lib/
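The manual ln -s workaround can also be expressed with symlink(2). This sketch assumes the caller passes the library directory (truncated in the command above) and creates a relative link, like the existing libpmix.so.2 -> libpmix.so.2.2.24 link in the listing; make_dev_link is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create dir/linkname as a symlink pointing at target
 * (stored as given, so a bare file name yields a relative link).
 * Returns 0 on success or if dir/linkname already exists, -1 otherwise. */
int make_dev_link(const char *dir, const char *target, const char *linkname)
{
    assert(dir != NULL && target != NULL && linkname != NULL);
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, linkname);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1; /* path would be truncated */
    if (symlink(target, path) == 0)
        return 0;
    if (errno == EEXIST)
        return 0; /* something already exists there; leave it untouched */
    perror(path);
    return -1;
}
```

For example, make_dev_link(dir, "libpmix.so.2", "libpmix.so") has to run as root, since the library directory is owned by root.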
12) Try Slurm again:
$ srun --mpi=pmix -n4 ~segler/a.out
Hello world from processor srv0, rank 0 out of 4 processors
Hello world from processor srv0, rank 2 out of 4 processors
Hello world from processor srv0, rank 1 out of 4 processors
Hello world from processor srv0, rank 3 out of 4 processors
13) It works! :)
Could you add this symlink to the libpmix2 package?
$ dpkg -S /usr/lib/
libpmix2:amd64: /usr/lib/
Thank you!
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: slurm-wlm 19.05.5-1
ProcVersionSign
Uname: Linux 5.4.0-12-generic x86_64
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
Date: Tue Feb 11 23:01:18 2020
InstallationDate: Installed on 2020-02-11 (0 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Alpha amd64 (20200124)
ProcEnviron:
SHELL=/bin/bash
LANG=de_DE.UTF-8
TERM=xterm-
PATH=(custom, no user)
SourcePackage: slurm-llnl
UpgradeStatus: No upgrade log present (probably fresh install)