srun with pmix plugin searches for .so file at wrong location

Bug #1862854 reported by segler
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
slurm-llnl (Ubuntu)  New     Undecided   Unassigned

Bug Description

Which version of Ubuntu:
The newest daily build of Ubuntu Focal, with all updates applied as of today.

What I want to accomplish:
Run a simple test using slurm-wlm in combination with Open MPI.

How to reproduce:
1) Set up slurm-wlm on a small single-node test system: copy the example slurm.conf and change the server name accordingly at
> SlurmctldHost=srv0
and
> NodeName=srv0 State=UNKNOWN
2) Start the slurmctld and slurmd daemons
3) Create a small hello world program for the MPI test (test.c):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
4) Compile it: mpicc test.c
5) mpirun ./a.out works correctly.

6) Try the same with Slurm: srun --mpi=pmix ./a.out
which gives me:
> srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
> srun: error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed
> srun: error: cannot create mpi context for mpi/pmix
> srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types

7) Check whether the pmix plugin is actually available: srun --mpi=list
which gives me:
> srun: MPI types are...
> srun: pmix_v3
> srun: none
> srun: pmi2
> srun: pmix
> srun: openmpi

8) Trace the failing command for more detail: strace srun --mpi=pmix ./a.out
Shortened output (it shows that the library is not at the path where Slurm expects it):

openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2, revents=POLLOUT}])
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
write(2, "srun: error: (null) [0] /mpi_pmi"..., 100srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
) = 100

9) Check the actual contents of the library directory: ls -lh /usr/lib/x86_64-linux-gnu/pmix/lib/

total 2,6M
lrwxrwxrwx 1 root root 29 Okt 19 19:57 libmca_common_dstore.so.1 -> libmca_common_dstore.so.1.0.1
-rw-r--r-- 1 root root 59K Okt 19 19:57 libmca_common_dstore.so.1.0.1
lrwxrwxrwx 1 root root 16 Okt 19 19:57 libpmi2.so.1 -> libpmi2.so.1.0.0
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi2.so.1.0.0
lrwxrwxrwx 1 root root 15 Okt 19 19:57 libpmi.so.1 -> libpmi.so.1.0.1
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi.so.1.0.1
lrwxrwxrwx 1 root root 17 Okt 19 19:57 libpmix.so.2 -> libpmix.so.2.2.24
-rw-r--r-- 1 root root 847K Okt 19 19:57 libpmix.so.2.2.24
drwxr-xr-x 3 root root 4,0K Feb 11 21:43 pmix

10) The library is actually there, but not under the name Slurm asks for:
Slurm wants "libpmix.so", but the directory only contains "libpmix.so.2" (a link to "libpmix.so.2.2.24").

11) Create a symlink with the expected name:
ln -s /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24 /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so

12) Try Slurm again:
$ srun --mpi=pmix -n4 ~segler/a.out
Hello world from processor srv0, rank 0 out of 4 processors
Hello world from processor srv0, rank 2 out of 4 processors
Hello world from processor srv0, rank 1 out of 4 processors
Hello world from processor srv0, rank 3 out of 4 processors

13) it works!!! :)
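
The full strace output in step 8 is long, and only one line of it matters. Below is a minimal sketch of a filter that keeps just the paths of failed openat() calls. The function name failed_opens is made up for this example, and it assumes the strace line format shown in step 8.

```shell
# failed_opens: hypothetical helper, not part of strace or Slurm.
# Reads strace output on stdin and prints the path of every openat()
# call that failed with ENOENT.
failed_opens() {
    grep 'openat(' | grep 'ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
}

# Sample input: the openat line and one unrelated line from step 8.
failed_opens <<'EOF'
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
EOF
# prints: /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so
```

On the real system this would be used as: strace -f -e trace=openat srun --mpi=pmix ./a.out 2>&1 | failed_opens
(strace writes to stderr, hence the 2>&1).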

Could you add the library link to the libpmix2 package?

$ dpkg -S /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24
libpmix2:amd64: /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24
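
Until a package ships that link, the workaround from step 11 can be made a little more robust. The sketch below links to the SONAME link libpmix.so.2 instead of the fully versioned libpmix.so.2.2.24, so the link should still resolve after a minor update of libpmix2. The function name link_unversioned_so is made up for this example, and a link created this way is not tracked by dpkg.

```shell
# link_unversioned_so: hypothetical helper, not part of any package.
# Creates DIR/NAME.so as a relative symlink to NAME.so.MAJOR and
# refuses to overwrite anything that already exists.
link_unversioned_so() {
    dir=$1
    name=$2
    major=$3
    target="$name.so.$major"
    if [ ! -e "$dir/$target" ]; then
        echo "no such file: $dir/$target" >&2
        return 1
    fi
    if [ -e "$dir/$name.so" ] || [ -L "$dir/$name.so" ]; then
        echo "already exists: $dir/$name.so" >&2
        return 1
    fi
    ln -s "$target" "$dir/$name.so"
}

# On the system from this report (needs root):
#   link_unversioned_so /usr/lib/x86_64-linux-gnu/pmix/lib libpmix 2
```

The relative target keeps the link valid even if the directory is moved or mounted elsewhere, and the "already exists" check avoids clobbering a link that a later package update might install.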

thank you!

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: slurm-wlm 19.05.5-1
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
Uname: Linux 5.4.0-12-generic x86_64
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
Date: Tue Feb 11 23:01:18 2020
InstallationDate: Installed on 2020-02-11 (0 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Alpha amd64 (20200124)
ProcEnviron:
 SHELL=/bin/bash
 LANG=de_DE.UTF-8
 TERM=xterm-256color
 PATH=(custom, no user)
SourcePackage: slurm-llnl
UpgradeStatus: No upgrade log present (probably fresh install)
