Open MPI 2.1.1 sample code failure

Bug #1838684 reported by Jeff Squyres on 2019-08-01
This bug affects 1 person
Affects: openmpi (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

As reported at the upstream Open MPI users list (https://<email address hidden>/msg33383.html), the Open MPI v2.1.1 bundled in Ubuntu 18.04.x seems to have a problem. I am one of the upstream Open MPI developers, so I am filing this issue here.

When the trivial attached program (any.c) is compiled and run with openmpi-bin 2.1.1-8, it fails:

-----
apt-get install -y openmpi-bin
mpicc any.c -o any
./any
# The "any" program aborts
-----

However, when I compile/install Open MPI v2.1.1 from the upstream web site (https://www.open-mpi.org/software/ompi/v2.1/), the attached any.c program works fine:

-----
wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.1.tar.bz2
tar xf openmpi-2.1.1.tar.bz2
cd openmpi-2.1.1
./configure --prefix=$HOME/bogus && make -j 8 && make install

export PATH=$HOME/bogus/bin:$PATH
export LD_LIBRARY_PATH=$HOME/bogus/lib:$LD_LIBRARY_PATH

cd /path/to/sample/program
mpicc any.c -o any
./any
# The "any" program succeeds
-----
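
Because both the Ubuntu package and the hand-built copy provide an `mpicc`, it is worth confirming which one the shell actually picks up before comparing results. The following is a minimal sketch, not part of the original report: `resolves_under` is a hypothetical helper name, and `$HOME/bogus` is simply the prefix used in the build steps above.

```shell
# Succeeds if the given command resolves to a path under the given prefix.
# Hypothetical helper for illustration; nothing here is Open MPI specific.
resolves_under() {
    prefix=$1
    cmd=$2
    resolved=$(command -v "$cmd") || return 1
    case $resolved in
        "$prefix"/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Confirm the wrapper compiler comes from the hand-built install.
if resolves_under "$HOME/bogus" mpicc; then
    echo "mpicc comes from $HOME/bogus"
else
    echo "mpicc does not come from $HOME/bogus"
fi
```

If the second message is printed after exporting `PATH`, the test run is probably still using the packaged Open MPI rather than the hand-built one.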

Specifically: the "any" program works fine for me with Open MPI v2.1.1 hand-compiled by me on Ubuntu 18.04, on other Linux distros, and on macOS. It also works fine with Open MPI v2.1.6, the most recent version of the Open MPI v2.1.x series.

This suggests that there may be something wonky with the Open MPI 2.1.1-8 that is bundled in Ubuntu 18.04. I don't know if there are any Ubuntu-specific patches in your Open MPI package, or if there are odd dependency interactions, or ... something else. But it's clearly giving the wrong result somehow.

Additionally, it should be noted that the Open MPI v2.1.x series is pretty ancient. Open MPI v2.1.1 was released in May of 2017. The most current release of Open MPI is v4.0.1 (v4.0.2 should be out "soon").

At the very minimum, Ubuntu should upgrade its Open MPI to the latest in the v2.1.x series -- v2.1.6, which contains a ton of bug fixes compared to v2.1.1, and should be ABI compatible with v2.1.1 (jumping to v4.0.x would break ABI, which I assume is not desirable in Ubuntu 18.04 LTS).

Ubuntu version:

$ lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04

$ apt-cache policy openmpi-bin
openmpi-bin:
  Installed: 2.1.1-8
  Candidate: 2.1.1-8
  Version table:
 *** 2.1.1-8 500
        500 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status

Paul White (paulw2u) on 2019-08-01
affects: xubuntu-meta (Ubuntu) → openmpi (Ubuntu)
tags: added: bionic
Jeff Squyres (jsquyres-cisco) wrote :

The issue appears to be caused by the Ubuntu package being configured with `--enable-heterogeneous`, which is not a default option. Specifically: if I compile Open MPI 2.1.1 with `--enable-heterogeneous`, I can reproduce the issue.
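
One way to check whether a given Open MPI installation was built this way is to look at the output of `ompi_info`. The sketch below assumes that `ompi_info` prints a line of the form `Heterogeneous support: yes` or `Heterogeneous support: no`; the exact label may differ between versions, and `hetero_support` is a hypothetical helper name, not an Open MPI tool.

```shell
# Report whether an Open MPI build has heterogeneous support enabled.
# Reads `ompi_info` output on stdin and prints the value found on the
# "Heterogeneous support" line, or "unknown" if no such line exists.
# The label text is an assumption about ompi_info's output format.
hetero_support() {
    awk -F': *' '/Heterogeneous support/ { print $2; found = 1; exit }
                 END { if (!found) print "unknown" }'
}

# Run the check only if ompi_info is available on this machine.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | hetero_support
fi
```

Running this once with the packaged `ompi_info` and once with the hand-built one in `PATH` should show whether the two builds differ in this setting.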

It further looks like we elected not to fix this issue upstream in the v2.x series; only the v3.x series has the fix. I would advise the Ubuntu package maintainers to:

1. Stop using `--enable-heterogeneous` (it's not common usage, anyway).
2. Upgrade to Open MPI v2.1.6 (for ABI compatibility with the already-shipping openmpi-dev-2.1.1-8). If you don't care about ABI, upgrade to whatever the latest version is on www.open-mpi.org.

Jeff Squyres (jsquyres-cisco) wrote :

I forgot to link to the upstream bug where this was initially reported: https://github.com/open-mpi/ompi/pull/4501

Jeff Squyres (jsquyres-cisco) wrote :

This bug is a duplicate of an almost 2-year-old bug: https://bugs.launchpad.net/ubuntu/+source/openmpi/+bug/1731938.

Ubuntu package maintainer: please fix this issue. It's very easy to fix.
