mpich-3.3b2 critical bug (deadlock) and patch

Bug #1802372 reported by Jed Brown on 2018-11-08
This bug affects 1 person
Affects: mpich (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

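MPICH 3.3b2 deadlocks when large tags are used. The ch3 device resets the MPI_TAG_UB attribute to INT_MAX, so applications that choose tags up to MPI_TAG_UB can use tags larger than MPICH actually supports. The issue was identified by the PETSc team, but it affects other packages as well. The fix is one line: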

commit c597c8d79deea220a42751fda0f01ce70764c260
Author: Min Si <email address hidden>
Date: Wed Apr 18 10:15:25 2018 -0500

    ch3: Fix tag upper limit initialization

    The value of tag_ub is initialized in MPIR_Init_thread, but was
    incorrectly reset in ch3 device initialization. This patch fixes it.

    Signed-off-by: Ken Raffenetti <email address hidden>

diff --git a/src/mpid/ch3/src/mpid_init.c b/src/mpid/ch3/src/mpid_init.c
index f7664fd2e..298ef4bd5 100644
--- a/src/mpid/ch3/src/mpid_init.c
+++ b/src/mpid/ch3/src/mpid_init.c
@@ -157,7 +157,6 @@ int MPID_Init(int *argc, char ***argv, int requested, int *provided,
      * Set global process attributes. These can be overridden by the channel
      * if necessary.
      */
-    MPIR_Process.attrs.tag_ub = INT_MAX;
     MPIR_Process.attrs.io = MPI_ANY_SOURCE;

     /*

See also https://github.com/pmodels/mpich/pull/3097/

I have confirmed that this bug is present in the current package distributed with Ubuntu 18.10.

I am aware of another bug that was fixed around the same time; its patch should also be applied to any Ubuntu release that stays on mpich-3.3b2. (Both bugs are fixed in mpich-3.3b3.)

commit 8edabc7373b82dd660019e53d246131765819294
Author: Rob Latham <email address hidden>
Date: Tue Apr 17 11:20:25 2018 -0500

    fix uninitialized variable

    Closes pmodels/mpich#2892

diff --git a/src/mpi/romio/adio/ad_nfs/ad_nfs_read.c b/src/mpi/romio/adio/ad_nfs/ad_nfs_read.c
index e01cc21b0..5b8f0b88f 100644
--- a/src/mpi/romio/adio/ad_nfs/ad_nfs_read.c
+++ b/src/mpi/romio/adio/ad_nfs/ad_nfs_read.c
@@ -158,7 +158,7 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count,

     ADIOI_Flatlist_node *flat_buf, *flat_file;
     ADIO_Offset i_offset, new_brd_size, brd_size, size;
-    int i, j, k, err, err_flag, st_index = 0;
+    int i, j, k, err, err_flag=0, st_index = 0;
     MPI_Count num, bufsize;
     int n_etypes_in_filetype;
     ADIO_Offset n_filetypes, etype_in_filetype, st_n_filetypes, size_in_filetype;

https://github.com/pmodels/mpich/commit/8edabc7373b82dd660019e53d246131765819294
