Comment 15 for bug 1931063

Paride Legovini (paride) wrote :

Hi @lamm, I managed to reproduce the "fork() system call" failure you described. For some reason my /tmp/node* directories do not get automatically created/populated, despite testconfig.sh.example saying that "they will be created". However, by manually populating them with the required test files I got the same error message as you.

Here is my take on it:

1. You are right in saying that libfabric is involved. That error message was introduced in this libfabric commit:

  https://github.com/ofiwg/libfabric/commit/b40ce3531dcfc79f3356e2c01701058a8e2ef4f4

AIUI that commit disabled the default usage of a fork-safety mode, as it affected performance and was imperfect in any case. The suggestion is to force a (different/better?) fork-safe mode by setting RDMAV_FORK_SAFE=1.

2. Apparently rpmem_fip/TEST0 has fork() calls, and thus triggers the warning and abort() implemented in that b40ce35 commit. Setting RDMAV_FORK_SAFE=1, i.e. setting the following in testconfig.sh:

NODE_ENV[0]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"

(and similar for the other NODEs) makes rpmem_fip/TEST0 pass for me. So this is probably not a bug in libfabric and strictly speaking not a bug in pmdk, but the pmdk tests may need to be updated to work by default with the newer versions of libfabric. It may be worth filing an upstream pmdk bug.
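As a rough sketch of what "similar for the other NODEs" could look like in testconfig.sh, assuming two test nodes are configured (the node indices and the loop are illustrative assumptions, not taken from testconfig.sh.example):

```shell
# Sketch for testconfig.sh (sourced by bash): give every test node the
# same environment. The indices "0 1" assume a two-node setup; adjust
# them to match the nodes actually defined in your configuration.
for i in 0 1; do
    NODE_ENV[$i]="PMEM_IS_PMEM_FORCE=1 RDMAV_FORK_SAFE=1"
done
```

Writing the NODE_ENV[...] lines out one by one is equivalent; the loop only avoids repeating the same string.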

3. Commit b40ce35 was first released in libfabric 1.11.0, which first appeared in Ubuntu in Hirsute. This is consistent with the fact that you were not seeing that failure in the pre-Hirsute Ubuntu releases. It is worth checking which version of libfabric is in the other distros you tested. If it's < 1.11.0 then it's all consistent.
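One possible way to do the "< 1.11.0" check on each distro, as a sketch only: the helper name version_lt and the sample version value are hypothetical, and the comparison relies on GNU sort's -V option being available.

```shell
# Succeeds (exit 0) if version $1 is strictly lower than version $2.
# Uses GNU sort -V (version sort) to order the two strings.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Placeholder value for illustration only. On Debian/Ubuntu the real
# version could be read with something like:
#   dpkg-query -W -f='${Version}\n' libfabric1
# (the package name is an assumption and differs between distros).
installed="1.6.2"

if version_lt "$installed" "1.11.0"; then
    echo "libfabric $installed is older than 1.11.0"
else
    echo "libfabric $installed is 1.11.0 or newer"
fi
```

A distro reporting a version older than 1.11.0 would not be expected to show the abort, which is what would make the picture consistent.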

4. If we agree we don't have a regression in pmdk here, let's go back to the issue with RPMEM_RAW_BUFF_SIZE and LANE_ALIGN_SIZE. That issue should be fixed in the test packages in my PPA (https://launchpad.net/~paride/+archive/ubuntu/pmdk-lp1931063). By setting RDMAV_FORK_SAFE=1 you should be able to verify, so we can proceed with the SRU for Hirsute.

Let me know WDYT, and thanks again for the feedback and testing.