glusterfs-server times out starting when rdma-core is installed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
glusterfs (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I am trying to get a GlusterFS file share running over InfiniBand.
If the rdma_ucm kernel module is loaded (as it is when rdma-core is installed), the glusterd service cannot start. It hangs during startup and the kernel logs the following hung-task report:
May 18 01:29:42 gfsa kernel: [ 605.323874] INFO: task glusterd:2765 blocked for more than 120 seconds.
May 18 01:29:42 gfsa kernel: [ 605.324099] Tainted: G I 4.15.0-20-generic #21-Ubuntu
May 18 01:29:42 gfsa kernel: [ 605.324294] "echo 0 > /proc/sys/
May 18 01:29:42 gfsa kernel: [ 605.324519] glusterd D 0 2765 2763 0x00000000
May 18 01:29:42 gfsa kernel: [ 605.324522] Call Trace:
May 18 01:29:42 gfsa kernel: [ 605.324532] __schedule+
May 18 01:29:42 gfsa kernel: [ 605.324536] schedule+0x2c/0x80
May 18 01:29:42 gfsa kernel: [ 605.324538] schedule_
May 18 01:29:42 gfsa kernel: [ 605.324542] ? flush_workqueue
May 18 01:29:42 gfsa kernel: [ 605.324546] wait_for_
May 18 01:29:42 gfsa kernel: [ 605.324550] ? wake_up_q+0x80/0x80
May 18 01:29:42 gfsa kernel: [ 605.324556] ucma_destroy_
May 18 01:29:42 gfsa kernel: [ 605.324560] ? common_
May 18 01:29:42 gfsa kernel: [ 605.324563] ucma_write+
May 18 01:29:42 gfsa kernel: [ 605.324567] __vfs_write+
May 18 01:29:42 gfsa kernel: [ 605.324570] vfs_write+
May 18 01:29:42 gfsa kernel: [ 605.324573] SyS_write+0x55/0xc0
May 18 01:29:42 gfsa kernel: [ 605.324577] do_syscall_
May 18 01:29:42 gfsa kernel: [ 605.324580] entry_SYSCALL_
May 18 01:29:42 gfsa kernel: [ 605.324582] RIP: 0033:0x7faf58c882b7
May 18 01:29:42 gfsa kernel: [ 605.324584] RSP: 002b:00007ffeec
May 18 01:29:42 gfsa kernel: [ 605.324587] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007faf58c882b7
May 18 01:29:42 gfsa kernel: [ 605.324588] RDX: 0000000000000018 RSI: 00007ffeecbceb20 RDI: 000000000000000b
May 18 01:29:42 gfsa kernel: [ 605.324590] RBP: 00007ffeecbceb20 R08: 0000000000000000 R09: 00007faf599d9540
May 18 01:29:42 gfsa kernel: [ 605.324591] R10: 00000000ffffff78 R11: 0000000000000293 R12: 0000000000000018
May 18 01:29:42 gfsa kernel: [ 605.324593] R13: 00007ffeecbcebd0 R14: 00007ffeecbcec70 R15: 00007ffeecbcec50
May 18 01:31:11 gfsa systemd[1]: glusterd.service: Start operation timed out. Terminating.
May 18 01:31:11 gfsa systemd[1]: glusterd.service: Failed with result 'timeout'.
May 18 01:31:11 gfsa systemd[1]: Failed to start LSB: Gluster File System service for volume management.
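For anyone checking whether their own logs show the same symptom, here is a minimal sketch that pulls hung-task reports out of saved kernel log text. The function name `find_hung_tasks` is hypothetical, and the pattern is only modelled on the "blocked for more than 120 seconds" line above, so it may miss other wordings or task names that contain spaces.

```python
import re

# Pattern modelled on the hung-task line in the log above, e.g.
# "INFO: task glusterd:2765 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task name, pid, seconds) tuples, one per
    hung-task report found in log_text."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]
```

It could be fed the contents of a saved syslog or journal export; an empty result means that no line matched this particular pattern, not that the system is healthy.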
If I remove rdma-core, glusterd starts, but I then cannot use transport tcp,rdma for volumes.
It appears that Ubuntu Server 18.04 doesn't provide a glusterfs-server that is compatible with RDMA over InfiniBand. I'd be happy to learn that I'm simply missing a module or package, rather than this being an actual bug.
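To confirm whether rdma_ucm is actually loaded on an affected machine, something like the following sketch could be used. It assumes the usual /proc/modules layout, where the module name is the first whitespace-separated field on each line; the helper names are hypothetical.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text.

    Assumes the module name is the first field on each non-empty line.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, proc_modules_text):
    """True if the named module appears in the given /proc/modules text."""
    # The kernel lists module names with underscores, so normalise dashes.
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of currently loaded modules."""
    with open(path) as f:
        return f.read()
```

On a Linux host, `is_module_loaded("rdma_ucm", read_proc_modules())` would then report whether the module is present before glusterd is started.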
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: glusterfs-server 3.13.2-1build1
ProcVersionSign
Uname: Linux 4.15.0-20-generic x86_64
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
Date: Fri May 18 01:37:45 2018
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: glusterfs
UpgradeStatus: No upgrade log present (probably fresh install)
I also encountered this problem on Ubuntu MATE 19.10: glusterd would refuse to start with RDMA. There was no useful diagnostic information; the glusterd program would just crash during startup. I found this bug report, tried unloading the rdma_ucm kernel module, and that worked.
However, the rdma-core package is not actually installed on my systems. I'm using Mellanox's OFED v4.7 distribution, specifically: MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu19.10-x86_64.tgz
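The workaround of unloading rdma_ucm could be scripted roughly as below. This is only a sketch: the helper names are hypothetical, it assumes `modprobe -r` is available and run as root, and it defaults to a dry run that returns the command without executing it. It is a workaround, not a fix for the underlying hang.

```python
import subprocess


def modprobe_remove_command(module):
    """Build the argv for removing a kernel module with modprobe."""
    return ["modprobe", "-r", module]


def unload_module(module, dry_run=True):
    """Unload a kernel module, or just return the command when dry_run is set.

    Actually running the command needs root and fails if the module is in use.
    """
    cmd = modprobe_remove_command(module)
    if not dry_run:
        # check=True raises CalledProcessError if modprobe reports failure.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `unload_module("rdma_ucm", dry_run=False)` before starting glusterd would mirror the manual steps described above.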