udaddy is failing to get address handle in Ubuntu 14.10 (Mellanox)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | High | Unassigned |
libibverbs (Ubuntu) | Fix Released | High | Adam Conrad |
Trusty | Won't Fix | High | Adam Conrad |
Bug Description
---Problem Description---
udaddy crashes with a segmentation fault in an Ubuntu 14.10 guest VM.
---uname output---
Linux ubuntu 3.16.0-9-generic #14-Ubuntu SMP Fri Aug 15 15:03:36 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
---Additional Hardware Info---
0001:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Machine Type = P8
---Steps to Reproduce---
Install two P8 machines with Power KVM build releases.
On one P8 server, pass the Mellanox Technologies MT27500 Family [ConnectX-3] adapter through via PCI passthrough to an Ubuntu 14.10 guest VM.
On the other P8 machine, pass the adapter through to another guest VM in the same way.
Then install all of the OFED packages available in the Ubuntu repo in the Ubuntu 14.10 guest VM.
On the server side Ubuntu 14.10 guest VM:
=======
root@ubuntu:~# udaddy
udaddy: starting server
receiving data transfers
sending replies
Segmentation fault
root@ubuntu:~# echo $?
139
root@ubuntu:~# dmesg | tail
[ 67.760069] systemd-
[ 67.761358] systemd-
[ 84.157069] mlx4_en: eth1: frag:0 - size:1526 prefix:0 align:0 stride:1536
[ 113.906624] sda2: WRITE SAME failed. Manually zeroing.
[ 1207.663187] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
[ 1372.419099] udaddy[10624]: unhandled signal 11 at 000000000000001c nip 00003fffb2833940 lr 00003fffb28332f8 code 30001
[ 1391.318442] udaddy[10625]: unhandled signal 11 at 000000000000001c nip 00003fff843c3940 lr 00003fff843c32f8 code 30001
[ 1605.929122] udaddy[10641]: unhandled signal 11 at 000000000000001c nip 00003fff97203940 lr 00003fff972032f8 code 30001
[ 1869.130536] udaddy[10648]: unhandled signal 11 at 000000000000001c nip 00003fff94523940 lr 00003fff945232f8 code 30001
[ 2124.361751] udaddy[10652]: unhandled signal 11 at 000000000000001c nip 00003fff89d33940 lr 00003fff89d332f8 code 30001
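The exit status and the dmesg lines above can be decoded with a few lines of shell. This is only a sketch for reading the output: the sample line is copied from the dmesg output above, and the variable names are illustrative. A fault address this close to zero often points to a field access through a NULL pointer, but that is an interpretation, not something the log itself confirms.

```shell
# Exit status reported by `echo $?` in the transcript above.
status=139

# Shells report termination by a signal as 128 + the signal number.
sig=$((status - 128))
echo "signal number: $sig"

# Sample line copied from the dmesg output above.
line='[ 1372.419099] udaddy[10624]: unhandled signal 11 at 000000000000001c nip 00003fffb2833940 lr 00003fffb28332f8 code 30001'

# Extract the faulting address (the field after "at") and the
# instruction pointer (the field after "nip").
fault_addr=$(printf '%s\n' "$line" | sed -n 's/.* at \([0-9a-f]*\) nip .*/\1/p')
nip=$(printf '%s\n' "$line" | sed -n 's/.* nip \([0-9a-f]*\) lr .*/\1/p')

echo "fault address: 0x$fault_addr"
echo "nip: 0x$nip"

# Show the fault address as a decimal byte offset from address zero.
echo "fault offset in bytes: $((0x$fault_addr))"
```

Signal 11 is SIGSEGV, which matches the "Segmentation fault" message. The faulting address is the same (0x1c) in all five dmesg lines, so each run appears to fail in the same way.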
On the client node, a guest VM running another distro:
=======
[root@localhost ~]# udaddy -s 10.10.10.15
udaddy: starting client
udaddy: connecting
initiating data transfers
receiving data transfers
root@ubuntu:~# uname -a
Linux ubuntu 3.16.0-9-generic #14-Ubuntu SMP Fri Aug 15 15:03:36 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
root@ubuntu:~# which udaddy
/usr/bin/udaddy
root@ubuntu:~# dpkg -S /usr/bin/udaddy
rdmacm-utils: /usr/bin/udaddy
root@ubuntu:~# dpkg --list | grep rdmacm-utils
ii rdmacm-utils 1.0.16-1 ppc64el Examples for the librdmacm library
root@ubuntu:~#
root@ubuntu:~# lspci
0000:00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
0001:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
root@ubuntu:~# lspci -v
0000:00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Flags: bus master, fast devsel, latency 0, IRQ 17
I/O ports at 0020 [size=32]
Memory at 100b0000000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at 100b0040000 [disabled] [size=256K]
Kernel driver in use: virtio-pci
0001:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: IBM Device 04b5
Flags: bus master, fast devsel, latency 0, IRQ 18
Memory at 130b0000000 (64-bit, non-prefetchable) [size=1M]
Memory at 130a0000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at 130b0100000 [disabled] [size=1M]
Kernel driver in use: mlx4_core
Userspace tool common name: /usr/bin/udaddy
The userspace tool has the following bit modes: 64-bit
Userspace rpm: rdmacm-utils
I will look into this tomorrow. I have to read the libmlx4 and libibverbs code in Ubuntu 14.10. There are a lot of changes for RoCE UD, and I am not sure which of them Ubuntu 14.10 took.
I just tried to run it on Ubuntu 14.10, using the same Ubuntu machine as both server and client, and I get this:
udaddy -s 20.20.20.20
udaddy: starting client
udaddy: connecting
udaddy: failure creating address handle
test complete
return status -1
Ubuntu is missing some code needed to make this work. The issue is that libmlx4 and libibverbs are missing the RoCE UD neighbour lookup code.
Here is the series of patches needed:
For libmlx4:
[PATCH libmlx4 V4 0/2] Add RoCE IP based addressing support for UD QPs
[PATCH libmlx4 V4 1/2] Add ibv_query_port caching support
[PATCH libmlx4 V4 2/2] Add RoCE IP based addressing support for UD QPs
For libibverbs:
[PATCH libibverbs V5 0/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
[PATCH libibverbs V5 1/2] Add ibv_port_cap_flags
[PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
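To see which library versions a guest actually has before comparing them against the patch series above, something like the following could be used. It is a minimal sketch: the binary package names libibverbs1 and libmlx4-1 are assumed to be the ones carrying these libraries on the system being checked, and report_versions is a hypothetical helper name. The version numbers alone do not show whether the patches are applied; the package changelogs would still need to be checked.

```shell
# Print the installed version of each named package, or note that it
# is not installed. report_versions is a hypothetical helper name.
report_versions() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not available on this system"
        return 0
    fi
    for pkg in "$@"; do
        # dpkg-query fails for unknown packages; treat that as "not installed".
        version=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || version=""
        echo "$pkg: ${version:-not installed}"
    done
}

# Package names are assumptions; adjust to match the system.
report_versions libibverbs1 libmlx4-1 rdmacm-utils
```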
tags: | added: architecture-ppc64le bugnameltc-115089 severity-high targetmilestone-inin1410 |
affects: | ubuntu → libibverbs (Ubuntu) |
Changed in libibverbs (Ubuntu): | |
assignee: | nobody → Taco Screen team (taco-screen-team) |
tags: | added: targetmilestone-inin1504 removed: targetmilestone-inin1410 |
Changed in libibverbs (Ubuntu): | |
assignee: | Taco Screen team (taco-screen-team) → Steve Langasek (vorlon) |
Changed in libibverbs (Ubuntu): | |
assignee: | Steve Langasek (vorlon) → Adam Conrad (adconrad) |
Changed in libibverbs (Ubuntu): | |
importance: | Undecided → High |
status: | Confirmed → Triaged |
Changed in libibverbs (Ubuntu Trusty): | |
assignee: | nobody → Adam Conrad (adconrad) |
importance: | Undecided → High |
status: | New → Triaged |
tags: | added: roce |
Changed in ubuntu-z-systems: | |
status: | New → Triaged |
importance: | Undecided → High |
------- Comment From <email address hidden> 2014-09-19 16:05 EDT-------
> Carol, can you attach the missing libraries that you want Canonical to pick
> up.
libibverbs & libmlx4-1?
The libraries are already in Ubuntu. Canonical just needs to pick up the patches I pointed to for libibverbs and libmlx4-1.