2019-04-23 07:43:46 |
Christian Ehrhardt |
description |
== Comment: #0 - David J. Wilder - 2019-04-05 12:44:56 ==
---Problem Description---
dpdk-testpmd is failing in net_mlx5.
/usr/bin/dpdk-testpmd \
-w 0000:01:00.0 \
-l 0-3 \
-n 4 -- \
-i -a
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1019 net_mlx5
net_mlx5: probe of PCI device 0000:01:00.0 aborted after encountering an error: Unknown error -95
testpmd: No probed ethernet devices
Interactive-mode selected
Auto-start selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done
Start automatic packet forwarding
io packet forwarding - ports=0 - cores=0 - streams=0 - NUMA support enabled, MP allocation mode: native
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=0
Contact Information = David Wilder/wilder@us.ibm.com
---uname output---
Linux ltc17u31 5.0.0-8-generic #9-Ubuntu SMP Tue Mar 12 21:59:39 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = 9006-22P Boston
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Installed 19.04 (ppc64le)
Installed dpdk and dpdk-dev
----
run dpdk-testpmd
/usr/bin/dpdk-testpmd \
-w 0000:01:00.0 \
-l 0-3 \
-n 4 -- \
-i -a
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1019 net_mlx5
net_mlx5: probe of PCI device 0000:01:00.0 aborted after encountering an error: Unknown error -95
testpmd: No probed ethernet devices
Interactive-mode selected
Auto-start selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done
Start automatic packet forwarding
io packet forwarding - ports=0 - cores=0 - streams=0 - NUMA support enabled, MP allocation mode: native
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=0
Userspace tool common name: testpmd
The userspace tool has the following bit modes: 64-bit
Userspace rpm: dpdk-dev/disco,now 18.11-6 ppc64el
== Comment: #1 - David J. Wilder - 2019-04-05 12:45:35 ==
# lspci -vvv -s 0000:01:00.0
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Subsystem: IBM MT28800 Family [ConnectX-5 Ex]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 24
NUMA node: 0
Region 0: Memory at 6000800000000 (64-bit, prefetchable) [size=512M]
[virtual] Expansion ROM at 600c000000000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [48] Vital Product Data
Product Name: PCIe4 2-port 100Gb EDR Adapter x16
Read-only fields:
[PN] Part number: 00WT174
[EC] Engineering changes: P40094
[VF] Vendor specific: 00WT176
[SN] Serial number: YA50YF7CE0V3
[Z0] Unknown: 49 42 4d 30 30 30 30 30 30 30 30 30 32
[VC] Vendor specific: EC64
[MN] Manufacture ID: 37 35 30 58 30 39 31 37 32 35 33 30 38 37 20
[VH] Vendor specific: 2CF2
[VK] Vendor specific: ipzSeries
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 04, GenCap+ CGenEn+ ChkCap+ ChkEn+
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101a
Supported Page Size: 000007ff, System Page Size: 00000010
Region 0: Memory at 0006000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] #19
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [320 v1] #27
Capabilities: [370 v1] #26
Capabilities: [420 v1] #25
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
== Comment: #2 - David J. Wilder - 2019-04-05 12:54:17 ==
Building from git://dpdk.org/dpdk tag=v18.11 in the same environment also shows the same error.
== Comment: #4 - David J. Wilder - 2019-04-05 12:56:25 ==
Testing dpdk on beta 19.04 is showing an error with Mellanox Technologies MT28800 Family [ConnectX-5 Ex] ethernet controller.
== Comment: #6 - David J. Wilder - 2019-04-05 13:35:12 ==
Chasing the source of the error.
gdb dpdk/ppc_64-power8-linuxapp-gcc/app/testpmd
<....>
(gdb) break mlx5_ind_table_ibv_drop_new
Breakpoint 1 at 0x4998e8: file /home/wilder/ubuntu-19.04-debug/dpdk/drivers/net/mlx5/mlx5_rxq.c, line 2067.
(gdb) run -w 0000:01:00.0 -l 0-3 -n 4 -- -i -a
Starting program: /home/wilder/ubuntu-19.04-debug/dpdk/ppc_64-power8-linuxapp-gcc/app/testpmd -w 0000:01:00.0 -l 0-3 -n 4 -- -i -a
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
[New Thread 0x7ffff795dc90 (LWP 117018)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[New Thread 0x7ffff714dc90 (LWP 117019)]
EAL: No free hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
[New Thread 0x7ffff693dc90 (LWP 117020)]
[New Thread 0x7ffff612dc90 (LWP 117021)]
[New Thread 0x7ffff591dc90 (LWP 117022)]
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1019 net_mlx5
Thread 1 "testpmd" hit Breakpoint 1, 0x00000001004998e8 in mlx5_ind_table_ibv_drop_new (dev=0x100d97580 <rte_eth_devices>)
at /home/wilder/ubuntu-19.04-debug/dpdk/drivers/net/mlx5/mlx5_rxq.c:2067
2067 {
(gdb) list
2062 * @return
2063 * The Verbs object initialised, NULL otherwise and rte_errno is set.
2064 */
2065 struct mlx5_ind_table_ibv *
2066 mlx5_ind_table_ibv_drop_new(struct rte_eth_dev *dev)
2067 {
2068 struct priv *priv = dev->data->dev_private;
2069 struct mlx5_ind_table_ibv *ind_tbl;
2070 struct mlx5_rxq_ibv *rxq;
2071 struct mlx5_ind_table_ibv tmpl;
(gdb)
2072
2073 rxq = mlx5_rxq_ibv_drop_new(dev);
2074 if (!rxq)
2075 return NULL;
2076 tmpl.ind_table = mlx5_glue->create_rwq_ind_table
2077 (priv->ctx,
2078 &(struct ibv_rwq_ind_table_init_attr){
2079 .log_ind_tbl_size = 0,
2080 .ind_tbl = &rxq->wq,
2081 .comp_mask = 0,
(gdb)
2082 });
2083 if (!tmpl.ind_table) {
2084 DEBUG("port %u cannot allocate indirection table for drop"
2085 " queue",
2086 dev->data->port_id);
2087 rte_errno = errno;
2088 goto error;
2089 }
2090 ind_tbl = rte_calloc(__func__, 1, sizeof(*ind_tbl), 0);
2091 if (!ind_tbl) {
(gdb) break 2084
Breakpoint 2 at 0x1004999d0: file /home/wilder/ubuntu-19.04-debug/dpdk/drivers/net/mlx5/mlx5_rxq.c, line 2084.
(gdb) cont
Continuing.
Thread 1 "testpmd" hit Breakpoint 2, mlx5_ind_table_ibv_drop_new (dev=0x100d97580 <rte_eth_devices>)
at /home/wilder/ubuntu-19.04-debug/dpdk/drivers/net/mlx5/mlx5_rxq.c:2087
2087 rte_errno = errno;
(gdb) print errno
$1 = 95
(gdb)
------
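A note on the two numbers seen so far: gdb reports errno as 95, while the probe message prints "Unknown error -95". One plausible reading, not confirmed in this report, is that the negated code reached the string conversion, which does not recognize negative values. The sketch below shows the normalization; `errno_from_negated` and `describe_error` are hypothetical helper names, not DPDK or libc functions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a negated error code such as -95 back to
 * the positive errno value; positive inputs are returned unchanged. */
int errno_from_negated(int code)
{
    return code < 0 ? -code : code;
}

/* Hypothetical helper: describe an error code whether or not the
 * caller negated it. Uses only the standard strerror(). */
const char *describe_error(int code)
{
    return strerror(errno_from_negated(code));
}

/* Small self-check: both sign conventions resolve to the same code. */
void errno_demo_check(void)
{
    assert(errno_from_negated(-EOPNOTSUPP) == EOPNOTSUPP);
    assert(errno_from_negated(EOPNOTSUPP) == EOPNOTSUPP);
}
```

With this, `describe_error(-95)` and `describe_error(95)` yield the same message. On this Linux system 95 is EOPNOTSUPP, which matches the strace excerpt in comment #7.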
== Comment: #7 - David J. Wilder - 2019-04-05 18:53:33 ==
Interesting excerpt from strace:
write(1, "mlx5_glue_create_rwq_ind_table: "..., 65) = 65
ioctl(23, RDMA_VERBS_IOCTL, 0x7fffe3966c70) = -1 EOPNOTSUPP (Operation not supported)
== Comment: #8 - David J. Wilder <wilder@us.ibm.com> - 2019-04-05 21:05:21 ==
ConnectX-5 Firmware version:
# mstflint -d 0000:01:00.0 q
Image type: FS4
FW Version: 16.23.1020
FW Release Date: 10.7.2018
Product Version: 16.23.1020
Description: UID GuidsNumber
Base GUID: ec0d9a0300cab17c 4
Base MAC: ec0d9acab17c 4
Image VSD: N/A
Device VSD: N/A
PSID: IBM0000000020
Security Attributes: N/A |
[Impact]
* A missing memset can make rdma (and its users) use uninitialized memory.
In the reported case this caused DPDK device initialization to fail on
ppc64, but it could affect almost anything else that uses the cmd buffers.
* The patch is already in the v22 stable branch (backported and
intended to be in v22.2 once released).
[Test Case]
* So far the only known way to trigger this is to run a
ConnectX-5 card on ppc64 (POWER9) and try to initialize it, e.g.
$ /usr/bin/dpdk-testpmd -i -a
This requires special HW, but since the patch is a simple
one-liner, I hope that is not a concern for the SRU.
[Regression Potential]
* Without the memset the buffer would hold random memory. I could imagine a
lucky case that ran despite this issue, but I cannot imagine anything
"relying" on the memory not being set to zero (unless stealing data
was your use case).
[Other Info]
* n/a
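To illustrate the class of bug described under [Impact], here is a minimal sketch. It assumes a receiver that rejects non-zero reserved fields; the struct layout, the `demo_*` names and the choice of EOPNOTSUPP are hypothetical and do not reproduce the actual rdma code or the actual patch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout; NOT the real rdma command structure. */
struct demo_cmd {
    uint32_t opcode;
    uint32_t reserved;   /* receiver requires this to be zero */
    uint64_t payload;
};

/* Hypothetical stand-in for the receiving side: refuse commands
 * whose reserved field carries anything but zero. */
int demo_submit(const struct demo_cmd *cmd)
{
    if (cmd->reserved != 0)
        return -EOPNOTSUPP;
    return 0;
}

/* Buggy variant: fills only the known fields, so whatever bytes were
 * already in the buffer stay in 'reserved'. */
void demo_cmd_fill_only(struct demo_cmd *cmd, uint32_t opcode,
                        uint64_t payload)
{
    cmd->opcode = opcode;
    cmd->payload = payload;
}

/* Fixed variant: zero the whole buffer first, then fill the fields. */
void demo_cmd_init(struct demo_cmd *cmd, uint32_t opcode,
                   uint64_t payload)
{
    memset(cmd, 0, sizeof(*cmd));
    demo_cmd_fill_only(cmd, opcode, payload);
}
```

If the buffer previously held stale bytes (for example after `memset(&cmd, 0xA5, sizeof(cmd))`), `demo_cmd_fill_only` leaves `reserved` non-zero and `demo_submit` fails, while `demo_cmd_init` succeeds. Whether a real run fails depends on what happened to be in memory, which would fit the "lucky case" mentioned under [Regression Potential].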
|
|