dpdk app is reporting: net_mlx5: probe of PCI device xxxx aborted after encountering an error: Unknown error -95
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| The Ubuntu-power-systems project | Fix Released | High | Canonical Server | |
| Ubuntu on IBM z Systems | Fix Released | Undecided | Unassigned | |
| dpdk (Ubuntu) | Invalid | Undecided | Ubuntu on IBM Power Systems Bug Triage | |
| dpdk (Ubuntu Disco) | Invalid | Undecided | Unassigned | |
| rdma-core (Ubuntu) | Fix Released | Undecided | Unassigned | |
| rdma-core (Ubuntu Disco) | Fix Released | Undecided | Unassigned | |
Bug Description
[Impact]
* A missing memset can make rdma-core (and its users) use uninitialized
memory. In the reported case this caused a failure to initialize DPDK
devices on ppc64, but it could affect almost anything else that uses the
cmd buffers.
* The patch is already in the v22 stable branch (backported and
intended to be in v22.2 once released).
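To make the impact concrete, here is a minimal C sketch of this class of bug. It assumes a hypothetical command struct (`demo_cmd`) with one field the caller sets and two fields (`comp_mask`, `reserved`) that the receiver expects to be zero; the names and layout are illustrative only and are not taken from rdma-core. The "buggy" fill leaves whatever bytes were already in the buffer, while the "fixed" fill clears the buffer first, which is the general shape of a one-line memset fix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout (not the real rdma-core struct):
 * the caller sets log_ind_tbl_size; comp_mask and reserved are
 * expected to be zero by whoever receives the command. */
struct demo_cmd {
    uint32_t log_ind_tbl_size;
    uint32_t comp_mask;   /* must be 0 unless extensions are requested */
    uint64_t reserved;    /* must be 0 */
};

/* Buggy variant: only the field of interest is written, so any
 * leftover bytes stay in comp_mask and reserved. */
static void fill_cmd_buggy(struct demo_cmd *cmd, uint32_t log_size)
{
    assert(cmd != NULL);
    cmd->log_ind_tbl_size = log_size;
}

/* Fixed variant: clear the whole buffer first, then set fields. */
static void fill_cmd_fixed(struct demo_cmd *cmd, uint32_t log_size)
{
    assert(cmd != NULL);
    memset(cmd, 0, sizeof(*cmd));
    cmd->log_ind_tbl_size = log_size;
}
```

For example, if the buffer previously held 0xAB bytes, `fill_cmd_buggy()` leaves `comp_mask` at 0xABABABAB while `fill_cmd_fixed()` leaves it at 0. With a real uninitialized stack buffer the leftover contents are unpredictable, which is consistent with a failure that was only observed on one platform.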
[Test Case]
* So far, the only way found to trigger this is to run a
ConnectX-5 card on ppc64 (POWER9) and try to initialize it, e.g.
$ /usr/bin/
This requires special hardware, but since the patch is a simple
one-liner, I hope that should not be a concern for the SRU.
[Regression Potential]
* Without the memset the buffer would contain random memory. I could
imagine a lucky case that ran despite this issue, but I cannot imagine
anything "relying" on the memory not being zeroed (unless stealing data
was your use case).
[Other Info]
* n/a
---- original bug report ----
== Comment: #0 - David J. Wilder - 2019-04-05 12:44:56 ==
---Problem Description---
dpdk-testpmd is failing in net_mlx5.
/usr/bin/
-w 0000:01:00.0 \
-l 0-3 \
-n 4 -- \
-i -a
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/
EAL: No free hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1019 net_mlx5
net_mlx5: probe of PCI device 0000:01:00.0 aborted after encountering an error: Unknown error -95
testpmd: No probed ethernet devices
Interactive-mode selected
Auto-start selected
testpmd: create a new mbuf pool <mbuf_pool_
testpmd: preferred mempool ops selected: ring_mp_mc
Done
Start automatic packet forwarding
io packet forwarding - ports=0 - cores=0 - streams=0 - NUMA support enabled, MP allocation mode: native
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=0
Contact Information = David <email address hidden>
---uname output---
Linux ltc17u31 5.0.0-8-generic #9-Ubuntu SMP Tue Mar 12 21:59:39 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = 9006-22P Boston
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Installed 19.04 (ppc64le)
Installed dpdk and dpdk-dev
----
Run dpdk-testpmd:
/usr/bin/
-w 0000:01:00.0 \
-l 0-3 \
-n 4 -- \
-i -a
Userspace tool common name: testpmd
The userspace tool has the following bit modes: 64-bit
Userspace rpm: dpdk-dev/disco,now 18.11-6 ppc64el
== Comment: #1 - David J. Wilder - 2019-04-05 12:45:35 ==
# lspci -vvv -s 0000:01:00.0
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Subsystem: IBM MT28800 Family [ConnectX-5 Ex]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 24
NUMA node: 0
Region 0: Memory at 6000800000000 (64-bit, prefetchable) [size=512M]
[virtual] Expansion ROM at 600c000000000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCo
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationCom
Equalizatio
Capabilities: [48] Vital Product Data
Product Name: PCIe4 2-port 100Gb EDR Adapter x16
Read-only fields:
[PN] Part number: 00WT174
[EC] Engineering changes: P40094
[VF] Vendor specific: 00WT176
[SN] Serial number: YA50YF7CE0V3
[Z0] Unknown: 49 42 4d 30 30 30 30 30 30 30 30 30 32
[VC] Vendor specific: EC64
[MN] Manufacture ID: 37 35 30 58 30 39 31 37 32 35 33 30 38 37 20
[VH] Vendor specific: 2CF2
[VK] Vendor specific: ipzSeries
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-
Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 04, GenCap+ CGenEn+ ChkCap+ ChkEn+
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101a
Supported Page Size: 000007ff, System Page Size: 00000010
Region 0: Memory at 0006000000000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] #19
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [320 v1] #27
Capabilities: [370 v1] #26
Capabilities: [420 v1] #25
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
== Comment: #2 - David J. Wilder - 2019-04-05 12:54:17 ==
Building from git://dpdk.org/dpdk tag=v18.11 in the same environment also shows the same error.
== Comment: #4 - David J. Wilder - 2019-04-05 12:56:25 ==
Testing dpdk on beta 19.04 is showing an error with Mellanox Technologies MT28800 Family [ConnectX-5 Ex] ethernet controller.
== Comment: #6 - David J. Wilder - 2019-04-05 13:35:12 ==
Chasing the source of the error.
gdb dpdk/ppc_
<....>
(gdb) break mlx5_ind_
Breakpoint 1 at 0x4998e8: file /home/wilder/
(gdb) run -w 0000:01:00.0 -l 0-3 -n 4 -- -i -a
Starting program: /home/wilder/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
[New Thread 0x7ffff795dc90 (LWP 117018)]
EAL: Multi-process socket /var/run/
[New Thread 0x7ffff714dc90 (LWP 117019)]
EAL: No free hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
[New Thread 0x7ffff693dc90 (LWP 117020)]
[New Thread 0x7ffff612dc90 (LWP 117021)]
[New Thread 0x7ffff591dc90 (LWP 117022)]
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1019 net_mlx5
Thread 1 "testpmd" hit Breakpoint 1, 0x00000001004998e8 in mlx5_ind_
at /home/wilder/
2067 {
(gdb) list
2062 * @return
2063 * The Verbs object initialised, NULL otherwise and rte_errno is set.
2064 */
2065 struct mlx5_ind_table_ibv *
2066 mlx5_ind_
2067 {
2068 struct priv *priv = dev->data-
2069 struct mlx5_ind_table_ibv *ind_tbl;
2070 struct mlx5_rxq_ibv *rxq;
2071 struct mlx5_ind_table_ibv tmpl;
(gdb)
2072
2073 rxq = mlx5_rxq_
2074 if (!rxq)
2075 return NULL;
2076 tmpl.ind_table = mlx5_glue-
2077 (priv->ctx,
2078 &(struct ibv_rwq_
2079 .log_ind_tbl_size = 0,
2080 .ind_tbl = &rxq->wq,
2081 .comp_mask = 0,
(gdb)
2082 });
2083 if (!tmpl.ind_table) {
2084 DEBUG("port %u cannot allocate indirection table for drop"
2085 " queue",
2086 dev->data-
2087 rte_errno = errno;
2088 goto error;
2089 }
2090 ind_tbl = rte_calloc(
2091 if (!ind_tbl) {
(gdb) break 2084
Breakpoint 2 at 0x1004999d0: file /home/wilder/
(gdb) cont
Continuing.
Thread 1 "testpmd" hit Breakpoint 2, mlx5_ind_
at /home/wilder/
2087 rte_errno = errno;
(gdb) print errno
$1 = 95
(gdb)
------
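The gdb session shows `errno` = 95, while testpmd printed "Unknown error -95". On Linux, 95 is `EOPNOTSUPP` ("Operation not supported"). The "Unknown error -95" wording is consistent with a negative errno being passed to `strerror()`: glibc does not map negative numbers and prints them as unknown errors. The sketch below normalizes a negative return code before looking it up; `to_errno` and `describe_ret` are hypothetical helpers for illustration, not DPDK APIs.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: many kernel- and DPDK-style functions return a
 * negative errno (e.g. -EOPNOTSUPP, which is -95 on Linux). Convert it
 * to the positive value that strerror() expects. */
static int to_errno(int ret)
{
    return ret < 0 ? -ret : ret;
}

/* Hypothetical helper: describe a nonzero return code. Calling
 * strerror(ret) directly with ret < 0 would produce an
 * "Unknown error -95"-style message on glibc. */
static const char *describe_ret(int ret)
{
    assert(ret != 0); /* 0 means success; nothing to describe */
    return strerror(to_errno(ret));
}
```

With this, `describe_ret(-EOPNOTSUPP)` returns the same text as `strerror(EOPNOTSUPP)`, i.e. "Operation not supported" on glibc, which matches the strace output.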
== Comment: #7 - David J. Wilder - 2019-04-05 18:53:33 ==
Interesting excerpt from strace:
write(1, "mlx5_glue_
ioctl(23, RDMA_VERBS_IOCTL, 0x7fffe3966c70) = -1 EOPNOTSUPP (Operation not supported)
== Comment: #8 - David J. Wilder <email address hidden> - 2019-04-05 21:05:21 ==
ConnectX-5 Firmware version:
# mstflint -d 0000:01:00.0 q
Image type: FS4
FW Version: 16.23.1020
FW Release Date: 10.7.2018
Product Version: 16.23.1020
Description: UID GuidsNumber
Base GUID: ec0d9a0300cab17c 4
Base MAC: ec0d9acab17c 4
Image VSD: N/A
Device VSD: N/A
PSID: IBM0000000020
Security Attributes: N/A
Related branches
- Andreas Hasenack: Approve
- Canonical Server packageset reviewers: Pending requested
- Canonical Server: Pending requested
Diff: 66 lines (+46/-0), 3 files modified:
- debian/changelog (+7/-0)
- debian/patches/lp-1823836-clear-cmd-buffer.patch (+38/-0)
- debian/patches/series (+1/-0)
| Changed in | Field | Change |
|---|---|---|
| ubuntu-power-systems | importance | Undecided → High |
| ubuntu-power-systems | assignee | nobody → Canonical Server Team (canonical-server) |
| ubuntu-power-systems | status | New → Triaged |
| ubuntu-power-systems | status | Triaged → In Progress |
| dpdk (Ubuntu) | status | New → Triaged |
| dpdk (Ubuntu Disco) | status | New → Triaged |
| | tags | added: targetmilestone-inin1910; removed: targetmilestone-inin--- |
| rdma-core (Ubuntu Disco) | status | Triaged → In Progress |
| ubuntu-power-systems | status | In Progress → Fix Committed |
| | tags | added: verification-done, verification-done-disco; removed: verification-needed, verification-needed-disco |
| ubuntu-power-systems | status | Fix Committed → Fix Released |
| ubuntu-z-systems | status | New → Fix Released |