[hns3-0115] add 8 BD limit for tx flow
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
kunpeng920 | | Undecided | Ike Panhc | | |
Ubuntu-18.04 | | Undecided | Ike Panhc | | |
Ubuntu-18.04-hwe | | Undecided | Unassigned | | |
Ubuntu-20.04 | | Undecided | Unassigned | | |
Upstream-kernel | | Undecided | Unassigned | | |
linux (Ubuntu) | | Undecided | Unassigned | | |
Bionic | | Medium | Ike Panhc | | |
Bug Description
[Impact]
We have received reports that iscsi and spark tests fail on hns3.
[Fix]
Cherry-
net: hns3: add 8 BD limit for tx flow
net: hns3: avoid mult + div op in critical data path
net: hns3: remove some ops in struct hns3_nic_ops
net: hns3: fix for not calculating tx bd num correctly
net: hns3: unify maybe_stop_tx for TSO and non-TSO case
net: hns3: add check for max TX BD num for tso and non-tso case
net: hns3: fix for TX queue not restarted problem
net: hns3: fix a use after free problem in hns3_nic_
[Test]
No known way to reproduce it in our lab. Regression test only.
[Regression Potential]
Patchset only affects hns3 driver. Minimal risk for other drivers and platform.
[Bug Description]
A single transmit packet can span up to 8 descriptors.
A TSO transmit packet can occupy up to 63 descriptors,
and each segment within the TSO packet can span at most
8 descriptors.
If a TSO packet needs more than 8 BDs and any 7 contiguous
frags total less than MSS, a single segment would span more
than 8 descriptors. The hardware does not support this, so
the driver needs to linearize the SKB.
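The descriptor limits above can be modelled in a few lines. This is a hedged Python sketch, not the driver's C code: the function name `needs_linearize`, the constants, and the one-descriptor-per-buffer simplification are all hypothetical. The TSO window test assumes a conservative reading of the rule, namely that a run of 7 contiguous buffers totalling less than MSS could make one segment touch more than 8 descriptors.

```python
from typing import Sequence

MAX_BD_PER_PACKET = 8            # non-TSO packet, and each TSO segment
MAX_BD_PER_TSO = 63              # whole TSO packet
WINDOW = MAX_BD_PER_PACKET - 1   # the "7 contiguous frags" in the description


def needs_linearize(buf_sizes: Sequence[int], mss: int = 0) -> bool:
    """Return True if the packet should be copied into one linear buffer.

    buf_sizes: byte size of each buffer, assuming one descriptor (BD) per
               buffer (a simplification made for this sketch).
    mss:       0 for a non-TSO packet, otherwise the TSO segment size.
    """
    bd_num = len(buf_sizes)
    if bd_num <= MAX_BD_PER_PACKET:
        return False             # within the 8-BD limit in either mode
    if mss == 0:
        return True              # non-TSO packet needing more than 8 BDs
    if bd_num > MAX_BD_PER_TSO:
        return True              # TSO packet needing more than 63 BDs

    # Assumed reading: slide a 7-buffer window over the packet. If any
    # window holds less than one MSS, a segment could cross more than 8 BDs.
    window_sum = sum(buf_sizes[:WINDOW])
    if window_sum < mss:
        return True
    for i in range(WINDOW, bd_num):
        window_sum += buf_sizes[i] - buf_sizes[i - WINDOW]
        if window_sum < mss:
            return True
    return False
```

For example, twenty 100-byte buffers with an MSS of 1448 would be linearized under this model, while twenty 1000-byte buffers would not.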
[Actual Results]
iscsi and bigdata spark test OK
[Expected Results]
iscsi and bigdata spark test OK
[Reproducibility]
Inevitably
[Additional information]
Hardware: D06
Firmware: NA
Kernel: NA
DTS2018091810050
[Resolution]
Software uses skb_copy to merge the frags:
51e8439f3496 net: hns3: add 8 BD limit for tx flow
5f543a54eec0 net: hns3: fix for not calculating tx bd num correctly
tags: | added: ikeradar |
description: | updated |
Ike Panhc (ikepanhc) wrote : | #1 |
Ike Panhc (ikepanhc) wrote : | #2 |
So not suitable for 4.15 kernel.
Changed in kunpeng920: | |
status: | New → Fix Committed |
tags: | removed: ikeradar |
Changed in kunpeng920: | |
status: | Fix Committed → Fix Released |
Fred Kimmy (kongzizaixian) wrote : | #3 |
net: hns3: add 8 BD limit for tx flow
net: hns3: fix a use after free problem in hns3_nic_
net: hns3: avoid mult + div op in critical data path
net: hns3: fix for not calculating tx bd num correctly
this patchset have cause some error for net card as following:
IPv6: ADDRCONF(
Apr 30 10:57:22 arm-u18-48c kernel: [ 15.050113] hns3 0000:bd:00.0 eth0: link up
Apr 30 10:57:22 arm-u18-48c kernel: [ 15.050130] IPv6: ADDRCONF(
Apr 30 11:00:07 arm-u18-48c kernel: [ 181.144833] Netfilter messages via NETLINK v0.30.
Apr 30 11:00:07 arm-u18-48c kernel: [ 181.151372] ip_set: protocol 6
Apr 30 11:14:59 arm-u18-48c kernel: [ 1073.485529] hrtimer: interrupt took 660 ns
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.814563] hns3 0000:bd:00.0: PPU_PF_
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.826307] hns3 0000:bd:00.0: PF Reset requested
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.878715] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.909837] hns3 0000:bd:00.0: inform reset to vf(1) failed -5!
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.918236] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.936307] hns3 0000:bd:00.0: inform reset to vf(2) failed -5!
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.954199] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.964959] hns3 0000:bd:00.0: inform reset to vf(3) failed -5!
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.978401] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1202.994278] hns3 0000:bd:00.0: inform reset to vf(4) failed -5!
Apr 30 11:17:09 arm-u18-48c kernel: [ 1203.006549] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1203.016382] hns3 0000:bd:00.0: inform reset to vf(5) failed -5!
Apr 30 11:17:09 arm-u18-48c kernel: [ 1203.026513] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1203.036399] hns3 0000:bd:00.0: inform reset to vf(6) failed -5!
Apr 30 11:17:09 arm-u18-48c kernel: [ 1203.050229] hns3 0000:bd:00.0: PF failed(=-5) to send mailbox message to VF
Apr 30 11:17:09 arm-u18-48c kernel: [ 1203.059686] hns3 0000:bd:00.0: inform reset to vf(7) failed -5!
Apr 30 11:17:10 arm-u18-48c kernel: [ 1204.236266] hns3 0000:bd:00.0 eth0: link down
Apr 30 11:17:10 arm-u18-48c kernel: [ 1204.364172] hns3 0000:bd:00.0: prepare wait ok
Apr 30 11:17:10 arm-u18-48c kernel: [ 1204.600847] hns3 0000:bd:00.0: The firmware version is 0109210a
Apr 30 11:17:10 arm-u18-48c kernel: [ 1204.613248] hns3 0000:bd:00.0: Reset done, hclge driver initialization finished.
Apr 30 11:17:11 arm-u18-48c kernel: [ 1205.522648] hns3 0000:bd:00.0: SSU_PORT_
Apr 30 11:17:11 arm-u18-48c kernel: [ 1205.522658] hns3 0000:bd:00.0: PPU_PF_
Apr 30 11:17:11 arm-u1...
Changed in kunpeng920: | |
status: | Fix Released → New |
dann frazier (dannf) wrote : | #4 |
@Fred: In comment #3 you state "this patchset have cause some error". If this patch set has introduced a bug, please report that in a new bug. However, since you moved the Ubuntu-18.04 task back to "New" at that time, I wonder if your intent was to demonstrate that those patches are *required to fix* a bug in 4.15.
1) Can you clarify the above?
2) Which kernel version created the log in Comment #3?
Fred Kimmy (kongzizaixian) wrote : | #5 |
=> @Fred: In comment #3 you state "this patchset have cause some error". If this patch set has introduced a bug, please report that in a new bug. However, since you moved the Ubuntu-18.04 task back to "New" at that time, I wonder if your intent was to demonstrate that those patches are *required to fix* a bug in 4.15.
=> 1) Can you clarify the above?
=> 2) Which kernel version created the log in Comment #3?
If the above patchset is not merged, Ubuntu 18.04.1 will reproduce this error log. Can you backport it into an Ubuntu 18.04.1 update?
Ike Panhc (ikepanhc) wrote : | #6 |
Hi Xinwei,
I get lots of conflicts when cherry-picking d1a37dedcfcf ("net: hns3: fix a use after free problem in hns3_nic_
and the bug description mentions iscsi and spark tests. Could you also explain how to reproduce the failure?
Andrew Cloke (andrew-cloke) wrote : | #7 |
Marking as incomplete while waiting for a detailed reproducer and assistance with the merge conflict.
Changed in kunpeng920: | |
status: | New → Incomplete |
Andrew Cloke (andrew-cloke) wrote : | #8 |
Working with Huawei to identify a patchset that will cleanly apply to 4.15.
Changed in kunpeng920: | |
assignee: | nobody → Ike Panhc (ikepanhc) |
Andrew Cloke (andrew-cloke) wrote : | #9 |
Summary of email conversation between Ike Panhc and <email address hidden>:
On 2020/6/1 12:17, Ike Panhc wrote:
...
> Since our target is 51e8439f3496 ("net: hns3: add 8 BD limit for tx flow"),
> and we have its fix d1a37dedcfcf ("net: hns3: fix a use after free problem in hns3_nic_
>
> Are patches 3fe13ed95dd3 ("net: hns3: avoid mult + div op in critical data path") and
> 5f543a54eec0 ("net: hns3: fix for not calculating tx bd num correctly") needed too?
>
> If they are not, it will be much simpler and less risk for regression.
>
Hi Ike:
These two are not needed. Thanks.
<End of summary>
Based on this, the next step is to investigate backporting the two patches that directly address the subject of this bug report.
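The selection reasoning in this summary (take the target patch plus whatever fixes it, and leave out unrelated patches together with their fixes) can be sketched as a small closure over "fixes" relationships. This is only an illustration: the helper name `backport_set` is hypothetical, and the mapping contains just the two relationships stated in this bug report, on the assumption that no other follow-up fixes exist.

```python
# fix commit -> commit it fixes, as stated in this bug report
FIXES = {
    "d1a37dedcfcf": "51e8439f3496",  # use-after-free fix for the 8 BD limit patch
    "5f543a54eec0": "3fe13ed95dd3",  # tx bd num fix for the mult + div patch
}


def backport_set(targets):
    """Return the targets plus every known fix for a picked commit."""
    picked = set(targets)
    changed = True
    while changed:               # repeat in case a fix has its own fix
        changed = False
        for fix, fixed in FIXES.items():
            if fixed in picked and fix not in picked:
                picked.add(fix)
                changed = True
    return picked


print(sorted(backport_set(["51e8439f3496"])))
```

With only 51e8439f3496 as the target, the set contains that commit and d1a37dedcfcf. 5f543a54eec0 is pulled in only if 3fe13ed95dd3 is picked as well.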
Changed in kunpeng920: | |
status: | Incomplete → Triaged |
Ike Panhc (ikepanhc) wrote : | #10 |
I misremembered which patch introduces conflicts when cherry-picking to 4.15. We still need to work on d1a37dedcfcf ("net: hns3: fix a use after free problem in hns3_nic_
Ike Panhc (ikepanhc) wrote : | #11 |
Finished backporting; the git branch is here [1]. I also built debs [2].
[1] https:/
[2] https:/
Ike Panhc (ikepanhc) wrote : | #12 |
Ran iperf3 tests with the kernel debs from #11. On eno3/4 I can reach the link's limit for 1 hr each.
https:/
Ike Panhc (ikepanhc) wrote : | #13 |
Another iperf3 test on eno3/4 of d06ES passed.
https:/
The next step for me is to run an iperf3 test on eno1 of d061, which is connected at 10 Gb/s.
Changed in kunpeng920: | |
status: | Triaged → In Progress |
Ike Panhc (ikepanhc) wrote : | #14 |
Long term run on eno1 of d061 looks good to me.
ubuntu@scobee:~$ iperf -c 10.228.68.67 -t 18000
-------
Client connecting to 10.228.68.67, TCP port 5001
TCP window size: 85.0 KByte (default)
-------
[ 3] local 10.228.68.118 port 42408 connected with 10.228.68.67 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-18000.0 sec 17.8 TBytes 8.70 Gbits/sec
ubuntu@scobee:~$ iperf -c 10.228.68.67 -t 18000 -P2
-------
Client connecting to 10.228.68.67, TCP port 5001
TCP window size: 85.0 KByte (default)
-------
[ 4] local 10.228.68.118 port 42424 connected with 10.228.68.67 port 5001
[ 3] local 10.228.68.118 port 42422 connected with 10.228.68.67 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-18000.0 sec 9.56 TBytes 4.67 Gbits/sec
[ 4] 0.0-18000.0 sec 9.42 TBytes 4.60 Gbits/sec
[SUM] 0.0-18000.0 sec 19.0 TBytes 9.27 Gbits/sec
ubuntu@scobee:~$ ifconfig | grep -B2 118
eno1: flags=4163<
inet 10.228.68.118 netmask 255.255.255.0 broadcast 10.228.68.255
ubuntu@scobee:~$ uname -a
Linux scobee 4.15.0-106-generic #107-Ubuntu SMP Thu Jun 4 11:28:55 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
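As a sanity check on the figures in the transcript above, the reported bandwidth can be recomputed from the transfer size and duration. This sketch assumes iperf counts TBytes in binary units (1024^4 bytes) and Gbits in decimal units (10^9 bits). The helper name `gbits_per_sec` is hypothetical.

```python
TBYTE = 1024 ** 4  # assumed: iperf byte units are binary


def gbits_per_sec(tbytes: float, seconds: float) -> float:
    """Convert a transfer in TBytes over a duration into Gbits/sec."""
    return tbytes * TBYTE * 8 / seconds / 1e9


# (transfer in TBytes, bandwidth reported by iperf in Gbits/sec)
runs = [(17.8, 8.70), (9.56, 4.67), (9.42, 4.60), (19.0, 9.27)]
for tbytes, reported in runs:
    computed = gbits_per_sec(tbytes, 18000)
    print(f"{tbytes:5.2f} TBytes -> {computed:.2f} Gbits/sec (reported {reported:.2f})")
```

Under these assumptions the recomputed values agree with the reported ones to within the rounding of the printed transfer sizes.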
Ike Panhc (ikepanhc) wrote : | #15 |
Also ran 96 threads on the 10 Gb/s hns3 for 5 hr with no kernel error messages.
@Xinwei,
Could you or your colleague run regression test on kernel debs in #11 and let me know if the backport is ok?
kernel debs are at https:/
Andrew Cloke (andrew-cloke) wrote : | #16 |
Marking as incomplete while waiting for the regression test runs requested in the last comment.
Changed in kunpeng920: | |
status: | In Progress → Incomplete |
Fred Kimmy (kongzizaixian) wrote : | #17 |
=>Could you or your colleague run regression test on kernel debs in #11 and let me know if the backport is ok?
test is ok in our CI environment.
Ike Panhc (ikepanhc) wrote : | #18 |
Thanks. I will run a final regression test and then propose those patches for the SRU process.
Changed in kunpeng920: | |
status: | Incomplete → In Progress |
tags: | added: ikeradar |
Changed in linux (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
status: | New → Fix Released |
description: | updated |
Ike Panhc (ikepanhc) wrote : | #19 |
Patches sent to kernel-team mailing list.
https:/
Changed in linux (Ubuntu Bionic): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Bionic): | |
assignee: | nobody → Ike Panhc (ikepanhc) |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Changed in kunpeng920: | |
status: | In Progress → Fix Committed |
Ike Panhc (ikepanhc) wrote : | #20 |
These patches now are targeting 18.04.5-sru-1
tags: | removed: ikeradar |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-bionic |
Ike Panhc (ikepanhc) wrote : | #22 |
Thanks. Ubuntu-5.4.0-43.47 works good to me
tags: | added: verification-done-bionic; removed: verification-needed-bionic |
Launchpad Janitor (janitor) wrote : | #23 |
This bug was fixed in the package linux - 4.15.0-115.116
---------------
linux (4.15.0-115.116) bionic; urgency=medium
* bionic/linux: 4.15.0-115.116 -proposed tracker (LP: #1893055)
* [Potential Regression] dscr_inherit_
ubuntu_
- powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
linux (4.15.0-114.115) bionic; urgency=medium
* bionic/linux: 4.15.0-114.115 -proposed tracker (LP: #1891052)
* ipsec: policy priority management is broken (LP: #1890796)
- xfrm: policy: match with both mark and mask on user interfaces
linux (4.15.0-113.114) bionic; urgency=medium
* bionic/linux: 4.15.0-113.114 -proposed tracker (LP: #1890705)
* Packaging resync (LP: #1786013)
- update dkms package versions
* Reapply "usb: handle warm-reset port requests on hub resume" (LP: #1859873)
- usb: handle warm-reset port requests on hub resume
* Bionic update: upstream stable patchset 2020-07-29 (LP: #1889474)
- gpio: arizona: handle pm_runtime_get_sync failure case
- gpio: arizona: put pm_runtime in case of failure
- pinctrl: amd: fix npins for uart0 in kerncz_groups
- mac80211: allow rx of mesh eapol frames with default rx key
- scsi: scsi_transport_spi: Fix function pointer check
- xtensa: fix __sync_
- xtensa: update *pos in cpuinfo_op.next
- drivers/
- net: sky2: initialize return of gm_phy_read
- drm/nouveau/
- irqdomain/treewide: Keep firmware node unconditionally allocated
- SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO
compeletion")
- spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
- IB/umem: fix reference count leak in ib_umem_odp_get()
- uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix
GDB regression
- ALSA: info: Drop WARN_ON() from buffer NULL sanity check
- ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
- btrfs: fix double free on ulist after backref resolution failure
- btrfs: fix mount failure caused by race with umount
- btrfs: fix page leaks after failure to lock page for delalloc
- bnxt_en: Fix race when modifying pause settings.
- hippi: Fix a size used in a 'pci_free_
path
- ax88172a: fix ax88172a_unbind() failures
- net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual
configuration
- drm: sun4i: hdmi: Fix inverted HPD result
- net: smc91x: Fix possible memory leak in smc_drv_probe()
- bonding: check error value of register_
- mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
- ipvs: fix the connection sync failed in some cases
- i2c: rcar: always clear ICSAR to avoid side effects
- bonding: check return value of register_
- serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
- scripts/
- HID: i...
Changed in linux (Ubuntu Bionic): | |
status: | Fix Committed → Fix Released |
Changed in kunpeng920: | |
status: | Fix Committed → Fix Released |
Patch 5f543a54eec0 ("net: hns3: fix for not calculating tx bd num correctly") fixes 3fe13ed95dd3 ("net: hns3: avoid mult + div op in critical data path"), which has been in mainline since 5.1