[hns3-0901]add hns3_gro_complete for HW GRO process

Bug #1893711 reported by Fred Kimmy on 2020-09-01
This bug affects 1 person
Affects           Importance  Assigned to
kunpeng920        Undecided   Unassigned
Ubuntu-18.04      Critical    Ike Panhc
Ubuntu-18.04-hwe  Undecided   Unassigned
Ubuntu-20.04      Undecided   Unassigned
Upstream-kernel   Undecided   Unassigned
linux (Ubuntu)    Undecided   Unassigned
Bionic            High        Ike Panhc

Bug Description

[Impact]
Kernel oops in the hns3 driver when GRO is enabled.

[Fix]
Cherry-pick patches from upstream
d474d88f8826 net: hns3: add hns3_gro_complete for HW GRO process
a4d2cdcbb878 net: hns3: minor refactor for hns3_rx_checksum

[Test]
There is no known way to reproduce it in our lab, so only regression testing was done.

[Regression Potential]
The patchset only affects the hns3 driver, so the risk for other drivers and platforms is minimal.
Stress tests on the hns3 driver look good, and we also have positive
feedback from a different lab.
The patches have also been in the Ubuntu kernel since Eoan, with no regressions observed.

------------------------
[Bug Description]
When a GRO packet is received by the driver, the cwr field in
struct tcphdr needs to be checked to decide whether to set
SKB_GSO_TCP_ECN in skb_shinfo(skb)->gso_type.
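The check described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: the function name sketch_gso_type, the SKETCH_* constants and the raw-byte view of the TCP header are all made up for the example. The only fact it relies on is that CWR is the top bit of the TCP flags byte (byte 13 of the header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins only; these are NOT the kernel's definitions. */
#define SKETCH_TCP_FLAGS_OFFSET 13u   /* flags byte within the TCP header */
#define SKETCH_TCP_CWR_BIT      0x80u /* CWR is the top bit of that byte */
#define SKETCH_GSO_TCPV4        (1u << 0)
#define SKETCH_GSO_TCP_ECN      (1u << 2)

/*
 * Given the start of a TCP header and a base GSO type, return the GSO type
 * with the ECN flag added when the header carries CWR.
 */
static unsigned int sketch_gso_type(const uint8_t *tcp_hdr, size_t hdr_len,
				    unsigned int base_type)
{
	assert(tcp_hdr != NULL);

	/* Header too short to contain the flags byte: leave the type as is. */
	if (hdr_len <= SKETCH_TCP_FLAGS_OFFSET)
		return base_type;

	if (tcp_hdr[SKETCH_TCP_FLAGS_OFFSET] & SKETCH_TCP_CWR_BIT)
		base_type |= SKETCH_GSO_TCP_ECN;

	return base_type;
}
```

In the real driver the same decision would be made on the skb's TCP header and stored in skb_shinfo(skb)->gso_type; the byte-array form here just keeps the example self-contained.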

[Steps to Reproduce]
1. Load the PF driver.
2. Turn off GRO in the network stack and turn on HW GRO.

[Actual Results]
[ 32.597752] bond-dcn: link status definitely up for interface enp189s0f0, 10000 Mbps full duplex
[1048422.589438] Unable to handle kernel paging request at virtual address ffff806000005d0c
[1048422.597506] Mem abort info:
[1048422.600463] ESR = 0x96000005
[1048422.603679] Exception class = DABT (current EL), IL = 32 bits
[1048422.609747] SET = 0, FnV = 0
[1048422.612963] EA = 0, S1PTW = 0
[1048422.616265] Data abort info:
[1048422.619309] ISV = 0, ISS = 0x00000005
[1048422.623301] CM = 0, WnR = 0
[1048422.626431] swapper pgtable: 4k pages, 48-bit VAs, pgd = 0000000096615bf4
[1048422.633360] [ffff806000005d0c] *pgd=0000205fffff6003, *pud=0000000000000000
[1048422.640465] Internal error: Oops: 96000005 [#1] SMP
[1048422.645496] Modules linked in: bonding zfs(PO) zunicode(PO) zavl(PO) icp(PO) nls_iso8859_1 zcommon(PO) znvpair(PO) spl(O) joydev input_leds ipmi_ssif ipmi_si ipmi_devintf shpchp ipmi_msghandler cppc_cpufreq sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 xfs btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure hibmc_drm aes_ce_blk aes_ce_cipher ttm realtek crc32_ce drm_kms_helper crct10dif_ce syscopyarea ghash_ce hisi_sas_v3_hw sysfillrect sha2_ce sysimgblt hns3 nvme hisi_sas_main sha256_arm64 fb_sys_fops sha1_ce drm hclge libsas nvme_core ahci megaraid_sas hnae3 scsi_transport_sas libahci gpio_dwapb hid_generic
[1048422.715911] usbhid hid aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[1048422.723192] Process swapper/22 (pid: 0, stack limit = 0x00000000dc9798e5)
[1048422.730122] CPU: 22 PID: 0 Comm: swapper/22 Tainted: P O 4.15.0-96-generic #97-Ubuntu
[1048422.739297] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.35 04/30/2020
[1048422.747695] pstate: 80400009 (Nzcv daif +PAN -UAO)
[1048422.752641] pc : tcp_gro_complete+0x4c/0x80
[1048422.756988] lr : hns3_clean_rx_ring+0x63c/0x6f0 [hns3]
[1048422.762274] sp : ffff000009893d00
[1048422.765746] x29: ffff000009893d00 x28: ffffa05de384d900
[1048422.771207] x27: ffffa05dc660c6c0 x26: ffffa05dc7a6c280
[1048422.776668] x25: 0000000000000040 x24: ffffa05dc7a4e000
[1048422.782130] x23: 0000000000000002 x22: 0000000000000000
[1048422.787590] x21: 0000000000000000 x20: 0000000000000000
[1048422.793051] x19: ffffa05de384d900 x18: 0000ffffa3bf2a70
[1048422.798512] x17: 0000ffffa3b68698 x16: ffff000008307aa0
[1048422.803973] x15: 00000d920112ac4e x14: 0c96b6405c2a0a08
[1048422.809435] x13: 010100001cc0f601 x12: 188058b201fc85fd
[1048422.814896] x11: cd979f72c04ce5db x10: 2087e1db2087679d
[1048422.820358] x9 : 0640004090cff807 x8 : 00450008f034d971
[1048422.825820] x7 : 1502726647903506 x6 : 0000000000000002
[1048422.831281] x5 : ffffa05dc7ad0480 x4 : 0000000000000002
[1048422.836743] x3 : ffff805fffff5d00 x2 : 0000000000000060
[1048422.842203] x1 : ffff805fffff5f00 x0 : ffff806000005cff
[1048422.847665] Call trace:
[1048422.850276] tcp_gro_complete+0x4c/0x80
[1048422.854274] hns3_clean_rx_ring+0x63c/0x6f0 [hns3]
[1048422.859217] hns3_nic_common_poll+0x98/0x220 [hns3]
[1048422.864247] net_rx_action+0x160/0x3d8
[1048422.868153] __do_softirq+0x134/0x330
[1048422.871973] irq_exit+0xcc/0xe0
[1048422.875275] __handle_domain_irq+0x6c/0xc0
[1048422.879526] gic_handle_irq+0x84/0x180
[1048422.883431] el1_irq+0xe8/0x180
[1048422.886733] arch_cpu_idle+0x30/0x180
[1048422.890553] do_idle+0x138/0x1f0
[1048422.893941] cpu_startup_entry+0x28/0x30
[1048422.898022] secondary_start_kernel+0x114/0x128
[1048422.902705] Code: 79407a64 8b202060 39024262 79000c24 (39c03400)
[1048422.909033] SMP: stopping secondary CPUs
[1048422.918364] Starting crashdump kernel...
[1048422.922444] Bye!

[Expected Results]
GRO runs without errors.

[Reproducibility]
Inevitably

[Additional information]
Hardware: D06
Firmware: NA
Kernel: NA

[Resolution]
Add hns3_gro_complete to perform that check, and add
hns3_handle_bdinfo to call hns3_gro_complete and
hns3_rx_checksum.

net: hns3: add hns3_gro_complete for HW GRO process
net: hns3: minor refactor for hns3_rx_checksum
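As a rough picture of the resolution, the following userspace C sketch models one handler that routes a received packet either to a checksum step or to a GRO-completion step. Everything here is hypothetical: the sketch_* names, the struct fields, and in particular the choice to branch on a zero GRO size are assumptions made for illustration and are not taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of what one RX descriptor reports. */
struct sketch_rx_info {
	uint16_t gro_size;  /* 0 is assumed to mean "not coalesced by hardware" */
	uint16_t gro_count; /* number of segments merged by hardware */
	bool     tcp_cwr;   /* CWR flag seen in the packet's TCP header */
};

/* Hypothetical stand-in for the per-packet state the stack looks at. */
struct sketch_pkt {
	uint16_t gso_size;
	uint16_t gso_segs;
	bool     ecn;
	bool     csum_checked;
};

/* Stand-in for the checksum step. */
static void sketch_rx_checksum(struct sketch_pkt *pkt)
{
	pkt->csum_checked = true;
}

/* Stand-in for the GRO-completion step, including the CWR/ECN decision. */
static void sketch_gro_complete(struct sketch_pkt *pkt,
				const struct sketch_rx_info *info)
{
	pkt->gso_segs = info->gro_count;
	pkt->ecn = info->tcp_cwr;
}

/* Single entry point that picks one of the two steps per packet. */
static void sketch_handle_bdinfo(struct sketch_pkt *pkt,
				 const struct sketch_rx_info *info)
{
	pkt->gso_size = info->gro_size;

	if (info->gro_size == 0) {
		/* Assumed non-GRO path: only the checksum step runs. */
		sketch_rx_checksum(pkt);
		return;
	}

	sketch_gro_complete(pkt, info);
}
```

Keeping both steps behind one handler means the RX loop has a single place where per-descriptor metadata is turned into packet state, which is the kind of refactor the two patch titles suggest.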

Ike Panhc (ikepanhc) wrote :

d474d88f8826 <email address hidden> 2019-04-14 13:47:35 -0700 net: hns3: add hns3_gro_complete for HW GRO process
a4d2cdcbb878 <email address hidden> 2019-04-14 13:47:35 -0700 net: hns3: minor refactor for hns3_rx_checksum

Changed in kunpeng920:
status: New → In Progress
Ike Panhc (ikepanhc) wrote :

$ uname -a
Linux x6000 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:42:54 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
$ while true; do sudo ethtool -K enp189s0f0 gro off; sleep 1; sudo ethtool -K enp189s0f0 gro on; sleep 1; sudo ethtool -k enp189s0f0 | grep generic-receive-offload; done
generic-receive-offload: on
generic-receive-offload: on

Trying to reproduce.

Ike Panhc (ikepanhc) on 2020-09-01
tags: added: ikeradar
Ike Panhc (ikepanhc) wrote :

Patch d474d88f8826 depends on c376fa1aae632 ("net: hns3: add rx multicast packets statistic") and e8149933b1fa ("net: hns3: remove hnae3_get_bit in data path")

Ike Panhc (ikepanhc) wrote :

Since e8149933b1fa ("net: hns3: remove hnae3_get_bit in data path") does not have functional changes, we can drop the patch and backport patch d474d88f8826 without it.

Ike Panhc (ikepanhc) wrote :

The backport is complete and the branch has been pushed to

  https://kernel.ubuntu.com/git/ikepanhc/public.git/log/?h=lp1893711

Test kernel debs are also available at

  https://kernel.ubuntu.com/~ikepanhc/lp1893711/

Please install all debs there and test if this issue has been fixed.

Ike Panhc (ikepanhc) wrote :

Setting to Incomplete while waiting for test results.

Changed in kunpeng920:
status: In Progress → Incomplete
Ike Panhc (ikepanhc) wrote :

Received a report that the test kernel is working fine. Regression tests also look good.

Changed in kunpeng920:
status: Incomplete → In Progress
Ike Panhc (ikepanhc) on 2020-10-08
Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Ike Panhc (ikepanhc)
Changed in linux (Ubuntu):
assignee: Ike Panhc (ikepanhc) → nobody
status: In Progress → Fix Released
Ike Panhc (ikepanhc) on 2020-10-08
description: updated
Stefan Bader (smb) on 2020-10-08
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Ian (ian-may) on 2020-10-08
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in kunpeng920:
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Ike Panhc (ikepanhc) wrote :

Thanks. 4.15.0-125.128 works for me.

tags: added: verification-done-bionic
removed: ikeradar verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-126.129

---------------
linux (4.15.0-126.129) bionic; urgency=medium

  * bionic/linux: 4.15.0-126.129 -proposed tracker (LP: #1905305)

  * CVE-2020-4788
    - SAUCE: powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
    - SAUCE: powerpc/64s: move some exception handlers out of line
    - powerpc/64s: flush L1D on kernel entry
    - SAUCE: powerpc: Add a framework for user access tracking
    - powerpc: Implement user_access_begin and friends
    - powerpc: Fix __clear_user() with KUAP enabled
    - powerpc/uaccess: Evaluate macro arguments once, before user access is
      allowed
    - powerpc/64s: flush L1D after user accesses

linux (4.15.0-125.128) bionic; urgency=medium

  * bionic/linux: 4.15.0-125.128 -proposed tracker (LP: #1903137)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #1899993)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout

  * Bionic: btrfs: kernel BUG at /build/linux-
    eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: use offset_in_page instead of open-coding it
    - btrfs: use BUG() instead of BUG_ON(1)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Bionic update: upstream stable patchset 2020-11-04 (LP: #1902943)
    - USB: gadget: f_ncm: Fix NDP16 datagram validation
    - gpio: tc35894: fix up tc35894 interrupt configuration
    - vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
    - vsock/virtio: stop workers during the .remove()
    - vsock/virtio: add transport parameter to the
      virtio_transport_reset_no_sock()
    - net: virtio_vsock: Enhance connection semantics
    - Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
    - ftrace: Move RCU is watching check after recursion check
    - drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
    - drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
    - drm/sun4i: mixer: Extend regmap max_register
    - net: dec: de2104x: Increase receive ring size for Tulip
    - rndis_host: increase sleep time in the query-response loop
    - nvme-core: get/put ctrl ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Ike Panhc (ikepanhc) on 2020-12-02
Changed in kunpeng920:
status: Fix Committed → Fix Released