[hns3-0114]net: hns3: fix a wrong reset interrupt status mask

Bug #1859570 reported by Fred Kimmy
This bug affects 1 person
Affects           Status         Importance   Assigned to   Milestone
kunpeng920        Fix Released   Undecided    Unassigned
Ubuntu-18.04      Won't Fix      Undecided    Unassigned
Ubuntu-18.04-hwe  Fix Released   Undecided    Unassigned
Upstream-kernel   Fix Released   Undecided    Unassigned

Bug Description

[Bug Description]
There is a timing window between the ring_space check and
netif_stop_subqueue when transmitting an SKB. TX BD
cleaning may run during this window, which can leave the
TX queue stopped and never restarted.

[Steps to Reproduce]
1. Run I/O with high throughput

[Actual Results]
A TX timeout occurs

[Expected Results]
IO ok

[Reproducibility]
Inevitably

[Additional information]
Hardware: D06
Firmware: NA
Kernel: NA

[Resolution]
This patch fixes it by rechecking the ring_space after
netif_stop_subqueue to make sure the TX queue is restarted.

Also, ring->next_to_clean is updated even when pkts is
zero, because all the TX BDs cleaned may be non-SKB, so the
driver still needs to check whether the TX queue needs to be restarted.
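
The stop-then-recheck idea in the resolution can be sketched with a small single-threaded model. This is only an illustration, not the hns3 driver code: struct toy_ring, toy_clean() and toy_maybe_stop() are hypothetical names, and the race is simulated by running the clean step at a fixed point inside the window. Real kernel code typically also needs a memory barrier between stopping the queue and rechecking, which this model leaves out.

```c
#include <stdbool.h>

/* Toy model of a TX ring. All names here are hypothetical. */
struct toy_ring {
	int free_bds;   /* number of free buffer descriptors */
	bool stopped;   /* true once the queue has been stopped */
};

/* Clean path: frees descriptors and wakes the queue only if it is stopped. */
void toy_clean(struct toy_ring *r, int freed)
{
	r->free_bds += freed;
	if (r->stopped && r->free_bds > 0)
		r->stopped = false;
}

/*
 * Transmit path: stop the queue when the ring lacks space.
 * race_freed descriptors are cleaned inside the window between the
 * space check and the stop. Returns true if the queue ends up stopped.
 */
bool toy_maybe_stop(struct toy_ring *r, int needed, int race_freed,
		    bool recheck)
{
	if (r->free_bds >= needed)
		return false;           /* enough space, nothing to do */

	/* Window: cleaning runs now, sees a running queue, wakes nothing. */
	toy_clean(r, race_freed);

	r->stopped = true;              /* stands in for netif_stop_subqueue */

	/* The fix: look at the ring again after stopping. */
	if (recheck && r->free_bds >= needed)
		r->stopped = false;     /* space appeared, restart the queue */

	return r->stopped;
}
```

Without the recheck, a ring that gained space during the window stays stopped with nothing left to wake it, which matches the TX timeout described above. With the recheck, the same sequence leaves the queue running.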

Ike Panhc (ikepanhc) wrote :

Is this patch the fix? It does not match the bug description.

commit 74e78d6bae1904e87469da5ed87e9f6bd1131f46
Author: Huazhong Tan <email address hidden>
Date: Wed Nov 20 10:07:15 2019 +0800

    net: hns3: fix a wrong reset interrupt status mask

    According to hardware user manual, bits5~7 in register
    HCLGE_MISC_VECTOR_INT_STS means reset interrupts status,
    but HCLGE_RESET_INT_M is defined as bits0~2 now. So it
    will make hclge_reset_err_handle() read the wrong reset
    interrupt status.

    This patch fixes this wrong bit mask.

    Fixes: 2336f19d7892 ("net: hns3: check reset interrupt status when reset fails")
    Signed-off-by: Huazhong Tan <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>
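
To make the difference between the two bit ranges in the commit message concrete, here is a minimal userspace sketch. GENMASK32 is a stand-in written for this example (the kernel has its own GENMASK macro), and RESET_INT_M_WRONG, RESET_INT_M_FIXED and reset_status() are hypothetical names, not the driver's definitions. The sample register value is made up.

```c
#include <stdint.h>

/* Userspace stand-in for a 32-bit GENMASK(h, l): bits l..h set. */
#define GENMASK32(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/* Hypothetical names for the old and new mask values. */
#define RESET_INT_M_WRONG GENMASK32(2, 0)   /* bits 0~2 -> 0x07 */
#define RESET_INT_M_FIXED GENMASK32(7, 5)   /* bits 5~7 -> 0xe0 */

/* Extract the reset interrupt status bits from a raw register value. */
uint32_t reset_status(uint32_t reg, uint32_t mask)
{
	return reg & mask;
}
```

With a made-up register value of 0x20 (only bit 5 set), reset_status(0x20, RESET_INT_M_WRONG) yields 0, so a pending reset interrupt would be missed, while reset_status(0x20, RESET_INT_M_FIXED) yields 0x20.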

Changed in kunpeng920:
status: New → Incomplete
Ike Panhc (ikepanhc) wrote :

If patch 74e78d6bae ("net: hns3: fix a wrong reset interrupt status mask") is the fix, it does not need to be backported to the bionic GA kernel, because the commit it fixes, 2336f19d7892 ("net: hns3: check reset interrupt status when reset fails"), is not applied to the bionic GA kernel.

And patch 74e78d6bae is already in focal GA kernel.

Fred Kimmy (kongzizaixian) wrote :

These two patchsets should both be merged. Patch 74e78d6bae ("net: hns3: fix a wrong reset interrupt status mask") also fixes other bugs, such as the hns reset status problem.

Ike Panhc (ikepanhc) wrote :

ikepanhc@ubuntu:~/ubuntu-focal$ gitoneline | grep -e 'fix a wrong reset interrupt status mask' -e 'check reset interrupt status when reset fails'
74e78d6bae19 <email address hidden> 2019-11-19 19:09:53 -0800 net: hns3: fix a wrong reset interrupt status mask
2336f19d7892 <email address hidden> 2019-08-29 16:57:44 -0700 net: hns3: check reset interrupt status when reset fails

The patches in comment #2 are already in the focal kernel, so they will be in the bionic HWE kernel when 18.04.5 is released.

Ike Panhc (ikepanhc) wrote :

Patch 74e78d6bae19 ("net: hns3: fix a wrong reset interrupt status mask") fixes 2336f19d7892 ("net: hns3: check reset interrupt status when reset fails")

Patch 2336f19d7892 ("net: hns3: check reset interrupt status when reset fails") fixes 72e2fb07997c ("net: hns3: clear reset interrupt status in hclge_irq_handle()")

Patch 72e2fb07997c ("net: hns3: clear reset interrupt status in hclge_irq_handle()") fixes 4ed340ab8f49 ("net: hns3: Add reset process in hclge_main")

But patch 72e2fb07997c fails to cherry-pick cleanly.

ikepanhc@ubuntu:~/ubuntu-bionic$ git cherry-pick -x -s 72e2fb07997c
Auto-merging drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
Auto-merging drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
CONFLICT (content): Merge conflict in drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
Auto-merging drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
CONFLICT (content): Merge conflict in drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
Auto-merging drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 10159 and retry the command.
error: could not apply 72e2fb07997c... net: hns3: clear reset interrupt status in hclge_irq_handle()
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

Since we can run iperf tests on hns3 at more than 80% of the limit for an hour, multiple times, I don't think this is an issue we need to fix for the bionic GA kernel.

Changed in kunpeng920:
status: Incomplete → Fix Committed
Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: Fix Committed → Fix Released