[hns-1126] net: hns3: fix race conditions between reset and module loading & unloading
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kunpeng920 | Fix Released | Undecided | Unassigned |
Ubuntu-18.04 | Won't Fix | Undecided | Unassigned |
Ubuntu-18.04-hwe | Fix Released | Undecided | Unassigned |
Ubuntu-19.04 | Won't Fix | Undecided | Unassigned |
Ubuntu-19.10 | Fix Released | Undecided | Unassigned |
Upstream-kernel | Fix Released | Undecided | Unassigned |
Bug Description
[Bug Description]
When a reset and a driver unload happen at the same time, problems occur,
such as a NULL pointer panic or a hardware error.
[Steps to Reproduce]
1. Load the PF and VF drivers.
2. Run iperf while resetting, binding, and unbinding the device.
[Actual Results]
panic or hardware error.
[ 4392.974255] hns3 0000:7d:00.0: fail to instantiate client, ret = -16
[ 4392.976412] hns3 0000:7d:00.2: Reset done, hclge driver initialization finished.
[ 4392.980600] hns3 0000:7d:00.0: match and instantiation failed for port, ret = -16
[ 4392.995450] hns3 0000:7d:00.1: Device is busy in resetting state.
[ 4392.995450] please retry later.
[ 4393.004658] hns3 0000:7d:00.1: fail to instantiate client, ret = -16
[ 4393.011005] hns3 0000:7d:00.1: match and instantiation failed for port, ret = -16
[ 4393.018477] hns3 0000:7d:00.2: Device is busy in resetting state.
[ 4393.018477] please retry later.
[ 4393.019768] hns3 0000:7d:00.2: In reset process RoCE client reinit.
[ 4393.027686] hns3 0000:7d:00.2: fail to instantiate client, ret = -16
[ 4393.027689] hns3 0000:7d:00.2: match and instantiation failed for port, ret = -16
[ 4393.033972] Unable to handle kernel NULL pointer dereference at virtual address 000000000000043e
[ 4393.040285] hns3 0000:7d:00.3: Device is busy in resetting state.
[ 4393.040285] please retry later.
[ 4393.040286] hns3 0000:7d:00.3: fail to instantiate client, ret = -16
[ 4393.040287] hns3 0000:7d:00.3: match and instantiation failed for port, ret = -16
[ 4393.079540] Mem abort info:
[ 4393.082322] ESR = 0x96000004
[ 4393.085366] Exception class = DABT (current EL), IL = 32 bits
[ 4393.091274] SET = 0, FnV = 0
[ 4393.094317] EA = 0, S1PTW = 0
[ 4393.097447] Data abort info:
[ 4393.100314] ISV = 0, ISS = 0x00000004
[ 4393.104137] CM = 0, WnR = 0
[ 4393.107095] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000c87f115d
[ 4393.113698] [000000000000043e] pgd=00000000000
[ 4393.118566] Internal error: Oops: 96000004 [#1] SMP
[ 4393.123432] CPU: 11 PID: 30404 Comm: kworker/11:0 Tainted: G W OE 4.19.30-
[ 4393.134892] Hardware name: Huawei Technologies Co., Ltd. EVBCS/EVBCS, BIOS CS B078 1P TA 05/25/2019
[ 4393.143928] Workqueue: events hclge_reset_
Message from syslogd@localhost at Feb 15 12:16:06 ...
kernel:[ 4393.118566] Internal error: Oops: 96000004 [#1] SMP
[ 4393.160589] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[ 4393.165368] pc : hclge_ae_
[ 4393.170753] lr : 0xffff000000c38f7c
[ 4393.174227] sp : ffff8023801ebc40
[ 4393.177527] x29: ffff8023801ebc40 x28: 0000000000000000
[ 4393.182825] x27: 0000000000000000 x26: ffff8023ac29a698
[ 4393.188123] x25: 0000000000000001 x24: ffff80232a2f8500
[ 4393.193421] x23: 0000000000000041 x22: ffff802356a856ac
[ 4393.198719] x21: ffff8023ac29a5b0 x20: 0000000000000041
[ 4393.204016] x19: ffff802356a84000 x18: 0000000000000010
[ 4393.209314] x17: 000000002edd842f x16: 000000000a36b2a4
[ 4393.214611] x15: ffff0000895099df x14: 0000000000000004
[ 4393.219909] x13: ffff0000095099ed x12: ffff00000930b838
[ 4393.225207] x11: ffff8023801ebc40 x10: ffff8023801ebc40
[ 4393.230505] x9 : 00000000ffffffd8 x8 : fffffffffffffffe
[ 4393.235802] x7 : 0000000000000004 x6 : 0000000000000000
[ 4393.241100] x5 : 0000000000000004 x4 : ffff802356a8400f
[ 4393.246397] x3 : ffff0a00ffffff04 x2 : 584dcc94a82bab00
[ 4393.251695] x1 : ffff000000ae0a48 x0 : 0000000000000036
[ 4393.256993] Process kworker/11:0 (pid: 30404, stack limit = 0x000000004925a8db)
[ 4393.264286] Call trace:
[ 4393.266722] hclge_ae_
[ 4393.271758] 0xffff000000c430bc
[ 4393.274887] hclge_notify_
[ 4393.280099] hclge_reset+
[ 4393.284356] hclge_reset_
[ 4393.289743] process_
[ 4393.293738] worker_
[ 4393.297386] kthread+0x134/0x138
[ 4393.300601] ret_from_
[ 4393.304163] Code: d1120273 f9423e60 f9400bf3 a8c27bfd (b9440800)
[ 4393.310242] Modules linked in: hns_roce(OE) hns3_dfx(OE) rdma_ucm(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_uverbs(E) ib_core(OE) hns3(OE) hclge(OE) hnae3(OE) mem_drv(OE) [last unloaded: hns_roce_pci]
[ 4393.327442] ---[ end trace 525e14504e091414 ]---
[ 4393.332045] Kernel panic - not syncing: Fatal exception
[ 4393.337256] kernel fault(0x5) notification starting on CPU 11
[ 4393.342987] kernel fault(0x5) notification finished on CPU 11
[ 4393.348718] SMP: stopping secondary CPUs
[ 4393.352635] Kernel Offset: disabled
[ 4393.356110] CPU features: 0x2,a2a00a38
[ 4393.359844] Memory Limit: none
[ 4393.362896] kernel reboot(0x2) notification starting on CPU 11
[ 4393.368714] kernel reboot(0x2) notification finished on CPU 11
[ 4393.374533] ---[ end Kernel panic - not syncing: Fatal exception ]---
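The ESR value in the log above (0x96000004) can be checked by hand against the fields the kernel printed. The following is a minimal sketch in user-space C, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0). The struct and function names are hypothetical and are not taken from the kernel.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a kernel type. */
struct esr_fields {
    unsigned ec;    /* exception class, ESR bits 31:26 */
    unsigned il;    /* instruction length bit, ESR bit 25 */
    uint32_t iss;   /* instruction specific syndrome, ESR bits 24:0 */
    unsigned wnr;   /* write-not-read, ISS bit 6 (data aborts) */
    unsigned dfsc;  /* data fault status code, ISS bits 5:0 */
};

/* Split a 32-bit ESR value into the fields listed above. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.iss  = esr & 0x1ffffff;
    f.wnr  = (f.iss >> 6) & 0x1;
    f.dfsc = f.iss & 0x3f;
    return f;
}
```

For 0x96000004 this gives EC = 0x25 (data abort taken at the current exception level), IL = 1 (32-bit instruction), ISS = 0x4, WnR = 0 (a read), and DFSC = 0x04 (level 0 translation fault). That matches the "DABT (current EL)", "ISS = 0x00000004", and "WnR = 0" lines in the log, and is what a read through a NULL-based pointer at address 0x43e would be expected to produce.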
[Expected Results]
ok
[Reproducibility]
Inevitably
[Additional information]
Hardware: D06
Firmware: NA
Kernel: NA
[Resolution]
The fix adds a flag that indicates whether the client is registered and stops
scheduling the reset task while the driver is unloading. The series also fixes some related bugs.
net: hns3: fix race conditions between reset and module loading & unloading
net: hns3: fix a memory leak issue for hclge_map_
net: hns3: adjust hns3_uninit_phy()'s location in the hns3_client_
net: hns3: stop schedule reset service while unloading driver
net: hns3: add handshake with hardware while doing reset
net: hns3: use HCLGEVF_
net: hns3: use HCLGE_STATE_
net: hns3: use HCLGE_STATE_
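The idea described in the resolution (a "registered" flag checked on the reset path, plus a refusal to schedule reset work once unloading has started) can be illustrated roughly as follows. This is a user-space sketch using C11 atomics, not the driver's code: every name here (`demo_dev`, the `DEMO_STATE_*` bits, and the `demo_*` functions) is hypothetical, and the counters merely stand in for queueing the reset work item and calling into the client.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits, only loosely modelled on the commit subjects. */
#define DEMO_STATE_REGISTERED (1u << 0) /* client finished registering */
#define DEMO_STATE_REMOVING   (1u << 1) /* driver unload in progress */

struct demo_dev {
    atomic_uint state;
    int resets_scheduled;  /* stands in for queueing the reset work item */
    int clients_notified;  /* stands in for calling into the client */
};

static void demo_init(struct demo_dev *d)
{
    atomic_init(&d->state, 0u);
    d->resets_scheduled = 0;
    d->clients_notified = 0;
}

static void demo_register_client(struct demo_dev *d)
{
    atomic_fetch_or(&d->state, DEMO_STATE_REGISTERED);
}

/* Unload: mark the device as going away first, then drop the client flag. */
static void demo_begin_unload(struct demo_dev *d)
{
    atomic_fetch_or(&d->state, DEMO_STATE_REMOVING);
    atomic_fetch_and(&d->state, ~DEMO_STATE_REGISTERED);
}

/* Refuse to schedule reset work once unloading has started. */
static bool demo_schedule_reset(struct demo_dev *d)
{
    if (atomic_load(&d->state) & DEMO_STATE_REMOVING)
        return false;
    d->resets_scheduled++;
    return true;
}

/* Reset path: only touch the client if it is still registered. */
static bool demo_notify_client(struct demo_dev *d)
{
    if (!(atomic_load(&d->state) & DEMO_STATE_REGISTERED))
        return false;
    d->clients_notified++;
    return true;
}
```

After `demo_begin_unload()` both `demo_schedule_reset()` and `demo_notify_client()` return false, so the reset path no longer reaches a client that is being torn down. The sketch shows only the flag checks: the check and the following action are not one atomic step here, so real code would likely need further synchronization between the reset task and the unload path.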
description: updated
no longer affects: kunpeng920/ubuntu-20.04
Changed in kunpeng920:
status: Fix Committed → Fix Released
Each of these commits was introduced upstream in v5.3. v5.3 will be the new HWE base kernel for 18.04.4.
Note that the current SRU cycle is targeted for 18.04.4:
https://lists.ubuntu.com/archives/kernel-sru-announce/2019-October/000158.html
The "last-commit" date for this cycle was 11-Nov. Since 18.04.4 will switch the HWE kernel from 5.0 to 5.3, backporting these changes to the 5.0 branch would be of no benefit to Ubuntu LTS. Therefore, marking 19.04 "Won't Fix" and targeting Ubuntu-18.04-hwe to Ubuntu-18.04.4.