HiSilicon HNS3 ethernet broken

Bug #1892347 reported by torel
This bug affects 1 person
Affects         Status      Importance  Assigned to  Milestone
kunpeng920      Incomplete  Undecided   Unassigned
Ubuntu-18.04    Incomplete  Undecided   Unassigned
linux (Ubuntu)  Incomplete  Undecided   Unassigned

Bug Description

[Impact]
The hns3 driver for the TM210 (verified) and TM280 (probably) is broken in the Ubuntu 18.04.5 LTS kernel 4.15.0-112-generic. The server is a Huawei TM200-2280 with Kunpeng 920 SoCs. Huawei provides a binary-only driver package, NIC-hisi_eth-Ubuntu18.04.1-hns3-1.0.2-aarch64.deb, but it only supports kernel 4.15.0-29-generic.

root@n012:~# uname -ar
Linux n012 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:42:54 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

root@n012:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

root@n012:~# dmesg |grep hns3
[ 3.775711] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[ 3.789796] hns3: Copyright (c) 2017 Huawei Corporation.
[ 4.295868] hns3 0000:7d:00.0: The firmware version is 01092806
[ 4.395325] hns3 0000:7d:00.0 eth0: No phy led trigger registered for speed(-1)
[ 4.498584] hns3 0000:7d:00.1: The firmware version is 01092806
[ 4.634770] hns3 0000:7d:00.1 eth1: No phy led trigger registered for speed(-1)
[ 4.671546] hns3 0000:7d:00.2: The firmware version is 01092806
[ 4.791311] hns3 0000:7d:00.2 eth2: No phy led trigger registered for speed(-1)
[ 4.813538] hns3 0000:7d:00.3: The firmware version is 01092806
[ 4.915305] hns3 0000:7d:00.3 eth3: No phy led trigger registered for speed(-1)
[ 4.937256] hns3 0000:bd:00.0: The firmware version is 01092806
[ 4.994060] hns3 0000:bd:00.1: The firmware version is 01092806
[ 5.049951] hns3 0000:bd:00.2: The firmware version is 01092806
[ 5.107165] hns3 0000:bd:00.3: The firmware version is 01092806
[ 5.159285] hns3 0000:7d:00.0 enp125s0f0: renamed from eth0
[ 5.379348] hns3 0000:bd:00.2 enp189s0f2: renamed from eth6
[ 5.435880] hns3 0000:bd:00.1 enp189s0f1: renamed from eth5
[ 5.903915] hns3 0000:7d:00.3 enp125s0f3: renamed from eth3
[ 5.999350] hns3 0000:7d:00.1 enp125s0f1: renamed from eth1
[ 6.155353] hns3 0000:7d:00.2 enp125s0f2: renamed from eth2
[ 6.295332] hns3 0000:bd:00.0 enp189s0f0: renamed from eth4
[ 6.443835] hns3 0000:bd:00.3 enp189s0f3: renamed from eth7
[ 18.031167] hns3 0000:7d:00.0 enp125s0f0: link up
[77661.965968] beegfs: enabling unsafe global rkey
[79999.642438] hns3 0000:7d:00.0: PPU_PF_ABNORMAL_INT_ST over_8bd_no_fe found [error status=0x1]
[79999.642466] hns3 0000:7d:00.0: PF Reset requested
[79999.642491] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[79999.650298] hns3 0000:7d:00.0: inform reset to vf(1) failed -5!
[79999.650315] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[79999.654571] hns3 0000:7d:00.0: inform reset to vf(2) failed -5!
[79999.654588] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[79999.658807] hns3 0000:7d:00.0: inform reset to vf(3) failed -5!
[79999.689650] hns3 0000:7d:00.0 enp125s0f0: link down
[79999.797516] hns3 0000:7d:00.0: prepare wait ok
[79999.908488] hns3 0000:7d:00.0: The firmware version is 01092806
[79999.915807] hns3 0000:7d:00.0: Reset done, hclge driver initialization finished.
[79999.945923] hns3 0000:7d:00.0: PPU_PF_ABNORMAL_INT_ST over_8bd_no_fe found [error status=0x1]
[79999.945976] hns3 0000:7d:00.0: PF Reset requested
[79999.946065] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[79999.950200] hns3 0000:7d:00.0: inform reset to vf(1) failed -5!
[79999.950218] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[79999.954274] hns3 0000:7d:00.0: inform reset to vf(2) failed -5!
[79999.954292] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[79999.958067] hns3 0000:7d:00.0: inform reset to vf(3) failed -5!
[80000.093493] hns3 0000:7d:00.0: prepare wait ok
[80000.203854] hns3 0000:7d:00.0: The firmware version is 01092806
[80000.210947] hns3 0000:7d:00.0: Reset done, hclge driver initialization finished.
[80001.269514] hns3 0000:7d:00.0 enp125s0f0: link up
[80001.269832] hns3 0000:7d:00.0: PPU_PF_ABNORMAL_INT_ST over_8bd_no_fe found [error status=0x1]
[80001.269858] hns3 0000:7d:00.0: PF Reset requested
[80001.269881] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80001.273380] hns3 0000:7d:00.0: inform reset to vf(1) failed -5!
[80001.273401] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80001.276876] hns3 0000:7d:00.0: inform reset to vf(2) failed -5!
[80001.276902] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80001.280295] hns3 0000:7d:00.0: inform reset to vf(3) failed -5!
[80001.305621] hns3 0000:7d:00.0 enp125s0f0: link down
[80001.413473] hns3 0000:7d:00.0: prepare wait ok
[80001.523836] hns3 0000:7d:00.0: The firmware version is 01092806
[80001.530925] hns3 0000:7d:00.0: Reset done, hclge driver initialization finished.
[80002.581453] hns3 0000:7d:00.0 enp125s0f0: link up
[80002.869622] hns3 0000:7d:00.0: PPU_PF_ABNORMAL_INT_ST over_8bd_no_fe found [error status=0x1]
[80002.869649] hns3 0000:7d:00.0: PF Reset requested
[80002.869688] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80002.872958] hns3 0000:7d:00.0: inform reset to vf(1) failed -5!
[80002.872980] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80002.876161] hns3 0000:7d:00.0: inform reset to vf(2) failed -5!
[80002.876187] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80002.879278] hns3 0000:7d:00.0: inform reset to vf(3) failed -5!
[80002.905600] hns3 0000:7d:00.0 enp125s0f0: link down
[80003.013444] hns3 0000:7d:00.0: prepare wait ok
[80003.123765] hns3 0000:7d:00.0: The firmware version is 01092806
[80003.131051] hns3 0000:7d:00.0: Reset done, hclge driver initialization finished.
[80004.181481] hns3 0000:7d:00.0 enp125s0f0: link up
[80006.229759] hns3 0000:7d:00.0: PPU_PF_ABNORMAL_INT_ST over_8bd_no_fe found [error status=0x1]
[80006.229785] hns3 0000:7d:00.0: PF Reset requested
[80006.229808] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80006.232868] hns3 0000:7d:00.0: inform reset to vf(1) failed -5!
[80006.232889] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80006.235955] hns3 0000:7d:00.0: inform reset to vf(2) failed -5!
[80006.235980] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80006.238986] hns3 0000:7d:00.0: inform reset to vf(3) failed -5!
[80006.265515] hns3 0000:7d:00.0 enp125s0f0: link down
[80006.373383] hns3 0000:7d:00.0: prepare wait ok
[80006.483732] hns3 0000:7d:00.0: The firmware version is 01092806
[80006.490824] hns3 0000:7d:00.0: Reset done, hclge driver initialization finished.
[80007.541401] hns3 0000:7d:00.0 enp125s0f0: link up
[80013.141464] hns3 0000:7d:00.0: PPU_PF_ABNORMAL_INT_ST over_8bd_no_fe found [error status=0x1]
[80013.141489] hns3 0000:7d:00.0: PF Reset requested
[80013.141528] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80013.144375] hns3 0000:7d:00.0: inform reset to vf(1) failed -5!
[80013.144396] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80013.147296] hns3 0000:7d:00.0: inform reset to vf(2) failed -5!
[80013.147319] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[80013.150041] hns3 0000:7d:00.0: inform reset to vf(3) failed -5!
[80013.177392] hns3 0000:7d:00.0 enp125s0f0: link down
[80013.285261] hns3 0000:7d:00.0: prepare wait ok
[80013.395611] hns3 0000:7d:00.0: The firmware version is 01092806
[80013.402701] hns3 0000:7d:00.0: Reset done, hclge driver initialization finished.
[80014.453270] hns3 0000:7d:00.0 enp125s0f0: link up

[Test Case]
dmesg | egrep -i -e "hns3|enp125s0f0"

[81688.574030] hns3 0000:7d:00.0 enp125s0f0: link up
[81694.458120] hns3 0000:7d:00.0 enp125s0f0: link down
[81695.741981] hns3 0000:7d:00.0 enp125s0f0: link up
[81708.794075] hns3 0000:7d:00.0 enp125s0f0: link down
[81710.077966] hns3 0000:7d:00.0 enp125s0f0: link up
[81738.489985] hns3 0000:7d:00.0 enp125s0f0: link down
[81743.869872] hns3 0000:7d:00.0 enp125s0f0: link up
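
A minimal sketch for putting a number on the flapping, assuming the kernel log line format shown in this report (`hns3 <pci address> <interface>: link down`). The helper name `count_link_down` is hypothetical and not part of any package.

```shell
# Count "link down" events logged by hns3 for one interface.
# Reads kernel log text on stdin; the interface name is the first argument.
# count_link_down is a hypothetical helper name used only for this sketch.
count_link_down() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c "hns3 .* $1: link down" || true
}
```

On an affected machine this could be run as `dmesg | count_link_down enp125s0f0`. A count that keeps growing under MPI or NFS load would suggest the link is still flapping.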

[Fix]
Backport the hns3 driver from the linux-hwe kernel (linux-image-5.4.0-42-generic) to linux-image-4.15.0-xxx-generic (Ubuntu 18.04.5 LTS or later).

[Regression Risk]
Restricted to the hns3 driver, which is only used by certain HiSilicon SOCs.

Other software dependencies (BeeGFS) do not currently allow us to upgrade to 5.x kernels.
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Aug 21 11:55 seq
 crw-rw---- 1 root audio 116, 33 Aug 21 11:55 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.16
Architecture: arm64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2020-05-24 (88 days ago)
InstallationMedia: Ubuntu-Server 18.04.4 LTS "Bionic Beaver" - Release arm64 (20200203.1)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Huawei TaiShan 200 (Model 2280)
Package: linux (not installed)
PciMultimedia:

ProcFB:
 0 EFI VGA
 1 hibmcdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-112-generic root=UUID=b6ea2069-8977-4f07-8391-b326a06dd584 ro
ProcVersionSignature: Ubuntu 4.15.0-112.113-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-112-generic N/A
 linux-backports-modules-4.15.0-112-generic N/A
 linux-firmware 1.173.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-112-generic aarch64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: video
_MarkForUpload: True
dmi.bios.date: 07/04/2020
dmi.bios.vendor: Huawei Corp.
dmi.bios.version: 1.38
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: BC82AMDD
dmi.board.vendor: Huawei
dmi.board.version: V200R002C00
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Huawei
dmi.chassis.version: To be filled by O.E.M.
dmi.modalias: dmi:bvnHuaweiCorp.:bvr1.38:bd07/04/2020:svnHuawei:pnTaiShan200(Model2280):pvrTobefilledbyO.E.M.:rvnHuawei:rnBC82AMDD:rvrV200R002C00:cvnHuawei:ct17:cvrTobefilledbyO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: TaiShan 200 (Model 2280)
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: Huawei

torel (torehl) wrote :

Note: ntttcp traffic alone is not sufficient to trigger the issue. Heavy MPI or NFS traffic is needed.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1892347

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
torel (torehl) wrote :

Installed kernel 4.15.0-29-generic and Huawei's proprietary hns3 driver package NIC-hisi_eth-Ubuntu18.04.1-hns3-1.0.2-aarch64.deb, which reports version 1.0.2 rather than the 1.0 reported on 4.15.0-112 and 5.4.0-42. Evidently the hns3 1.0 driver in 5.4.0-42 is different from the 1.0 driver in 4.15.0-112.

Anyway, the older kernel with the binary hns3 driver seems to work fine, but I would like to be on the latest kernel for security reasons.

root@n009:~# uname -ar
Linux n009 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:41:03 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

root@n009:~# modinfo hns3
filename: /lib/modules/4.15.0-29-generic/updates/drivers/net/ethernet/hisilicon/hns3/hns3.ko
version: 1.0.2
alias: pci:hns-nic
license: GPL
author: Huawei Tech. Co., Ltd.
description: HNS3: Hisilicon Ethernet Driver
srcversion: 0EF59799F6C1BB69074D893
alias: pci:v000019E5d0000A22Fsv*sd*bc*sc*i*
alias: pci:v000019E5d0000A22Esv*sd*bc*sc*i*
alias: pci:v000019E5d0000A226sv*sd*bc*sc*i*
alias: pci:v000019E5d0000A225sv*sd*bc*sc*i*
alias: pci:v000019E5d0000A224sv*sd*bc*sc*i*
alias: pci:v000019E5d0000A223sv*sd*bc*sc*i*
alias: pci:v000019E5d0000A222sv*sd*bc*sc*i*
alias: pci:v000019E5d0000A221sv*sd*bc*sc*i*
alias: pci:v000019E5d0000A220sv*sd*bc*sc*i*
depends: hnae3
name: hns3
vermagic: 4.15.0-29-generic SMP mod_unload aarch64
parm: debug: Network interface message level setting (int)

root@n009:~# dmesg | grep hns3
[ 4.067119] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[ 4.067119] hns3: Copyright (c) 2017 Huawei Corporation.
[ 4.859148] hns3 0000:7d:00.0 enp125s0f0: renamed from eth0
[ 4.909343] hns3 0000:7d:00.2 enp125s0f2: renamed from eth2
[ 5.086063] hns3 0000:bd:00.2 enp189s0f2: renamed from eth6
[ 5.169404] hns3 0000:7d:00.3 enp125s0f3: renamed from eth3
[ 5.349699] hns3 0000:7d:00.1 enp125s0f1: renamed from eth1
[ 5.437326] hns3 0000:bd:00.3 enp189s0f3: renamed from eth7
[ 5.589360] hns3 0000:bd:00.0 enp189s0f0: renamed from eth4
[ 5.709287] hns3 0000:bd:00.1 enp189s0f1: renamed from eth5
[ 11.575815] hns3 0000:7d:00.0 enp125s0f0: net open
[ 15.885156] hns3 0000:7d:00.0 enp125s0f0: link up

torel (torehl) wrote : apport information

Attached: CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

torel (torehl) wrote :

Had to log in over a different interface (ib0) and unload and reload the hns3 module to get the primary network up. Then I could run apport-collect 1892347.

Changed in kunpeng920:
status: New → Incomplete
Fred Kimmy (kongzizaixian) wrote :

Please refer to the issue at the following link:

https://bugs.launchpad.net/kunpeng920/+bug/1859756

Ike Panhc (ikepanhc) wrote :

Hi torel,

As mentioned in #16, this issue might be solved in the -proposed kernel. Would you mind testing again with the bionic-proposed kernel?

Please use the following commands to enable the -proposed pocket on your Ubuntu system:

$ sudo apt-add-repository "deb http://ports.ubuntu.com/ubuntu-ports bionic-proposed main"
$ sudo apt update
$ sudo apt dist-upgrade

and then reboot into the -proposed kernel.

torel (torehl) wrote :

Ike, do you mean the 4.15.0-103 kernel in https://kernel.ubuntu.com/~ikepanhc/lp1859756/? The linux-hwe 5.4 kernel is not an option for me, as it breaks the BeeGFS parallel filesystem.

Ike Panhc (ikepanhc) wrote :

Hi Torel,

No, please get the kernel debs from the Ubuntu archive. Here are the steps:

1) sudo apt-add-repository "deb http://ports.ubuntu.com/ubuntu-ports bionic-proposed main"
2) sudo apt update
3) sudo apt dist-upgrade

Let me know if you need to install the kernel debs manually, and I can put them somewhere you can download them.

torel (torehl) wrote :

Tested 4.15.0-114-generic; it works fine. Thx!

Ike Panhc (ikepanhc) wrote :

Thanks Torel,

4.15.0-114.115 will be released to -updates on August 31st according to the kernel SRU schedule[1].
All you need to do is `sudo apt update;sudo apt upgrade` and you will get a 4.15 kernel with the fix.
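
After upgrading and rebooting, a quick way to confirm that the running kernel is at least the fixed one is a version comparison. This is only a sketch: `version_ge` is a hypothetical helper, it assumes GNU `sort` with `-V` support, and the 4.15.0-114 threshold is taken from the version reported as working in this thread.

```shell
# Succeed if version $1 is the same as or newer than version $2.
# Relies on GNU sort's -V (version sort); version_ge is a hypothetical helper.
version_ge() {
    # The older of the two versions sorts first; $1 is new enough
    # exactly when that first line is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `version_ge "$(uname -r | sed 's/-generic$//')" 4.15.0-114 && echo "kernel has the hns3 fix"`.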

Setting this bug as a duplicate of bug 1859756.

[1] https://kernel.ubuntu.com/

torel (torehl) wrote :

Small comment: is there any reason why the in-tree hns3 is not version 1.0.2, like the proprietary driver for the 18.04.1 LTS kernel 4.15.0-29-generic?

root@n012:~# find /lib/modules/4.15.0-29-generic/updates/drivers/net/ethernet/hisilicon/hns3 -name "*.ko" -type f -exec modinfo {} \; | egrep -i -e "name|version|srcv"
filename: /lib/modules/4.15.0-29-generic/updates/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko
version: 1.0.2
srcversion: C608A5D8EF6F248597CA9E5
name: hclge
filename: /lib/modules/4.15.0-29-generic/updates/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf.ko
version: 1.0.2
srcversion: 83B44FBC5FE1D6324B5F3D6
name: hclgevf
filename: /lib/modules/4.15.0-29-generic/updates/drivers/net/ethernet/hisilicon/hns3/hnae3.ko
version: 1.0.2
srcversion: 9097930440A454C1B6A0217
name: hnae3
filename: /lib/modules/4.15.0-29-generic/updates/drivers/net/ethernet/hisilicon/hns3/hns3.ko
version: 1.0.2
srcversion: 0EF59799F6C1BB69074D893
name: hns3

dann frazier (dannf) wrote :

TL;DR: ignore in-tree driver version strings; they don't really mean anything.

I have no idea what the proprietary driver is or how it is versioned, so I can't speak to that. I can say that version strings for Linux drivers are pretty arbitrary. There's no clear waterline for when a driver should go from say 1.0 to 2.0 - at some point a driver maintainer just submits a patch to change that string using whatever criteria they use. There could be 1000s of code changes to a driver in between version changes. And there are likely many drivers out there that claim to be the same version but have drastically different features/fixes.

Consider vendor trees, like Ubuntu. We will obviously start out w/ the same version of the driver as the upstream base kernel (v4.15 here), but we're possibly going to backport additional features (in early devel) and maintenance fixes (for ~10 years of support). And for hns3, we have applied many so far:

$ git log --oneline v4.15..Ubuntu-4.15.0-114.115 drivers/net/ethernet/hisilicon/hns3 | wc -l
588

But should this change our driver version? If it did, what would that tell you? Our driver won't be the same as upstream's, Red Hat's, etc. All you can say is that it is the version that shipped with Ubuntu-4.15.0-114.115. If you want to know more, you can look at change control to see if it has the changes you are looking for.

Now, that said, version strings may provide useful context for hardware manufacturers that provide out-of-tree drivers. They may provide regular driver releases that can be used with Ubuntu 18.04. In that case, they know what exactly was in version 1.0foo.3 and what issues would be fixed if you upgraded to 1.0foo.4 - but that's a different release model.
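
Since the `version` string carries so little information, a more useful field for telling two builds of a module apart may be `srcversion`, which modinfo prints alongside it (as in the output earlier in this thread) and which is derived from the module's source files at build time. A minimal sketch, assuming modinfo's usual `field: value` output; `modinfo_field` is a hypothetical helper:

```shell
# Print the value of one field from `modinfo` output read on stdin.
# modinfo_field is a hypothetical helper; the field name is the first argument.
modinfo_field() {
    # Match the field label exactly, so "version" does not also match
    # "srcversion", and print only the first occurrence.
    awk -v key="$1:" '$1 == key { print $2; exit }'
}
```

For example, `modinfo hns3 | modinfo_field srcversion` on two kernels shows whether the module was built from the same sources even when both report version 1.0. On a live system `modinfo -F srcversion hns3` gives the same value directly; the helper is mainly useful for saved output.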
