qlcnic firmware hang detected kvm ganeti
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Expired | Undecided | Unassigned | | |
Bug Description
1) Ubuntu release:
Description: Ubuntu 16.04.3 LTS
Release: 16.04
2) Package version:
* linux-image-
* Also with HWE kernel (4.10.x)
3) What I expect:
I have a 10G network card (HP NC523SFP 10Gb 2-port) in an HP ProLiant DL380p Gen8, BIOS P70 07/01/2015. The card is driven by the qlcnic module and its ports appear as ens2f0 and ens2f1. Both ports also have VLANs configured.
I have installed Ganeti and created bridges on top of those interfaces: br-dmz over ens2f0 and br-str over ens2f1.
Everything should work without connectivity loss.
4) What happened instead:
The interface loses connectivity from time to time. It recovers on its own, logging the following error:
Oct 12 18:23:14 mazinger kernel: [107906.678468] qlcnic 0000:07:00.1: Pause control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678470] qlcnic 0000:07:00.0: Pause control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected
Oct 12 18:23:14 mazinger kernel: [107906.678482] qlcnic 0000:07:00.0: Dumping hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected
Oct 12 18:23:14 mazinger kernel: [107906.680385] qlcnic 0000:07:00.1: Dumping hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2) entered disabled state
Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10) entered disabled state
Oct 12 18:23:16 mazinger kernel: [107908.706988] qlcnic 0000:07:00.1: Detected state change from DEV_NEED_RESET, skipping ack check
Oct 12 18:23:17 mazinger kernel: [107909.423713] qlcnic 0000:07:00.0 ens2f0: Dump data 15044136 bytes captured, dump data address = ffffc900334c3000, template header size 36864 bytes, template address = ffffc900193da000
Oct 12 18:23:21 mazinger kernel: [107912.800338] qlcnic 0000:07:00.0: loading firmware from flash
Oct 12 18:23:27 mazinger kernel: [107919.137580] qlcnic 0000:07:00.0: Driver v5.3.63, firmware v4.20.1
Oct 12 18:23:27 mazinger kernel: [107919.501555] qlcnic 0000:07:00.1: Driver v5.3.63, firmware v4.20.1
Oct 12 18:23:28 mazinger kernel: [107920.425737] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.435780] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.453103] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8008] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.598651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800a] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.615752] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800c] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.196706] qlcnic 0000:07:00.1 ens2f1: Rx Context[1] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.406680] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8001] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.422646] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8009] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.439890] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800b] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.456417] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800d] Created, state 0x2
Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2) entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2) entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10) entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10) entered forwarding state
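To put a number on each outage, the kernel timestamps in a log like the one above can be compared between "firmware hang detected" and the next "NIC Link is up" for the same PCI function. The following is only a minimal sketch: the function name, the regular expression, and the choice of those two messages as start and end markers are assumptions made for illustration, not part of the driver or of any existing tool.

```python
import re

# Kernel timestamp, PCI address, optional interface name, then the message.
# Pattern written for the qlcnic lines quoted in this report (assumption).
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] qlcnic ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])"
    r"(?: \S+)?: (.*)"
)

def outage_durations(log_lines):
    """Return {pci_address: [seconds, ...]} from hang detection to link up."""
    hang_start = {}
    durations = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, pci, msg = float(m.group(1)), m.group(2), m.group(3)
        if msg.startswith("firmware hang detected"):
            # Keep the first hang timestamp until the link comes back.
            hang_start.setdefault(pci, ts)
        elif msg.startswith("NIC Link is up") and pci in hang_start:
            elapsed = round(ts - hang_start.pop(pci), 3)
            durations.setdefault(pci, []).append(elapsed)
    return durations

# Four lines copied from the log above.
SAMPLE = [
    "Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected",
    "Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected",
    "Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up",
    "Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up",
]

for pci, seconds in outage_durations(SAMPLE).items():
    print(pci, seconds)
```

For this event both ports were down for roughly 17 seconds. Running the same function over a full kern.log would show how often the hang recurs and whether the recovery time varies.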
---------------
Sometimes it also produces kernel warnings, and the server needs to be rebooted to recover connectivity:
Oct 9 14:36:41 mazinger kernel: [262273.497512] ------------[ cut here ]------------
Oct 9 14:36:41 mazinger kernel: [262273.497821] WARNING: CPU: 6 PID: 0 at /build/
Oct 9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0 (qlcnic): transmit queue 0 timed out
Oct 9 14:36:41 mazinger kernel: [262273.498579] Modules linked in: joydev binfmt_misc hpwdt ipmi_ssif bridge intel_rapl x86_pkg_
Oct 9 14:36:41 mazinger kernel: [262273.498651] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.4.0-96-generic #119-Ubuntu
Oct 9 14:36:41 mazinger kernel: [262273.498652] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
Oct 9 14:36:41 mazinger kernel: [262273.498654] 0000000000000286 fc090740aa4761f7 ffff881fbf783d98 ffffffff813fabd3
Oct 9 14:36:41 mazinger kernel: [262273.498666] ffff881fbf783de0 ffffffff81d715f8 ffff881fbf783dd0 ffffffff810812e2
Oct 9 14:36:41 mazinger kernel: [262273.498668] 0000000000000000 ffff881fade31b00 0000000000000006 ffff881fade30000
Oct 9 14:36:41 mazinger kernel: [262273.498681] Call Trace:
Oct 9 14:36:41 mazinger kernel: [262273.498683] <IRQ> [<ffffffff813fa
Oct 9 14:36:41 mazinger kernel: [262273.498691] [<ffffffff81081
Oct 9 14:36:41 mazinger kernel: [262273.498693] [<ffffffff81081
Oct 9 14:36:41 mazinger kernel: [262273.498697] [<ffffffff8175e
Oct 9 14:36:41 mazinger kernel: [262273.498700] [<ffffffff8175e
Oct 9 14:36:41 mazinger kernel: [262273.498705] [<ffffffff810ed
Oct 9 14:36:41 mazinger kernel: [262273.498708] [<ffffffff8175e
Oct 9 14:36:41 mazinger kernel: [262273.498711] [<ffffffff810ed
Oct 9 14:36:41 mazinger kernel: [262273.498714] [<ffffffff81085
Oct 9 14:36:41 mazinger kernel: [262273.498717] [<ffffffff81086
Oct 9 14:36:41 mazinger kernel: [262273.498721] [<ffffffff81845
Oct 9 14:36:41 mazinger kernel: [262273.498724] [<ffffffff81843
Oct 9 14:36:41 mazinger kernel: [262273.498726] <EOI> [<ffffffff816d6
Oct 9 14:36:41 mazinger kernel: [262273.498731] [<ffffffff816d6
Oct 9 14:36:41 mazinger kernel: [262273.498735] [<ffffffff810c4
Oct 9 14:36:41 mazinger kernel: [262273.498737] [<ffffffff816d6
Oct 9 14:36:41 mazinger kernel: [262273.498739] [<ffffffff810c4
Oct 9 14:36:41 mazinger kernel: [262273.498743] [<ffffffff81051
Oct 9 14:36:41 mazinger kernel: [262273.498749] ---[ end trace 6388d35f388918bc ]---
Oct 9 14:36:41 mazinger kernel: [262273.498765] qlcnic 0000:07:00.0 ens2f0: rds_ring=0 crb_rcv_
Oct 9 14:36:41 mazinger kernel: [262273.498773] qlcnic 0000:07:00.0 ens2f0: rds_ring=1 crb_rcv_
Oct 9 14:36:41 mazinger kernel: [262273.498781] qlcnic 0000:07:00.0 ens2f0: sds_ring=0 crb_sts_
Oct 9 14:36:41 mazinger kernel: [262273.498788] qlcnic 0000:07:00.0 ens2f0: sds_ring=1 crb_sts_
Oct 9 14:36:41 mazinger kernel: [262273.498792] qlcnic 0000:07:00.0 ens2f0: sds_ring=2 crb_sts_
Oct 9 14:36:41 mazinger kernel: [262273.498796] qlcnic 0000:07:00.0 ens2f0: sds_ring=3 crb_sts_
Oct 9 14:36:41 mazinger kernel: [262273.498798] qlcnic 0000:07:00.0 ens2f0: Tx ring=0 Context Id=0x8000
Oct 9 14:36:41 mazinger kernel: [262273.498800] qlcnic 0000:07:00.0 ens2f0: xmit_finished=
Oct 9 14:36:41 mazinger kernel: [262273.498802] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498805] qlcnic 0000:07:00.0 ens2f0: hw_producer=481, sw_producer=481 sw_consumer=491, hw_consumer=491
Oct 9 14:36:41 mazinger kernel: [262273.498807] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498809] qlcnic 0000:07:00.0 ens2f0: Tx ring=1 Context Id=0x8008
Oct 9 14:36:41 mazinger kernel: [262273.498811] qlcnic 0000:07:00.0 ens2f0: xmit_finished=
Oct 9 14:36:41 mazinger kernel: [262273.498813] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498816] qlcnic 0000:07:00.0 ens2f0: hw_producer=81, sw_producer=81 sw_consumer=91, hw_consumer=91
Oct 9 14:36:41 mazinger kernel: [262273.498818] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498819] qlcnic 0000:07:00.0 ens2f0: Tx ring=2 Context Id=0x800a
Oct 9 14:36:41 mazinger kernel: [262273.498821] qlcnic 0000:07:00.0 ens2f0: xmit_finished=
Oct 9 14:36:41 mazinger kernel: [262273.498824] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498827] qlcnic 0000:07:00.0 ens2f0: hw_producer=572, sw_producer=572 sw_consumer=582, hw_consumer=582
Oct 9 14:36:41 mazinger kernel: [262273.498828] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498830] qlcnic 0000:07:00.0 ens2f0: Tx ring=3 Context Id=0x800c
Oct 9 14:36:41 mazinger kernel: [262273.498836] qlcnic 0000:07:00.0 ens2f0: xmit_finished=
Oct 9 14:36:41 mazinger kernel: [262273.498843] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498850] qlcnic 0000:07:00.0 ens2f0: hw_producer=568, sw_producer=568 sw_consumer=578, hw_consumer=578
Oct 9 14:36:41 mazinger kernel: [262273.498857] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498863] qlcnic 0000:07:00.0 ens2f0: Tx timeout, reset adapter context.
Oct 9 14:36:43 mazinger kernel: [262275.251864] qlcnic 0000:07:00.0: CDRP command failed: [7]
Oct 9 14:36:43 mazinger kernel: [262275.252143] qlcnic 0000:07:00.0: Host MBX regs(2)
Oct 9 14:36:43 mazinger kernel: [262275.252146] 00000039
Oct 9 14:36:43 mazinger kernel: [262275.252148] 00050032 <6>[262275.252150]
Oct 9 14:36:43 mazinger kernel: [262275.252153] qlcnic 0000:07:00.0: FW MBX regs(3)
Oct 9 14:36:43 mazinger kernel: [262275.252155] 00000007
Oct 9 14:36:43 mazinger kernel: [262275.252156] 00000000 00000000
Oct 9 14:36:43 mazinger kernel: [262275.252158]
Oct 9 14:36:43 mazinger kernel: [262275.252166] qlcnic 0000:07:00.0 ens2f0: Failed to Delete interrupts 7
Oct 9 14:36:43 mazinger kernel: [262275.279376] br-dmz: port 1(ens2f0.2) entered disabled state
Oct 9 14:36:43 mazinger kernel: [262275.447095] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.493365] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.509816] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800e] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.527651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8010] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.543852] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8012] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.545966] qlcnic 0000:07:00.0 ens2f0: qlcnic_
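The Tx ring dump above reports "Available desc=10" for all four rings. That figure is consistent with ordinary circular-buffer arithmetic on the producer and consumer indices, as the sketch below shows. The helper `tx_avail` is a hypothetical name used for illustration; that the driver computes the value in exactly this way is an assumption, and only the input numbers come from the log.

```python
def tx_avail(producer, sw_consumer, num_desc):
    """Free descriptors in a circular Tx ring (generic ring-buffer arithmetic)."""
    if producer < sw_consumer:
        return sw_consumer - producer
    return sw_consumer + num_desc - producer

TOTAL_DESC = 1024  # "Total desc=1024" in the log

# (sw_producer, sw_consumer) per Tx ring, copied from the log above.
rings = {0: (481, 491), 1: (81, 91), 2: (572, 582), 3: (568, 578)}

for ring, (producer, consumer) in rings.items():
    free = tx_avail(producer, consumer, TOTAL_DESC)
    print(f"ring {ring}: free={free} outstanding={TOTAL_DESC - free}")
```

Read this way, every ring has 1014 of its 1024 descriptors still outstanding, which would fit a transmit path that has stopped completing work shortly before the watchdog fired. This is an interpretation of the numbers, not a confirmed diagnosis.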
-----------
What I have tried to fix it:
- I have upgraded the interface firmware to the latest version provided by HP:
# ethtool -i ens2f0
driver: qlcnic
version: 5.3.63
firmware-version: 4.20.1
expansion-
bus-info: 0000:07:00.0
supports-
supports-test: yes
supports-
supports-
supports-
- I have opened a case with HP. Following their recommendations, I have upgraded the server firmware to the latest version. After I captured an AHS (Active Health System) log, they told me that there isn't a hardware problem and that it should be a software issue.
- I have tried the HWE kernel (version 4.10.x), which comes with a newer version of the qlcnic module (5.3.65), but it didn't solve the problem.
- After reading about some problems with TSO in virtual environments, I have disabled TSO/GSO and other offloads (scatter-gather, GRO) on the interfaces:
auto <iface>
iface <iface> inet manual
pre-up /sbin/ethtool --offload <iface> gso off tso off sg off gro off
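Because the offloads are turned off in a pre-up hook, it can be worth confirming that the settings actually took effect on each interface. The sketch below parses the text printed by `ethtool -k <iface>` and lists which of the four offloads are still enabled. The function name and the mapping table are illustrative assumptions; the long feature names are the ones `ethtool -k` normally prints, but check them against the output on the affected host.

```python
# Short names used with `ethtool --offload`, mapped to the labels shown by `ethtool -k`.
FEATURES = {
    "tso": "tcp-segmentation-offload",
    "gso": "generic-segmentation-offload",
    "gro": "generic-receive-offload",
    "sg": "scatter-gather",
}

def offloads_still_on(ethtool_k_output):
    """Return the short names of the offloads from FEATURES that are still 'on'."""
    state = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue
        # Values look like "on", "off" or "off [fixed]"; keep the first word only.
        words = value.split()
        state[name] = words[0] if words else ""
    return sorted(short for short, label in FEATURES.items() if state.get(label) == "on")

# Typical use on the host (not executed here):
#   import subprocess
#   out = subprocess.run(["ethtool", "-k", "ens2f0"],
#                        capture_output=True, text=True, check=True).stdout
#   print(offloads_still_on(out))
```

An empty list means all four offloads are reported as off for that interface.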
I have found similar problems by searching online, but all of them were solved by applying one or more of the steps above. The issue seems to be related to this kind of interface when it is used in virtual environments.
affects: | kernel-package (Ubuntu) → linux (Ubuntu) |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1723482
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.