E810 Ubuntu 20.04 LTS - Crash - Device is in unrecoverable state

Bug #2023674 reported by A. Saber Shenouda
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

Under heavy load, the interface goes into an "unrecoverable state". We are already using the latest firmware from Dell (https://www.dell.com/support/home/en-ca/drivers/driversdetails?driverid=25ffj), which is supposed to prevent this issue:

"- Resolved an issue with E810 adapters where the queues are not freed up properly and causes kernel issues under heavy traffic. {208689}"
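The report does not quote the firmware version that is actually loaded on the card. The following is a minimal sketch for reading the driver and firmware versions the kernel reports for a port, via the standard `ethtool -i` command. It assumes Python 3.8 (the Ubuntu 20.04 default) and that `ethtool` is installed. The script name `nic_info.py` and the helper names are hypothetical.

```python
import subprocess
import sys
from typing import Dict


def parse_ethtool_info(text: str) -> Dict[str, str]:
    """Turn `ethtool -i` output (one "key: value" pair per line) into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI
        # bus address ("0000:c4:00.3") stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def nic_info(iface: str) -> Dict[str, str]:
    """Run `ethtool -i IFACE` (ethtool must be installed) and parse the result."""
    result = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_info(result.stdout)


if __name__ == "__main__":
    # Interface names come from the command line, e.g. `python3 nic_info.py ens3f3`.
    for iface in sys.argv[1:]:
        info = nic_info(iface)
        for key in ("driver", "version", "firmware-version", "bus-info"):
            print(f"{iface} {key}: {info.get(key, '?')}")
```

Running it against the affected port (ens3f3 in the log below) and pasting the `firmware-version` line into the report would show whether the Dell update actually took effect on that adapter.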

Here is the syslog output from when the bug occurs:

@timestamp type message program
2023-04-11T23:34:01.406Z syslog [6833005.040087] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:34:01.406Z syslog [6833005.040086] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 911, hung_queue 22 kernel
2023-04-11T23:34:01.406Z syslog [6833005.040085] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:56.290Z syslog [6832999.924350] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:56.290Z syslog [6832999.924349] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 910, hung_queue 22 kernel
2023-04-11T23:33:56.290Z syslog [6832999.924347] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:46.365Z syslog [6832989.936862] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:46.365Z syslog [6832989.936864] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 909, hung_queue 22 kernel
2023-04-11T23:33:46.365Z syslog [6832989.936864] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:40.414Z syslog [6832984.049162] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:40.414Z syslog [6832984.049164] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 908, hung_queue 22 kernel
2023-04-11T23:33:40.414Z syslog [6832984.049165] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:35.323Z syslog [6832978.929427] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:35.322Z syslog [6832978.929426] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 907, hung_queue 22 kernel
2023-04-11T23:33:35.322Z syslog [6832978.929424] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:29.487Z syslog [6832973.041730] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 906, hung_queue 22 kernel
2023-04-11T23:33:29.487Z syslog [6832973.041731] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:29.486Z syslog [6832973.041729] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:24.327Z syslog [6832967.921995] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:24.326Z syslog [6832967.921993] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:24.326Z syslog [6832967.921994] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 905, hung_queue 22 kernel
2023-04-11T23:33:14.306Z syslog [6832957.942505] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:14.306Z syslog [6832957.942502] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:14.306Z syslog [6832957.942504] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 904, hung_queue 22 kernel
2023-04-11T23:33:08.414Z syslog [6832952.050807] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:08.414Z syslog [6832952.050809] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 903, hung_queue 22 kernel
2023-04-11T23:33:08.414Z syslog [6832952.050810] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:03.294Z syslog [6832946.931073] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:33:03.294Z syslog [6832946.931075] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:33:03.294Z syslog [6832946.931075] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 902, hung_queue 22 kernel
2023-04-11T23:32:57.406Z syslog [6832941.043373] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:32:57.406Z syslog [6832941.043372] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 901, hung_queue 22 kernel
2023-04-11T23:32:57.406Z syslog [6832941.043371] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:32:52.286Z syslog [6832935.923637] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:32:52.286Z syslog [6832935.923639] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 900, hung_queue 22 kernel
2023-04-11T23:32:52.286Z syslog [6832935.923639] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:32:42.355Z syslog [6832925.940152] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:32:42.355Z syslog [6832925.940151] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 899, hung_queue 22 kernel
2023-04-11T23:32:42.355Z syslog [6832925.940149] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:32:36.441Z syslog [6832920.052458] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:32:36.441Z syslog [6832920.052457] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 898, hung_queue 22 kernel
2023-04-11T23:32:36.440Z syslog [6832920.052455] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
2023-04-11T23:32:31.298Z syslog [6832914.936718] ice 0000:c4:00.3 ens3f3: tx_timeout recovery unsuccessful, device is in unrecoverable state. kernel
2023-04-11T23:32:31.298Z syslog [6832914.936717] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 897, hung_queue 22 kernel
2023-04-11T23:32:31.298Z syslog [6832914.936716] ice 0000:c4:00.3 ens3f3: tx_timeout: VSI_num: 15, Q 22, NTC: 0x8e, HW_HEAD: 0x8d, NTU: 0x78, INT: 0x1 kernel
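The excerpt above is exported newest-first, and lines that share a timestamp are not always in order. The following is a hedged sketch for putting the `tx_timeout recovery level` lines back in chronological order by their kernel timestamp and measuring the gap between attempts. It makes visible that the recovery level climbs by one every 5 to 10 seconds while the same queue (22) stays hung. The regular expression is an assumption based only on the lines shown here, not on a documented ice driver log format. The function names are hypothetical.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Matches lines such as:
# [6832914.936717] ice 0000:c4:00.3 ens3f3: tx_timeout recovery level 897, hung_queue 22
LEVEL_RE = re.compile(
    r"\[(?P<ktime>\d+\.\d+)\] ice (?P<pci>\S+) (?P<iface>\S+): "
    r"tx_timeout recovery level (?P<level>\d+), hung_queue (?P<queue>\d+)"
)


class Recovery(NamedTuple):
    ktime: float  # kernel timestamp, seconds since boot (sorts first)
    iface: str
    level: int
    queue: int


def parse_recoveries(lines: Iterable[str]) -> List[Recovery]:
    """Extract recovery-level events and return them in chronological order."""
    events = []
    for line in lines:
        m = LEVEL_RE.search(line)
        if m:
            events.append(
                Recovery(float(m["ktime"]), m["iface"], int(m["level"]), int(m["queue"]))
            )
    return sorted(events)  # NamedTuples compare field by field, so ktime decides


def intervals(events: List[Recovery]) -> List[float]:
    """Seconds between consecutive recovery attempts."""
    return [round(b.ktime - a.ktime, 3) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    # Usage: python3 ice_recovery.py exported-syslog.txt
    for path in sys.argv[1:]:
        with open(path) as f:
            events = parse_recoveries(f)
        for ev, gap in zip(events[1:], intervals(events)):
            print(f"{ev.iface} level {ev.level} queue {ev.queue} (+{gap}s)")
```

Sorting on the kernel timestamp rather than the `@timestamp` column is deliberate: several lines in the export share the same millisecond `@timestamp`, but their kernel timestamps still differ.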

Tags: bot-comment
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/2023674/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
A. Saber Shenouda (saberph) wrote :

5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

ice driver: 0.8.1-k

affects: ubuntu → kernel
affects: kernel → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2023674

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
A. Saber Shenouda (saberph) wrote :

I can't run this command right now because the server is currently in production. I will try to run it soon.

The system is a Dell R7515 with an AMD EPYC 7543P and 256 GB of RAM.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired