Activity log for bug #1814095

Date Who What changed Old value New value Message
2019-01-31 13:09:28 Nivedita Singhvi bug added bug
2019-01-31 13:10:02 Nivedita Singhvi attachment added kern.log.excerpt-netdev-watchdog-timeout.txt https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+attachment/5234643/+files/kern.log.excerpt-netdev-watchdog-timeout.txt
2019-01-31 13:30:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-01-31 13:33:29 Nivedita Singhvi nominated for series Ubuntu Xenial
2019-01-31 13:34:41 Nivedita Singhvi linux (Ubuntu): status Incomplete Confirmed
2019-02-06 20:40:33 Terry Rudd bug added subscriber Terry Rudd
2019-02-21 14:21:17 Nivedita Singhvi description

Old value:

The following 25Gb Broadcom NIC error was seen on Xenial running the 4.4.0-141-generic kernel on an amd64 host seeing moderate-heavy network traffic (just once):

* The bnxt_en_bpo driver froze on a "TX timed out" error
  and triggered the Netdev Watchdog timer under load.

* From kernel log:
  "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
  See attached kern.log excerpt file for full excerpt of error log.

* Release = Xenial
  Kernel = 4.4.0-141-generic #167
  eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

* This caused the driver to reset in order to recover:
  "bnxt_en_bpo 0000:19:00.1 eno2d1: TX timeout detected, starting reset task!"
  driver: bnxt_en_bpo
  version: 1.8.1
  source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

* The loss of connectivity and softirq stall caused other failures
  on the system.

* The bnxt_en_bpo driver is the imported Broadcom driver
  pulled in to support newer Broadcom HW (specific boards)
  while the bnxt_en module continues to support the older
  HW. The current Linux upstream driver does not compile
  easily with the 4.4 kernel (too many changes).

* This upstream bnxt_en driver fix is a likely solution:
  "bnxt_en: Fix TX timeout during netpoll"
  commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
  This fix has not been applied to the bnxt_en_bpo driver
  version, but review of the code indicates that it is
  susceptible to the bug, and the fix would be reasonable.

* No easy way to reproduce this

New value:

[Impact]

The bnxt_en_bpo driver experienced TX timeouts, causing the system to experience network stalls and fail to send data and heartbeat packets.

The following 25Gb Broadcom NIC error was seen on Xenial running the 4.4.0-141-generic kernel on an amd64 host seeing moderate-heavy network traffic (just once):

* The bnxt_en_bpo driver froze on a "TX timed out" error
  and triggered the Netdev Watchdog timer under load.

* From kernel log:
  "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
  See attached kern.log excerpt file for full excerpt of error log.

* Release = Xenial
  Kernel = 4.4.0-141-generic #167
  eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

* This caused the driver to reset in order to recover:
  "bnxt_en_bpo 0000:19:00.1 eno2d1: TX timeout detected, starting reset task!"
  driver: bnxt_en_bpo
  version: 1.8.1
  source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

* The loss of connectivity and softirq stall caused other failures
  on the system.

* The bnxt_en_bpo driver is the imported Broadcom driver
  pulled in to support newer Broadcom HW (specific boards)
  while the bnxt_en module continues to support the older
  HW. The current Linux upstream driver does not compile
  easily with the 4.4 kernel (too many changes).

* This upstream bnxt_en driver fix is a likely solution:
  "bnxt_en: Fix TX timeout during netpoll"
  commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
  This fix has not been applied to the bnxt_en_bpo driver
  version, but review of the code indicates that it is
  susceptible to the bug, and the fix would be reasonable.

[Test Case]

* Unfortunately, this is not easy to reproduce. Also, it is only seen on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo driver.

[Regression Potential]

* The patch is restricted to the bpo driver, with very constrained scope: just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as opposed to the hwe 4.15 etc. kernels, which would have the in-tree fixed driver).

* The patch is very small, and the backport is fairly minimal and simple.

* The fix has been running on the in-tree driver in upstream mainline as well as on the Ubuntu Linux in-tree driver. Although the Broadcom driver has a lot of lower-level code that is different, this piece is still the same.
2019-02-21 16:43:54 Terry Rudd bug task added linux (Ubuntu Xenial)
2019-02-22 10:26:11 Nivedita Singhvi linux (Ubuntu Xenial): status New Confirmed
2019-02-22 10:26:28 Nivedita Singhvi linux (Ubuntu Xenial): importance Undecided High
2019-03-03 19:09:15 Nivedita Singhvi linux (Ubuntu Xenial): status Confirmed In Progress
2019-03-03 19:09:22 Nivedita Singhvi linux (Ubuntu Xenial): assignee Nivedita Singhvi (niveditasinghvi)
2019-03-03 22:44:20 Khaled El Mously linux (Ubuntu Xenial): status In Progress Fix Committed
2019-03-15 20:04:37 Brad Figg tags xenial verification-needed-xenial xenial
2019-04-02 10:26:27 Launchpad Janitor linux (Ubuntu Xenial): status Fix Committed Fix Released
2019-04-02 10:26:27 Launchpad Janitor cve linked 2018-9517
2019-04-02 10:26:27 Launchpad Janitor cve linked 2019-3459
2019-04-02 10:26:27 Launchpad Janitor cve linked 2019-3460
2019-04-02 10:26:27 Launchpad Janitor cve linked 2019-6974
2019-04-02 10:26:27 Launchpad Janitor cve linked 2019-7221
2019-04-02 10:26:27 Launchpad Janitor cve linked 2019-7222
2019-04-02 10:26:27 Launchpad Janitor cve linked 2019-9213
2019-05-23 02:41:04 Nivedita Singhvi tags verification-needed-xenial xenial sts verification-needed-xenial xenial
2019-07-24 20:58:48 Brad Figg tags sts verification-needed-xenial xenial cscc sts verification-needed-xenial xenial
2020-07-14 14:54:46 Guilherme G. Piccoli linux (Ubuntu): status Confirmed Fix Released
2020-07-14 14:54:59 Guilherme G. Piccoli linux (Ubuntu): assignee Nivedita Singhvi (niveditasinghvi)