Activity log for bug #1840789

Date Who What changed Old value New value Message
2019-08-20 15:11:27 Mauricio Faria de Oliveira bug added bug
2019-08-20 15:11:37 Mauricio Faria de Oliveira linux (Ubuntu): status New In Progress
2019-08-20 15:11:39 Mauricio Faria de Oliveira linux (Ubuntu): assignee Mauricio Faria de Oliveira (mfo)
2019-08-22 12:48:33 Mauricio Faria de Oliveira description Description/patches to be provided this week. [Impact] * The bnx2x driver may cause hardware faults (leading to panic/reboot) and other behaviors such as transmit timeouts after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is introduced. * This issue has been observed by a user shortly after starting docker & kubelet, with adapters: - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] * If options to ignore hardware faults are used (erst_disable=1 hest_disable=1 ghes.disable=1), the system doesn't panic/reboot and continues on to time out on adapter stats, then transmit timeouts, spewing some adapter firmware dumps, but the network interface is non-functional. * The issue only happened when LLDP was enabled on the network switches, and a crashdump shows the bnx2x driver is stuck waiting for firmware to complete the stop traffic command in LLDP handling. The workaround used is to disable LLDP in the network switches/ports. * Analysis of the driver and firmware dumps didn't help significantly towards finding the root cause. * Upstream/mainline has recently reverted the patch due to similar problem reports, while looking for the root cause/proper fix. [Test Case] * No reproducible test case found outside the user's systems/cluster, where it is enough to start docker & kubelet & wait. * The user verified test kernels for Xenial and Bionic; the problem does not happen. [Regression Potential] * Users who significantly use/apply the non-default traffic class (tc) / class of service (cos) might see performance changes (if any at all) in such applications; however, that is unclear for now. * This is a recent revert upstream (v5.3-rc'ish), so there's a chance things might change in this area. * Nonetheless, the patch is authored by the driver vendor, and made its way into stable kernels (e.g., v5.2.8, which made it into Eoan/19.10 recently).
2019-08-22 12:48:50 Mauricio Faria de Oliveira nominated for series Ubuntu Bionic
2019-08-22 12:48:50 Mauricio Faria de Oliveira bug task added linux (Ubuntu Bionic)
2019-08-22 12:48:50 Mauricio Faria de Oliveira nominated for series Ubuntu Xenial
2019-08-22 12:48:50 Mauricio Faria de Oliveira bug task added linux (Ubuntu Xenial)
2019-08-22 12:48:50 Mauricio Faria de Oliveira nominated for series Ubuntu Eoan
2019-08-22 12:48:50 Mauricio Faria de Oliveira bug task added linux (Ubuntu Eoan)
2019-08-22 12:48:50 Mauricio Faria de Oliveira nominated for series Ubuntu Disco
2019-08-22 12:48:50 Mauricio Faria de Oliveira bug task added linux (Ubuntu Disco)
2019-08-22 12:49:37 Mauricio Faria de Oliveira linux (Ubuntu Eoan): status In Progress Fix Released
2019-08-22 12:53:40 Mauricio Faria de Oliveira linux (Ubuntu Disco): status New In Progress
2019-08-22 12:53:45 Mauricio Faria de Oliveira linux (Ubuntu Bionic): status New In Progress
2019-08-22 12:53:50 Mauricio Faria de Oliveira linux (Ubuntu Xenial): status New In Progress
2019-08-22 12:53:53 Mauricio Faria de Oliveira linux (Ubuntu Disco): assignee Mauricio Faria de Oliveira (mfo)
2019-08-22 12:53:55 Mauricio Faria de Oliveira linux (Ubuntu Bionic): assignee Mauricio Faria de Oliveira (mfo)
2019-08-22 12:53:57 Mauricio Faria de Oliveira linux (Ubuntu Xenial): assignee Mauricio Faria de Oliveira (mfo)
2019-08-22 12:55:12 Mauricio Faria de Oliveira description [Impact] * The bnx2x driver may cause hardware faults (leading to panic/reboot) and other behaviors such as transmit timeouts after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is introduced. * This issue has been observed by a user shortly after starting docker & kubelet, with adapters: - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] * If options to ignore hardware faults are used (erst_disable=1 hest_disable=1 ghes.disable=1), the system doesn't panic/reboot and continues on to time out on adapter stats, then transmit timeouts, spewing some adapter firmware dumps, but the network interface is non-functional. * The issue only happened when LLDP was enabled on the network switches, and a crashdump shows the bnx2x driver is stuck waiting for firmware to complete the stop traffic command in LLDP handling. The workaround used is to disable LLDP in the network switches/ports. * Analysis of the driver and firmware dumps didn't help significantly towards finding the root cause. * Upstream/mainline has recently reverted the patch due to similar problem reports, while looking for the root cause/proper fix. [Test Case] * No reproducible test case found outside the user's systems/cluster, where it is enough to start docker & kubelet & wait. * The user verified test kernels for Xenial and Bionic; the problem does not happen. [Regression Potential] * Users who significantly use/apply the non-default traffic class (tc) / class of service (cos) might see performance changes (if any at all) in such applications; however, that is unclear for now. * This is a recent revert upstream (v5.3-rc'ish), so there's a chance things might change in this area. * Nonetheless, the patch is authored by the driver vendor, and made its way into stable kernels (e.g., v5.2.8, which made it into Eoan/19.10 recently).
[Impact]
 * The bnx2x driver may cause hardware faults (leading to
   panic/reboot) and other behaviors such as transmit timeouts
   after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
   introduced.
 * This issue has been observed by a user shortly
   after starting docker & kubelet, with adapters:
   - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
   - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]
 * If options to ignore hardware faults are used
   (erst_disable=1 hest_disable=1 ghes.disable=1),
   the system doesn't panic/reboot and continues
   on to time out on adapter stats, then transmit
   timeouts, spewing some adapter firmware dumps,
   but the network interface is non-functional.
 * The issue only happened when LLDP was enabled
   on the network switches, and a crashdump shows
   the bnx2x driver is stuck waiting for firmware
   to complete the stop traffic command in LLDP
   handling. The workaround used is to disable LLDP
   in the network switches/ports.
 * Analysis of the driver and firmware dumps
   didn't help significantly towards finding
   the root cause.
 * Upstream/mainline has recently reverted the
   patch due to similar problem reports, while
   looking for the root cause/proper fix.
[Test Case]
 * No reproducible test case found outside
   the user's systems/cluster, where it is
   enough to start docker & kubelet & wait.
 * The user verified test kernels for Xenial
   and Bionic; the problem does not happen.
   Build-tested on Disco.
[Regression Potential]
 * Users who significantly use/apply the non-default
   traffic class (tc) / class of service (cos) might
   see performance changes (if any at all) in such
   applications; however, that is unclear for now.
 * This is a recent revert upstream (v5.3-rc'ish),
   so there's a chance things might change in this area.
 * Nonetheless, the patch is authored by the driver
   vendor, and made its way into stable kernels
   (e.g., v5.2.8, which made it into Eoan/19.10 recently).
2019-08-29 22:44:23 Nivedita Singhvi tags sts
2019-08-29 22:44:41 Nivedita Singhvi linux (Ubuntu Xenial): importance Undecided High
2019-08-29 22:44:44 Nivedita Singhvi linux (Ubuntu Bionic): importance Undecided High
2019-08-29 22:44:49 Nivedita Singhvi linux (Ubuntu Xenial): importance High Critical
2019-08-29 22:44:52 Nivedita Singhvi linux (Ubuntu Bionic): importance High Critical
2019-08-29 22:44:56 Nivedita Singhvi linux (Ubuntu Disco): importance Undecided Critical
2019-08-29 22:45:00 Nivedita Singhvi linux (Ubuntu Eoan): importance Undecided Critical
2019-08-29 22:45:11 Nivedita Singhvi bug added subscriber Nivedita Singhvi
2019-09-02 11:46:40 Mauricio Faria de Oliveira linux (Ubuntu Xenial): status In Progress Incomplete
2019-09-02 11:46:42 Mauricio Faria de Oliveira linux (Ubuntu Bionic): status In Progress Incomplete
2019-09-02 11:46:44 Mauricio Faria de Oliveira linux (Ubuntu Disco): status In Progress Incomplete
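
For triaging similar reports, here is a minimal sketch in Python that checks whether a host matches the hardware profile in the description. It looks for one of the listed adapter/subsystem ID pairs and reports which of the listed fault-ignoring boot options are set. It assumes the standard Linux sysfs PCI layout under /sys/bus/pci/devices, where each device directory holds vendor, device, subsystem_vendor and subsystem_device files with hex IDs such as 0x14e4. It also assumes a readable /proc/cmdline. The names AFFECTED_ADAPTERS, find_affected_adapters and fault_ignore_options_present are hypothetical. This only matches the hardware profile and does not reproduce the issue, since no test case exists outside the user's cluster.

```python
from pathlib import Path

# (vendor, device, subsystem_vendor, subsystem_device) pairs copied from the
# bug description; this table and its name are hypothetical, for illustration.
AFFECTED_ADAPTERS = {
    ("14e4", "168a", "1028", "1f5c"): "Broadcom NetXtreme II BCM57800 (Dell)",
    ("14e4", "16a1", "1028", "1f79"): "Broadcom NetXtreme II BCM57840 (Dell)",
}

# Boot options named in the description for ignoring hardware faults.
FAULT_IGNORE_OPTIONS = ("erst_disable=1", "hest_disable=1", "ghes.disable=1")

ID_FILES = ("vendor", "device", "subsystem_vendor", "subsystem_device")


def _read_id(path: Path) -> str:
    # sysfs IDs look like '0x14e4\n'; normalize to lowercase hex without '0x'.
    text = path.read_text().strip().lower()
    return text[2:] if text.startswith("0x") else text


def find_affected_adapters(sysfs_root: str = "/sys/bus/pci/devices") -> list[tuple[str, str]]:
    """Return (pci_address, description) for each device matching the table."""
    matches: list[tuple[str, str]] = []
    try:
        entries = sorted(Path(sysfs_root).iterdir())
    except OSError:
        return matches  # sysfs missing or unreadable (e.g. not running on Linux)
    for dev in entries:
        try:
            key = tuple(_read_id(dev / name) for name in ID_FILES)
        except OSError:
            continue  # entry lacks the expected ID files; skip it
        if key in AFFECTED_ADAPTERS:
            matches.append((dev.name, AFFECTED_ADAPTERS[key]))
    return matches


def fault_ignore_options_present(cmdline: str) -> list[str]:
    """Return the listed options that appear as whole tokens in a kernel command line."""
    tokens = cmdline.split()
    return [opt for opt in FAULT_IGNORE_OPTIONS if opt in tokens]


if __name__ == "__main__":
    for address, desc in find_affected_adapters():
        print(f"{address}: {desc}")
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""  # /proc unavailable
    print("fault-ignore boot options:", fault_ignore_options_present(cmdline))
```

Because sysfs_root is a parameter, the check can be exercised against a fabricated directory tree. On a real host it is run without arguments.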