Activity log for bug #2064163

Date Who What changed Old value New value Message
2024-04-29 19:29:44 Asmaa Mnebhi bug added bug
2024-04-29 19:56:10 Asmaa Mnebhi description
Old value:
SRU Justification:
[Impact]
During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is to toggle the PHY hard reset pin via GPIO17 (or to power cycle, since that achieves the same thing). We might have found a software workaround to avoid getting into this state in the first place: suspend the PHY during graceful shutdown. Suspend the PHY = Power down = set bit 11 to 1 in reg 0 of the PHY. This WA passed 1800 reboots on QA's setup.
[Fix]
* During reboot, the mlxbf_gige_shutdown() function makes a call to phy_stop(). phy_stop() calls phy_suspend().
* Certain Linux PHY drivers, like the Vitesse PHY, don't support suspend() to power down the PHY during shutdown.
* Our hardware also does not toggle the hard reset signal of the PHY during reboot.
* Hence, when the PHY is in a bad state, it stays in that state until a power cycle.
* We have found a way to prevent the PHY from entering this bad state by suspending the PHY in the case of reboot.
[Test Case]
* Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
* Check that the oob_net0 interface is up and the IP is assigned.
* Please note that if the OOB doesn't get an IP, try reloading the driver (rmmod/modprobe). If that solves the issue, that would be a different bug. In the bug at stake, nothing recovers the OOB IP except a power cycle.
[Regression Potential]
* Make sure the Redfish DHCP is still working during the reboot test
* Make sure the OOB gets an IP
[Other]
These changes were made both in the mlxbf-gige driver and UEFI
New value:
SRU Justification:
[Impact]
During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is to power cycle. We might have found a software workaround to avoid getting into this state in the first place: suspend the PHY during graceful shutdown. Suspend the PHY = Power down = set bit 11 to 1 in reg 0 of the PHY. This WA passed 1800 reboots on QA's setup.
[Fix]
* During reboot, the mlxbf_gige_shutdown() function makes a call to phy_stop(). phy_stop() calls phy_suspend().
* Certain Linux PHY drivers, like the Vitesse PHY, don't support suspend() to power down the PHY during shutdown.
* Our hardware also does not toggle the hard reset signal of the PHY during reboot.
* Hence, when the PHY is in a bad state, it stays in that state until a power cycle.
* We have found a way to prevent the PHY from entering this bad state by suspending the PHY in the case of reboot.
[Test Case]
* Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
* Check that the oob_net0 interface is up and the IP is assigned.
* Please note that if the OOB doesn't get an IP, try reloading the driver (rmmod/modprobe). If that solves the issue, that would be a different bug. In the bug at stake, nothing recovers the OOB IP except a power cycle.
[Regression Potential]
* Make sure the Redfish DHCP is still working during the reboot test
* Make sure the OOB gets an IP
[Other]
These changes were made both in the mlxbf-gige driver and UEFI
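The "set bit 11 to 1 in reg 0" step in the description above can be illustrated with plain bit arithmetic. This is only a sketch of the register value manipulation, not the driver or UEFI change itself. The names MII_BMCR and BMCR_PDOWN mirror the constants in the Linux MII headers (register 0 and mask 0x0800); the helper functions and the sample register value 0x1140 are hypothetical and chosen for illustration.

```python
# Sketch only: shows what "set bit 11 of PHY register 0" means numerically.
# It does not talk to hardware; the helpers below are hypothetical.

MII_BMCR = 0x00          # register 0: basic mode control register
BMCR_PDOWN = 1 << 11     # bit 11: power-down (0x0800)


def power_down(bmcr: int) -> int:
    """Return the 16-bit register value with the power-down bit set."""
    return (bmcr | BMCR_PDOWN) & 0xFFFF


def power_up(bmcr: int) -> int:
    """Return the 16-bit register value with the power-down bit cleared."""
    return bmcr & ~BMCR_PDOWN & 0xFFFF


def is_powered_down(bmcr: int) -> bool:
    """True if the power-down bit is set in the given register value."""
    return bool(bmcr & BMCR_PDOWN)


# Illustrative value: 0x1140 has bit 11 clear, so setting it yields 0x1940.
sample = 0x1140
suspended = power_down(sample)
```

Clearing the same bit restores the original value, so the operation is reversible as far as the register contents are concerned.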
2024-04-30 09:41:40 Bartlomiej Zolnierkiewicz nominated for series Ubuntu Jammy
2024-04-30 09:41:40 Bartlomiej Zolnierkiewicz bug task added linux-bluefield (Ubuntu Jammy)
2024-04-30 18:42:29 Asmaa Mnebhi linux-bluefield (Ubuntu): status New Invalid
2024-04-30 18:43:28 Asmaa Mnebhi linux-bluefield (Ubuntu Jammy): status New Invalid
2024-05-13 21:17:16 Asmaa Mnebhi linux-bluefield (Ubuntu): status Invalid In Progress
2024-05-13 21:17:19 Asmaa Mnebhi linux-bluefield (Ubuntu Jammy): status Invalid In Progress
2024-05-13 21:19:46 Asmaa Mnebhi description
Old value:
SRU Justification:
[Impact]
During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is to power cycle. We might have found a software workaround to avoid getting into this state in the first place: suspend the PHY during graceful shutdown. Suspend the PHY = Power down = set bit 11 to 1 in reg 0 of the PHY. This WA passed 1800 reboots on QA's setup.
[Fix]
* During reboot, the mlxbf_gige_shutdown() function makes a call to phy_stop(). phy_stop() calls phy_suspend().
* Certain Linux PHY drivers, like the Vitesse PHY, don't support suspend() to power down the PHY during shutdown.
* Our hardware also does not toggle the hard reset signal of the PHY during reboot.
* Hence, when the PHY is in a bad state, it stays in that state until a power cycle.
* We have found a way to prevent the PHY from entering this bad state by suspending the PHY in the case of reboot.
[Test Case]
* Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
* Check that the oob_net0 interface is up and the IP is assigned.
* Please note that if the OOB doesn't get an IP, try reloading the driver (rmmod/modprobe). If that solves the issue, that would be a different bug. In the bug at stake, nothing recovers the OOB IP except a power cycle.
[Regression Potential]
* Make sure the Redfish DHCP is still working during the reboot test
* Make sure the OOB gets an IP
[Other]
These changes were made both in the mlxbf-gige driver and UEFI
New value:
SRU Justification:
[Impact]
During the QA reboot test, the BF3 Vitesse PHY gets stuck in a bad state, resulting in no IP provisioning. The only way to recover is to power cycle. We found a software workaround to avoid getting into this state in the first place: disable the OOB port in the shutdown function.
[Fix]
* Prevent the PHY from entering this bad state by disabling the OOB port during shutdown.
[Test Case]
* Do the reboot test (at least 2000 reboots): run 'reboot' from Linux.
* Check that the oob_net0 interface is up and the IP is assigned.
* Please note that if the OOB doesn't get an IP, try reloading the driver (rmmod/modprobe). If that solves the issue, that would be a different bug. In the bug at stake, nothing recovers the OOB IP except a power cycle.
[Regression Potential]
* Make sure the Redfish DHCP is still working during the reboot test
* Make sure the OOB gets an IP
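The per-reboot check in the test case (oob_net0 is up and has an IP) could be automated roughly as follows. This is a hedged sketch, not the QA team's actual script: it assumes the iproute2 `ip` tool with JSON output (`ip -j addr show`) is available on the target, and the function names and the sample JSON are hypothetical.

```python
import json
import subprocess


def parse_link_state(ip_json: str, ifname: str = "oob_net0"):
    """Parse `ip -j addr show` output; return (is_up, list of IPv4 addresses)."""
    for link in json.loads(ip_json):
        if link.get("ifname") != ifname:
            continue
        addrs = [
            a["local"]
            for a in link.get("addr_info", [])
            if a.get("family") == "inet" and "local" in a
        ]
        return link.get("operstate") == "UP", addrs
    # Interface not present in the output at all.
    return False, []


def check_oob(ifname: str = "oob_net0") -> bool:
    """Run `ip` on the local machine and report whether the check passes."""
    out = subprocess.run(
        ["ip", "-j", "addr", "show", "dev", ifname],
        capture_output=True, text=True, check=True,
    ).stdout
    is_up, addrs = parse_link_state(out, ifname)
    return is_up and bool(addrs)


# Hypothetical sample of the JSON shape, using a documentation-range address.
SAMPLE = json.dumps([{
    "ifname": "oob_net0",
    "operstate": "UP",
    "addr_info": [{"family": "inet", "local": "192.0.2.10", "prefixlen": 24}],
}])
```

A reboot loop would call check_oob() after each boot and stop on the first failure. Per the test case note, a failure that clears after rmmod/modprobe would point to a different bug.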