Activity log for bug #2056354

Date Who What changed Old value New value Message
2024-03-06 16:55:10 Thibf bug added bug
2024-03-06 16:58:32 Thibf nominated for series Ubuntu Noble
2024-03-06 16:58:32 Thibf bug task added linux (Ubuntu Noble)
2024-03-06 16:59:48 Thibf description
Old value:
[Impact]
Some improvement are made to qat which makes it more resilient and able to handle reset in a better way. There is an upstream patch set that improve this but it's applied to linux-next and scheduled for 6.9. We should apply the patch set to the Noble 6.8 kernel, so we experience less issue with qat and be more maintainable.
[Test case]
Use the added mechanism to inject errors and rmmod/modprobe the module to verify that qat recover and log issues properly.
[Fix]
Apply the following commits (from linux-next):
2ecd43413d76 Documentation: qat: fix auto_reset section
7d42e097607c crypto: qat - resolve race condition during AER recovery
c2304e1a0b80 crypto: qat - change SLAs cleanup flow at shutdown
9567d3dc7609 crypto: qat - improve aer error reset handling
750fa7c20e60 crypto: qat - limit heartbeat notifications
f5419a4239af crypto: qat - add auto reset on error
2aaa1995a94a crypto: qat - add fatal error notification
4469f9b23468 crypto: qat - re-enable sriov after pf reset
ec26f8e6c784 crypto: qat - update PFVF protocol for recovery
758a0087db98 crypto: qat - disable arbitration before reset
ae508d7afb75 crypto: qat - add fatal error notify method
e2b67859ab6e crypto: qat - add heartbeat error simulator
[Regression potential]
We may experience qat regression when crashing or restarting the module.
New value:
[Impact]
Some improvement are made to qat which makes it more resilient and able to handle reset in a better way. There is an upstream patch set that improve this but it's applied to linux-next and scheduled for 6.9. We should apply the patch set to the Noble 6.8 kernel, so we experience less issues with qat and be more maintainable.
[Test case]
Use the added mechanism to inject errors and rmmod/modprobe the module to verify that qat recover and log issues properly.
[Fix]
Apply the following commits (from linux-next):
2ecd43413d76 Documentation: qat: fix auto_reset section
7d42e097607c crypto: qat - resolve race condition during AER recovery
c2304e1a0b80 crypto: qat - change SLAs cleanup flow at shutdown
9567d3dc7609 crypto: qat - improve aer error reset handling
750fa7c20e60 crypto: qat - limit heartbeat notifications
f5419a4239af crypto: qat - add auto reset on error
2aaa1995a94a crypto: qat - add fatal error notification
4469f9b23468 crypto: qat - re-enable sriov after pf reset
ec26f8e6c784 crypto: qat - update PFVF protocol for recovery
758a0087db98 crypto: qat - disable arbitration before reset
ae508d7afb75 crypto: qat - add fatal error notify method
e2b67859ab6e crypto: qat - add heartbeat error simulator
[Regression potential]
We may experience qat regression when crashing or restarting the module.
2024-03-07 16:12:08 Ricardo Martins linux (Ubuntu Noble): status New In Progress
2024-03-07 22:04:37 Thibf description
Old value:
[Impact]
Some improvement are made to qat which makes it more resilient and able to handle reset in a better way. There is an upstream patch set that improve this but it's applied to linux-next and scheduled for 6.9. We should apply the patch set to the Noble 6.8 kernel, so we experience less issues with qat and be more maintainable.
[Test case]
Use the added mechanism to inject errors and rmmod/modprobe the module to verify that qat recover and log issues properly.
[Fix]
Apply the following commits (from linux-next):
2ecd43413d76 Documentation: qat: fix auto_reset section
7d42e097607c crypto: qat - resolve race condition during AER recovery
c2304e1a0b80 crypto: qat - change SLAs cleanup flow at shutdown
9567d3dc7609 crypto: qat - improve aer error reset handling
750fa7c20e60 crypto: qat - limit heartbeat notifications
f5419a4239af crypto: qat - add auto reset on error
2aaa1995a94a crypto: qat - add fatal error notification
4469f9b23468 crypto: qat - re-enable sriov after pf reset
ec26f8e6c784 crypto: qat - update PFVF protocol for recovery
758a0087db98 crypto: qat - disable arbitration before reset
ae508d7afb75 crypto: qat - add fatal error notify method
e2b67859ab6e crypto: qat - add heartbeat error simulator
[Regression potential]
We may experience qat regression when crashing or restarting the module.
New value:
[Impact]
This set improves the error recovery flows in the QAT drivers and adds a mechanism to test it through an heartbeat simulator. This is an upstream patch set applied to linux-next and scheduled for 6.9.
Link to the upstream submission:
https://patchwork.kernel.org/project/linux-crypto/cover/20240202105324.50391-1-mun.chun.yep@intel.com/
We should apply this set to the Noble 6.8 kernel, in order to experience less issues with qat and improve maintainability. An added commit is required to update the configuration.
[Test case]
Unload and reload the module to verify that qat recover and log issues properly.
Use the added error injection mechanism to verify the recovery flow.
[Fix]
Apply the following commits (from linux-next):
2ecd43413d76 Documentation: qat: fix auto_reset section
7d42e097607c crypto: qat - resolve race condition during AER recovery
c2304e1a0b80 crypto: qat - change SLAs cleanup flow at shutdown
9567d3dc7609 crypto: qat - improve aer error reset handling
750fa7c20e60 crypto: qat - limit heartbeat notifications
f5419a4239af crypto: qat - add auto reset on error
2aaa1995a94a crypto: qat - add fatal error notification
4469f9b23468 crypto: qat - re-enable sriov after pf reset
ec26f8e6c784 crypto: qat - update PFVF protocol for recovery
758a0087db98 crypto: qat - disable arbitration before reset
ae508d7afb75 crypto: qat - add fatal error notify method
e2b67859ab6e crypto: qat - add heartbeat error simulator
[Regression potential]
We may experience qat regression when crashing or restarting the module.
2024-03-07 22:04:41 Thibf linux (Ubuntu Noble): assignee Thibf (thibf)
2024-03-08 13:59:01 Thibf summary qat: Improve error/reset handling qat: Improve error recovery flows
2024-03-08 15:51:42 Thibf linux (Ubuntu Noble): status In Progress Fix Committed
2024-03-20 23:09:33 Thibf linux (Ubuntu Noble): status Fix Committed Fix Released
2024-03-21 09:27:03 Kleber Sacilotto de Souza linux (Ubuntu Noble): status Fix Released Fix Committed
2024-03-28 08:15:32 Launchpad Janitor linux (Ubuntu Noble): status Fix Committed Fix Released
2024-05-14 19:47:47 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-nvidia-6.8-v2 verification-needed-jammy-linux-nvidia-6.8