Activity log for bug #2036777

Date Who What changed Old value New value Message
2023-09-20 16:16:48 Brahamaprakash Vardhaman bug added bug
2023-09-20 16:30:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2023-10-04 00:27:08 Michael Reed linux (Ubuntu): assignee Michael Reed (mreed8855)
2023-10-04 00:27:23 Michael Reed nominated for series Ubuntu Mantic
2023-10-04 00:27:23 Michael Reed bug task added linux (Ubuntu Mantic)
2023-10-04 00:27:23 Michael Reed nominated for series Ubuntu Lunar
2023-10-04 00:27:23 Michael Reed bug task added linux (Ubuntu Lunar)
2023-10-04 00:27:23 Michael Reed nominated for series Ubuntu Jammy
2023-10-04 00:27:23 Michael Reed bug task added linux (Ubuntu Jammy)
2023-10-04 00:28:36 Michael Reed summary fnic driver on needs to be updated to 1.6.0.57 on Focal [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Focal
2023-10-04 00:31:08 Michael Reed description
Old value:
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
Below are the kernel patches that were picked for the newer version of the fnic driver.
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
New value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
2023-10-04 00:31:41 Michael Reed linux (Ubuntu Lunar): assignee Michael Reed (mreed8855)
2023-10-04 00:31:43 Michael Reed linux (Ubuntu Jammy): assignee Michael Reed (mreed8855)
2023-11-08 08:25:08 Michael Reed summary [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Focal [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Jammy
2023-12-05 16:53:00 Michael Reed nominated for series Ubuntu Noble
2023-12-05 16:53:00 Michael Reed bug task added linux (Ubuntu Noble)
2023-12-05 16:53:15 Michael Reed linux (Ubuntu Jammy): status New Invalid
2023-12-05 16:53:19 Michael Reed linux (Ubuntu Jammy): status Invalid New
2023-12-05 16:53:27 Michael Reed linux (Ubuntu Lunar): status New Won't Fix
2023-12-05 16:53:38 Michael Reed linux (Ubuntu Jammy): status New In Progress
2023-12-07 01:29:23 Michael Reed linux (Ubuntu Noble): status Incomplete Fix Released
2023-12-07 01:29:28 Michael Reed linux (Ubuntu Mantic): status Incomplete In Progress
2023-12-07 02:13:19 Michael Reed description
Old value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
New value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
Jammy
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/fnic_11_10_23
Mantic
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/mantic/+ref/fnic_update_2036777
2023-12-07 02:16:01 Michael Reed description
Old value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
Jammy
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/fnic_11_10_23
Mantic
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/mantic/+ref/fnic_update_2036777
New value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
924cb24df4fc scsi: fnic: Stop using the SCSI pointer
b559b99a5c081 scsi: fnic: Replace DMA mask of 64 bits with 47 bits
5a43b07a87835 scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
15924b0503630 scsi: fnic: Replace sgreset tag with max_tag_id
514f0c400bde6 scsi: fnic: Fix sg_reset success path
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
Links to first 3 patches
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
2023-12-07 02:17:18 Michael Reed description
Old value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
924cb24df4fc scsi: fnic: Stop using the SCSI pointer
b559b99a5c081 scsi: fnic: Replace DMA mask of 64 bits with 47 bits
5a43b07a87835 scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
15924b0503630 scsi: fnic: Replace sgreset tag with max_tag_id
514f0c400bde6 scsi: fnic: Fix sg_reset success path
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
Links to first 3 patches
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
New value:
[Impact]
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
sg_reset is issued with a SCSI command pointer. The device reset code assumes that it was issued on a hardware queue, and calls into the block multiqueue layer. However, the assumption is broken: there is no hardware queue associated with the sg_reset, and this leads to a crash due to a NULL pointer dereference.
[Fix]
Fix the code to use the max_tag_id as a tag which does not overlap with the other tags issued by the mid-layer.
Below are the kernel patches that were picked for the newer version of the fnic driver.
924cb24df4fc scsi: fnic: Stop using the SCSI pointer
b559b99a5c081 scsi: fnic: Replace DMA mask of 64 bits with 47 bits
5a43b07a87835 scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
15924b0503630 scsi: fnic: Replace sgreset tag with max_tag_id
514f0c400bde6 scsi: fnic: Fix sg_reset success path
[Test Plan]
Tested by running FC traffic for a few minutes, and by issuing sg_reset on the device in parallel. Without the fix, the crash is observed right away. With this fix, no crash is observed.
sg_reset performs a device reset/LUN reset on a LUN. Since it is issued by the user, it does not come into the driver with a tag or a queue id. Fix the fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF path to special-case the sg_reset response.
[ Where problems could occur ]
[ Other Info ]
Jammy
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/fnic_11_10_23
Mantic
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/mantic/+ref/fnic_update_2036777
Links to first 3 patches
https://lore.kernel.org/lkml/20230727193919.2519-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230817182146.229059-1-kartilak@cisco.com/
https://lore.kernel.org/lkml/20230919182436.6895-1-kartilak@cisco.com/
2024-07-25 20:31:53 Brian Murray linux (Ubuntu Mantic): status In Progress Won't Fix
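
The [Impact] text in the description entries above attributes the unnecessary host reset to a return-code problem: fnic_clean_pending_aborts() reported failure whether or not the cleanup succeeded, so the caller treated every device reset as failed. The following is a toy Python model of that control flow, not driver code. The names `clean_pending_aborts_buggy`, `clean_pending_aborts_fixed` and `device_reset`, the constants, and the string results are all hypothetical and chosen only for illustration.

```python
# Toy model of the escalation described in [Impact]; not fnic driver code.
# All names, constants and return strings here are hypothetical.

SUCCESS = 0
FAILED = 1


def clean_pending_aborts_buggy(pending_ok: bool) -> int:
    """Models the reported bug: a non-zero value is returned either way."""
    return FAILED


def clean_pending_aborts_fixed(pending_ok: bool) -> int:
    """Returns a code that reflects whether the cleanup succeeded."""
    return SUCCESS if pending_ok else FAILED


def device_reset(clean_pending_aborts, pending_ok: bool) -> str:
    """Returns the recovery step the caller ends up taking."""
    if clean_pending_aborts(pending_ok) != SUCCESS:
        # The caller sees a failure code and escalates.
        return "host reset"
    return "device reset done"


if __name__ == "__main__":
    for ok in (True, False):
        print(ok,
              device_reset(clean_pending_aborts_buggy, ok),
              device_reset(clean_pending_aborts_fixed, ok))
```

In this model the buggy variant ends in a host reset even when the cleanup succeeded, while the fixed variant escalates only when the cleanup actually failed.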