[SRU] fnic driver needs to be updated to 1.6.0.57 on Jammy

Bug #2036777 reported by Brahamaprakash Vardhaman
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Fix Released   Undecided    Michael Reed
Jammy            In Progress    Undecided    Michael Reed
Lunar            Won't Fix      Undecided    Michael Reed
Mantic           In Progress    Undecided    Michael Reed
Noble            Fix Released   Undecided    Michael Reed

Bug Description

[Impact]

fnic_clean_pending_aborts() was returning a non-zero value
irrespective of failure or success.
This caused the caller of this function to assume that the
device reset had failed, even though it would succeed in
most cases. As a consequence, a successful device reset
would escalate to host reset.
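
The escalation described above can be modelled with a small standalone C
sketch. This is not the driver code: clean_pending_aborts_buggy(),
clean_pending_aborts_fixed() and device_reset_escalates() are hypothetical
stand-ins. The only behaviour taken from this report is that the helper
returned non-zero regardless of outcome and that the caller treated
non-zero as a failed device reset.

```c
/* Simplified model of the return-code problem; NOT the fnic source. */

/* Buggy shape: non-zero is returned whether or not aborts are still pending. */
static int clean_pending_aborts_buggy(int aborts_still_pending)
{
    (void)aborts_still_pending;   /* outcome is ignored */
    return 1;
}

/* Fixed shape: zero on success, non-zero only when aborts remain pending. */
static int clean_pending_aborts_fixed(int aborts_still_pending)
{
    return aborts_still_pending ? 1 : 0;
}

/*
 * Hypothetical caller: a non-zero result is read as "device reset failed",
 * which is the condition that escalates to a host reset.
 * Returns 1 when escalation would happen, 0 otherwise.
 */
static int device_reset_escalates(int (*clean)(int), int aborts_still_pending)
{
    return clean(aborts_still_pending) != 0;
}
```

With the buggy helper, device_reset_escalates() reports escalation even when
no aborts are pending; with the fixed helper it does so only when some remain.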

sg_reset is issued with a SCSI command pointer.
The device reset code assumes that the command was issued
on a hardware queue and calls into the block multiqueue
layer. That assumption does not hold: no hardware queue
is associated with the sg_reset, and this leads to a
crash due to a NULL pointer dereference.

[Fix]
Fix the code to use the max_tag_id as a tag
which does not overlap with the other tags
issued by mid layer.
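
As a rough illustration of why max_tag_id cannot collide with a mid-layer
tag, here is a minimal C sketch. It assumes that the mid layer hands out
tags in the range 0 .. max_tag_id - 1; that range, the value of MAX_TAG_ID
and both helper names are assumptions made for this example and are not
taken from the fnic source.

```c
#include <stdbool.h>

/* Hypothetical value; the real limit is driver configuration. */
#define MAX_TAG_ID 64

/* Assumption: tags issued by the mid layer fall in [0, MAX_TAG_ID). */
static bool is_midlayer_tag(int tag)
{
    return tag >= 0 && tag < MAX_TAG_ID;
}

/* The sg_reset request uses the one value just past that range. */
static int sgreset_tag(void)
{
    return MAX_TAG_ID;
}
```

Under that assumption, sgreset_tag() never satisfies is_midlayer_tag(), so a
reset issued by the user cannot be mistaken for an in-flight I/O.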

Below are the kernel patches that were picked for the newer version of the fnic driver.

924cb24df4fc scsi: fnic: Stop using the SCSI pointer
b559b99a5c081 scsi: fnic: Replace DMA mask of 64 bits with 47 bits
5a43b07a87835 scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
15924b0503630 scsi: fnic: Replace sgreset tag with max_tag_id
514f0c400bde6 scsi: fnic: Fix sg_reset success path

[Test Plan]
Tested by running FC traffic for a few minutes,
and by issuing sgreset on the device in parallel.
Without the fix, the crash is observed right away.
With this fix, no crash is observed.

sg_reset performs a device reset (LUN reset) on a LUN.
Since it is issued by the user, it does not come into the
driver with a tag or a queue id.
Fix the fnic driver to create an io_req and use a scsi command tag.
Fix the ITMF path to special case the sg_reset response.
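
The special case in the ITMF path can be pictured with the following C
sketch. It is a simplified model under stated assumptions, not the driver
implementation: struct host, struct io_req, lookup_itmf_target() and
complete_itmf() are hypothetical names, and the layout (one reserved slot
for the sg_reset request next to a per-tag table) is an assumption chosen
to illustrate the idea.

```c
#include <stddef.h>

#define MAX_TAG_ID 8   /* hypothetical value for the example */

struct io_req {
    int done;          /* set once the ITMF response has been handled */
};

struct host {
    struct io_req *by_tag[MAX_TAG_ID]; /* requests carrying mid-layer tags */
    struct io_req *sgreset_req;        /* reserved slot for the sg_reset */
};

/* Resolve a completion tag to its request, special-casing the sg_reset tag. */
static struct io_req *lookup_itmf_target(struct host *h, int tag)
{
    if (tag == MAX_TAG_ID)
        return h->sgreset_req;         /* no hardware queue lookup needed */
    if (tag < 0 || tag > MAX_TAG_ID)
        return NULL;                   /* out of range */
    return h->by_tag[tag];
}

/* Returns 0 when a matching request was completed, -1 otherwise. */
static int complete_itmf(struct host *h, int tag)
{
    struct io_req *req = lookup_itmf_target(h, tag);

    if (req == NULL)
        return -1;                     /* nothing to complete; do not crash */
    req->done = 1;
    return 0;
}
```

The point of the sketch is that a response carrying the reserved tag is
resolved without consulting any per-queue state, and an unknown tag is
rejected instead of being dereferenced.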

[ Where problems could occur ]

[ Other Info ]

Jammy
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/fnic_11_10_23

Mantic
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/mantic/+ref/fnic_update_2036777

Links to first 3 patches
https://<email address hidden>/
https://<email address hidden>/
https://<email address hidden>/

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2036777

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Michael Reed (mreed8855)
Changed in linux (Ubuntu):
assignee: nobody → Michael Reed (mreed8855)
summary: - fnic driver on needs to be updated to 1.6.0.57 on Focal
+ [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Focal
description: updated
Changed in linux (Ubuntu Lunar):
assignee: nobody → Michael Reed (mreed8855)
Changed in linux (Ubuntu Jammy):
assignee: nobody → Michael Reed (mreed8855)
Michael Reed (mreed8855)
summary: - [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Focal
+ [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Jammy
Michael Reed (mreed8855) wrote :

The first patch is already in Jammy

commit 06adda263bd3776b019f282318efe18dd5bfa173
Author: Karan Tilak Kumar <email address hidden>
Date: Thu Jul 27 12:39:19 2023 -0700

    scsi: fnic: Replace return codes in fnic_clean_pending_aborts()

    BugLink: https://bugs.launchpad.net/bugs/2038382

    commit 5a43b07a87835660f91d88a4db11abfea8c523b7 upstream.

    fnic_clean_pending_aborts() was returning a non-zero value irrespective of
    failure or success. This caused the caller of this function to assume that
    the device reset had failed, even though it would succeed in most cases. As
    a consequence, a successful device reset would escalate to host reset.

    Reviewed-by: Sesidhar Baddela <email address hidden>
    Tested-by: Karan Tilak Kumar <email address hidden>
    Signed-off-by: Karan Tilak Kumar <email address hidden>
    Link: https://<email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>
    Signed-off-by: Stefan Bader <email address hidden>

Michael Reed (mreed8855) wrote (last edit ):
Changed in linux (Ubuntu Jammy):
status: New → Invalid
status: Invalid → New
Changed in linux (Ubuntu Lunar):
status: New → Won't Fix
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Michael Reed (mreed8855)
Changed in linux (Ubuntu Noble):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Mantic):
status: Incomplete → In Progress
Michael Reed (mreed8855) wrote :

These patches were also needed.

924cb24df4fc scsi: fnic: Stop using the SCSI pointer
b559b99a5c081 scsi: fnic: Replace DMA mask of 64 bits with 47 bits

description: updated
description: updated
description: updated
Michael Reed (mreed8855) wrote :
Brahamaprakash Vardhaman (bvardham) wrote : Re: [Bug 2036777] Re: [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Jammy

+Karan and team to test this kernel.

Regards
Brahamaprakash Vardhaman

From: <email address hidden> <email address hidden> on behalf of Michael Reed <email address hidden>
Date: Tuesday, 12 December 2023 at 11:15 PM
To: Brahamaprakash Vardhaman (bvardham) <email address hidden>
Subject: [Bug 2036777] Re: [SRU] Fnic driver on needs to be updated to 1.6.0.57 on Jammy
Here is the test kernel for Mantic

https://people.canonical.com/~mreed/cisco/lp_2036777/mantic/

