zfcp: fix infinite iteration on ERP ready list

Bug #1780067 reported by bugproxy
This bug affects 1 person
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  High        Canonical Kernel Team
linux (Ubuntu)           Fix Released  High        Joseph Salisbury
Bionic                   Fix Released  High        Joseph Salisbury

Bug Description

Please backport:
commit fa89adba1941e4f3b213399b81732a5c12fd9131
    scsi: zfcp: fix infinite iteration on ERP ready list

    zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
    rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
    adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
    processed asynchronously and concurrently.

    Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
    calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
    zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock
    and then iterates with list_for_each() over the adapter's ERP ready list
    without holding the ERP lock. This opens a race window in which the
    current list entry can be moved to another list, causing list_for_each()
    to iterate forever on the wrong list, as the erp_ready_head is never
    encountered as terminal condition.
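
To illustrate why the walk described above never terminates, here is a minimal userspace sketch. It is not the zfcp code: it models a kernel-style circular list, where list_add_tail() and list_move() only mimic the kernel helpers of the same name, and list_init(), walk_ready() and demo_steps() are hypothetical names introduced for this example. The walk is capped at 1000 steps so that the demonstration ends.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n from its current list and insert it right after head h. */
static void list_move(struct list_head *n, struct list_head *h)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/*
 * Walk "ready" like list_for_each(): the only exit condition is reaching
 * &ready again. If move_current is nonzero, the first visited entry is
 * moved to "running" mid-walk, as a concurrent list_move() could do.
 * Returns the number of steps taken, capped at max_steps.
 */
static int walk_ready(struct list_head *ready, struct list_head *running,
                      int move_current, int max_steps)
{
    struct list_head *pos;
    int steps = 0;

    assert(max_steps > 0);
    for (pos = ready->next; pos != ready && steps < max_steps; pos = pos->next) {
        if (move_current && steps == 0)
            list_move(pos, running);
        steps++;
    }
    return steps;
}

/* One ERP action on the ready list; optionally move it during the walk. */
int demo_steps(int move_current)
{
    struct list_head ready, running, action;

    list_init(&ready);
    list_init(&running);
    list_add_tail(&action, &ready);
    return walk_ready(&ready, &running, move_current, 1000);
}
```

Without the move, the walk visits the single entry and stops at the ready head. With the move, the iterator ends up circling the running list, whose head is not the terminating address, so only the artificial step cap stops it.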

    Meanwhile the ERP action can be processed in the ERP thread by
    zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
    lock and then calls zfcp_erp_action_to_running() to move the ERP action
    from the ready to the running list. zfcp_erp_action_to_running() can
    move the ERP action using list_move() just during the aforementioned
    race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
    zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
    held by the infinitely looping kworker, it effectively spins forever.

    Example Sequence Diagram:

    Process                ERP Thread             rport_work
    -------------------    -------------------    -------------------
    zfcp_erp_adapter_reopen()
    zfcp_erp_adapter_block()
    zfcp_scsi_schedule_rports_block()
    lock ERP                                      zfcp_scsi_rport_work()
    zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
    list_add_tail() on ready                      !(rport_task==RPORT_ADD)
    wake_up() ERP thread                          zfcp_scsi_rport_block()
    zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
    unlock ERP                                    lock DBF REC
    zfcp_erp_wait()        lock ERP
    |                      zfcp_erp_action_to_running()
    |                                             list_for_each() ready
    |                      list_move()              current entry
    |                        ready to running
    |                      zfcp_dbf_rec_run()       endless loop over running
    |                      zfcp_dbf_rec_run_lvl()
    |                      lock DBF REC spins forever

    Any adapter recovery can trigger this, such as setting the device offline
    or reboot.

    V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport
    during rport gone") introduced additional tracing of (un)blocking of
    rports. It missed that the adapter->erp_lock must be held when calling
    zfcp_dbf_rec_trig().

    This fix uses the approach formerly introduced by commit aa0fec62391c
    ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
    later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
    tracing for recovery actions.").

    Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
    acquires and releases the adapter->erp_lock for read.

    Reported-by: Sebastian Ott <email address hidden>
    Signed-off-by: Jens Remus <email address hidden>
    Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone")
    Cc: <email address hidden> # 2.6.32+
    Reviewed-by: Benjamin Block <email address hidden>
    Signed-off-by: Steffen Maier <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-169499 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Joseph Salisbury (jsalisbury)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit fa89adba1941e4f3b213399b81732a5c12fd9131. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1780067

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-07-09 05:21 EDT-------
Fix verified upfront by IBM

Joseph Salisbury (jsalisbury) wrote :

This commit is now in the Bionic master-next repo.

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Joseph Salisbury (jsalisbury) wrote :

The commit made its way into Bionic via the upstream stable updates posted in bug 1783418.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Dimitri John Ledkov (xnox) wrote :

$ git tag --contains fa89adba1941e4f3b213399b81732a5c12fd9131 Ubuntu-4.17.0-6.7
Ubuntu-4.17.0-6.7

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Joseph Salisbury (jsalisbury) wrote :

Confirmed this patch is in the Ubuntu-4.15.0-31 kernel as commit:

b4756bd452c1 scsi: zfcp: fix infinite iteration on ERP ready list

Frank Heimes (fheimes) wrote :

Since we are on 4.15.0.33.35 with Bionic, I'm marking Bionic as Fix Released and closing the ticket.

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-08-31 07:51 EDT-------
IBM bugzilla status -> closed, Fix Released by Canonical

Brad Figg (brad-figg)
tags: added: cscc