[UBUNTU 20.04] LPAR becomes unresponsive after the Kernel panic - rq_qos_wake_function

Bug #1929923 reported by bugproxy
This bug affects 1 person
Affects                   Importance   Assigned to
Ubuntu on IBM z Systems   Critical     Skipper Bug Screeners
linux (Ubuntu)            Undecided    Skipper Bug Screeners

Bug Description

---Problem Description---
kernel panic rq_qos_wake_function

---uname output---
Linux version 5.4.0-71-generic

Machine Type = s390x

---Debugger---
A debugger is not configured

Stack trace output:
May 15 20:21:04 data1 kernel: Call Trace:
May 15 20:21:04 data1 kernel: ([<000000234091e670>] 0x234091e670)
May 15 20:21:04 data1 kernel: [<0000003e10047e3a>] rq_qos_wake_function+0x8a/0xa0
May 15 20:21:04 data1 kernel: [<0000003e0fbec482>] __wake_up_common+0xa2/0x1b0
May 15 20:21:04 data1 kernel: [<0000003e0fbec984>] __wake_up_common_lock+0x94/0xe0
May 15 20:21:04 data1 kernel: [<0000003e0fbec9fa>] __wake_up+0x2a/0x40
May 15 20:21:04 data1 kernel: [<0000003e1005ee70>] wbt_done+0x90/0xe0
May 15 20:21:04 data1 kernel: [<0000003e10047f42>] __rq_qos_done+0x42/0x60
May 15 20:21:04 data1 kernel: [<0000003e10033cb0>] blk_mq_free_request+0xe0/0x140
May 15 20:21:04 data1 kernel: [<0000003e101d46f0>] dm_softirq_done+0x140/0x230
May 15 20:21:04 data1 kernel: [<0000003e100326c0>] blk_done_softirq+0xc0/0xe0
May 15 20:21:04 data1 kernel: [<0000003e103fc084>] __do_softirq+0x104/0x360
May 15 20:21:04 data1 kernel: [<0000003e0fb9da1e>] irq_exit+0x9e/0xc0
May 15 20:21:04 data1 kernel: [<0000003e0fb28ae8>] do_IRQ+0x78/0xb0
May 15 20:21:04 data1 kernel: [<0000003e103fb588>] ext_int_handler+0x130/0x134
May 15 20:21:04 data1 kernel: [<0000003e101d4416>] dm_mq_queue_rq+0x36/0x1d0
May 15 20:21:04 data1 kernel: Last Breaking-Event-Address:
May 15 20:21:04 data1 kernel: [<0000003e0fbce75e>] wake_up_process+0xe/0x20
May 15 20:21:04 data1 kernel: Kernel panic - not syncing: Fatal exception in interrupt
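
For triage, the symbolized frames in a trace like the one above can be extracted from the log text with a short script. The following is a minimal Python sketch and not part of the original report; the helper name parse_call_trace and the regular expression are assumptions based only on the line format shown here.

```python
import re

# Matches symbolized frames such as
# "[<0000003e10047e3a>] rq_qos_wake_function+0x8a/0xa0".
# Frames without a symbol (for example "([<...>] 0x234091e670)") are skipped.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_call_trace(log_text):
    """Return one dict per symbolized frame, in the order they appear."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        address, symbol, offset, size = match.groups()
        frames.append(
            {
                "address": int(address, 16),
                "symbol": symbol,
                "offset": int(offset, 16),
                "size": int(size, 16),
            }
        )
    return frames


# A few lines copied from the trace in this report.
SAMPLE = """\
May 15 20:21:04 data1 kernel: ([<000000234091e670>] 0x234091e670)
May 15 20:21:04 data1 kernel: [<0000003e10047e3a>] rq_qos_wake_function+0x8a/0xa0
May 15 20:21:04 data1 kernel: [<0000003e0fbec482>] __wake_up_common+0xa2/0x1b0
May 15 20:21:04 data1 kernel: [<0000003e1005ee70>] wbt_done+0x90/0xe0
"""

if __name__ == "__main__":
    for frame in parse_call_trace(SAMPLE):
        print(f"{frame['symbol']} +{frame['offset']:#x} of {frame['size']:#x}")
```

Listing the symbols this way makes it easier to compare the trace against similar reports.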

Oops output:
 no

System Dump Info:
  The system was configured to capture a dump; however, no dump was produced.

- Attach the output of sysctl -a to the bug.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-192966 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy)
tags: added: severity-critical
removed: severity-high
Andrew Cloke (andrew-cloke) wrote :

Is it possible to describe the steps required to reproduce this issue? And the environment in which it occurred?
Thanks!

Changed in ubuntu-z-systems:
importance: Undecided → Critical
Pedro Principeza (pprincipeza) wrote :

Greetings!

Aside from Andrew's last request, I see that a very similar issue was discussed in LP #1881109 [0], but at that time tests with a later kernel didn't reproduce the issue.

You're running 5.4.0-71 there. Have you been able to reproduce this using either 5.4.0-73 or 5.4.0-74 (the latter is only in Proposed)?

Thanks!

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1881109

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Frank Heimes (fheimes) wrote :

Well, as already mentioned in the two previous comments, more details are needed:

If a system shows kernel issues but is not on the latest kernel level, the first step is to update it to the latest kernel level and try to recreate the issue there.
So I agree with Pedro that this needs to be verified on the (currently) latest 5.4.0-73, especially since 5.4.0-73 includes hundreds of upstream stable patches, covering the range from v5.4.102 to v5.4.106.

Testing on 5.4.0-74 (currently in proposed) would be the next crucial step, since it includes even more upstream stable patches, ranging from v5.4.107 to v5.4.114; again hundreds of patches.

This is needed to be sure that we don't hunt a bug that may already have been fixed.
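
To check whether a system is still below the kernel levels named above, a comparison along these lines could be used. This is only a sketch under stated assumptions: the helper names kernel_abi and is_older are hypothetical, the two version constants are simply the levels mentioned in this thread, and the parsing assumes the Ubuntu-style release string (for example 5.4.0-71-generic).

```python
import platform
import re

# Kernel levels named in this thread (assumption: adjust as newer ones appear).
LATEST_RELEASED = "5.4.0-73"
IN_PROPOSED = "5.4.0-74"

# Ubuntu-style release string: <major>.<minor>.<patch>-<abi>[-flavour]
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def kernel_abi(release):
    """Turn '5.4.0-71-generic' into the comparable tuple (5, 4, 0, 71)."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unexpected kernel release format: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_older(release, reference):
    """Return True if 'release' is a lower kernel level than 'reference'."""
    return kernel_abi(release) < kernel_abi(reference)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    try:
        if is_older(running, LATEST_RELEASED):
            print(f"{running} is older than {LATEST_RELEASED}: update and retest")
        elif is_older(running, IN_PROPOSED):
            print(f"{running}: also worth retesting on {IN_PROPOSED} from proposed")
        else:
            print(f"{running} is at or above {IN_PROPOSED}")
    except ValueError as err:
        print(f"cannot compare: {err}")
```

For the kernel in this report, is_older("5.4.0-71-generic", LATEST_RELEASED) is true, so the retest on a newer level applies.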

If the issue is reproducible on these kernel levels, we need more details on the environment:
- which IBM Z or LinuxONE system is in use?
- which storage backend is attached and used?
- is zFCP/SCSI used or DASDs?
- and what is the dump device and how did it get configured?
(again, detailed steps to reproduce, like Andrew asked for)

Changed in ubuntu-z-systems:
status: New → Incomplete
Frank Heimes (fheimes) wrote :

Updating status to 'Invalid' due to inactivity.

Changed in linux (Ubuntu):
status: New → Invalid
Changed in ubuntu-z-systems:
status: Incomplete → Invalid
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-08-26 13:23 EDT-------
The problem could not be reproduced by Canonical. The further detailed information requested by Canonical has not been provided since May.
In the meantime, the new point release Ubuntu Server LTS 20.04.3 has become available. This means that 20.04.2, which the bug was opened against, has become obsolete.
Therefore, closing / rejecting the bug.

Changing
Status:->REJECTED (UNREPRODUCIBLE)
