[UBUNTU 24.04] Running sosreport causes the system to crash and produce a dump

Bug #2068577 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu on IBM z Systems | Fix Released | High | Skipper Bug Screeners |
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
sosreport (Ubuntu) | Invalid | Undecided | Unassigned |

Bug Description

== Reported by <email address hidden> ==

---Problem Description---
Good day, we recently installed Ubuntu 24.04 on a z LPAR.
After running a test we usually collect debug information in case we need it later. Each time "sosreport" is run on the server,
the system crashes and produces a dump.
I'll attach the full output of the "sosreport" run, but this is a snippet of the last lines captured before the system went down:

[plugin:zvm] skipped command 'vmcp ind sp': required kmods missing: vmcp, cpint.
[plugin:zvm] skipped command 'vmcp ind user': required kmods missing: vmcp, cpint.
 Running plugins. Please wait ...

  Starting 21/74 filesys [Running: block btrfs ebpf filesys]

Machine Type = s390x

Contact Information = Vanessa <email address hidden>, Michael <email address hidden>, Daniel <email address hidden>

---uname output---
Linux ilzlnx10 6.8.0-31-generic #31-Ubuntu SMP Sat Apr 20 00:14:26 UTC 2024 s390x s390x s390x GNU/Linux

---Steps to Reproduce---
 Run "sosreport"

---Debugger---
A debugger is not configured

System Dump Location:
 Access credentials to the server can be provided.

Oops output:
 [59770.601165] [ T120907] Kernel panic - not syncing: Fatal exception: panic_on_oops

Stack trace output:
 [59770.601103] [ T120907] Call Trace:
[59770.601104] [ T120907] [<0000000fc7961bf0>] mutex_lock+0x30/0x60
[59770.601107] [ T120907] ([<0000000fc7961be6>] mutex_lock+0x26/0x60)
[59770.601109] [ T120907] [<000003ff805990f8>] svc_pool_stats_start+0x40/0xd0 [sunrpc]
[59770.601146] [ T120907] [<0000000fc7031704>] seq_read_iter+0x134/0x510
[59770.601149] [ T120907] [<0000000fc7031bfc>] seq_read+0x11c/0x168
[59770.601151] [ T120907] [<0000000fc6ff4276>] vfs_read+0x96/0x328
[59770.601154] [ T120907] [<0000000fc6ff5058>] ksys_read+0x70/0x128
[59770.601156] [ T120907] [<0000000fc7954c26>] __do_syscall+0x246/0x2e8
[59770.601158] [ T120907] [<0000000fc7968510>] system_call+0x70/0x98
[59770.601161] [ T120907] Last Breaking-Event-Address:
[59770.601162] [ T120907] [<0000000fc795fd88>] __cond_resched+0x48/0x90
[59770.601165] [ T120907] Kernel panic - not syncing: Fatal exception: panic_on_oops
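
For triage, the frames of a trace like the one above can be reduced to a plain list of symbols and modules. The following is only a minimal sketch and was not part of the original report: the function name `trace_symbols` is hypothetical, and it assumes the `[<address>] symbol+0xoff/0xlen [module]` frame layout shown above. Note that it also prints the symbol from the "Last Breaking-Event-Address" line.

```shell
# Hypothetical helper: read a kernel call trace on stdin and print
# "symbol" or "symbol [module]" for every line that contains a frame.
trace_symbols() {
    sed -n -E 's/.*\[<[0-9a-f]+>\] ([A-Za-z0-9_.]+)\+0x[0-9a-f]+\/0x[0-9a-f]+\)?( \[[a-z0-9_]+\])?.*/\1\2/p'
}
```

Fed with the trace above, this would make it easy to spot that the only frame outside the core kernel belongs to the sunrpc module.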

Revision history for this message
bugproxy (bugproxy) wrote : failed_sosreport_output_20240515

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-206504 severity-high targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote : extracted kernel messages from automatically collected kdump

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : DBGINFO-2024-05-16-10-09-29-ilabg14-2A1A18.tgz

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : failed_sosreport_output_ilabg14_20240515

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
affects: linux (Ubuntu) → sosreport (Ubuntu)
Revision history for this message
David Negreira (dnegreira) wrote :

Hi,

Thank you for the report!

Would you mind testing and verifying whether you run into the same issue with the latest Ubuntu kernel?
It should be 6.8.0-35.35

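A quick way to confirm that a system is already running at least that kernel is sketched below. This is an illustration only: the helper name `kernel_at_least` is hypothetical, and it assumes GNU `sort` (for `-V` and `-C`) and a release string of the form `X.Y.Z-ABI-flavour` as in the uname output of this report.

```shell
# Hypothetical helper: succeed if the kernel release ($2, default: uname -r)
# is at least the given version ($1), e.g. 6.8.0-35.
kernel_at_least() {
    min=$1
    cur=${2:-$(uname -r)}
    # Keep only the leading "X.Y.Z-ABI" part, dropping suffixes like "-generic".
    cur=$(printf '%s\n' "$cur" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+).*/\1/')
    # sort -V orders version strings; -C only checks that the input is sorted.
    printf '%s\n%s\n' "$min" "$cur" | sort -V -C
}

if kernel_at_least 6.8.0-35; then
    echo "running kernel is 6.8.0-35 or newer"
else
    echo "running kernel is older than 6.8.0-35"
fi
```
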
Revision history for this message
David Negreira (dnegreira) wrote :

We would also like to look at the output of the last sosreport run before the crash. Can you share the latest /tmp/sos.<randomname> so that we can inspect what is there?

Thank you.

Revision history for this message
Frank Heimes (fheimes) wrote :

Thanks for having reported this issue.

I tried to recreate this on the systems that I have at hand (a z13 in PR/SM mode and a LinuxONE 3 in DPM mode) and ran sosreport twice on both systems in an LPAR, with a default 24.04 install and after upgrading 24.04 to the latest level (incl. kernel) with:
sudo apt update && sudo apt full-upgrade # and reboot
and sosreport did not crash for me in any of the four cases.
Would you please retry (as my colleague already asked above) with the latest kernel (that is, after a full-upgrade)? I cannot recreate this even with the GA kernel on my system(s).

So I'm now trying to figure out differences between your setup and mine.

Your sosreport package version is the same as mine: sosreport (version 4.5.6)

Then I thought that you might be using a filesystem other than ext4, which may cause issues, since the last line that you see seems to be:
"Starting 21/74 filesys [Running: block btrfs ebpf filesys]"
but the dbginfo shows that you are also on ext4 (like me).

It looks like your system is an IBM z16 Model A01, Machine Type 3931 (which I do not have at hand).

Is this really happening for you on an LPAR, or in a zVM guest (since the dbginfo also includes zvm data)?

Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in sosreport (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
Changed in ubuntu-z-systems:
importance: Undecided → High
Revision history for this message
Frank Heimes (fheimes) wrote :

In addition, I've noticed that there are probably NFS shares active on your system (I saw that in fstab).
There is an open issue with NFS.
Would you mind temporarily tearing down (and removing) NFS for testing purposes and then running sosreport again?
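
To see which NFS shares are configured before tearing them down, the fstab can be filtered by filesystem type. The snippet below is a minimal sketch under stated assumptions: the helper name `nfs_fstab_entries` is hypothetical, and it only recognises the types `nfs` and `nfs4` in the third fstab field.

```shell
# Hypothetical helper: print "device mountpoint type" for every NFS entry
# in an fstab-style file ($1, default: /etc/fstab), skipping comment lines.
nfs_fstab_entries() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs4?$/ { print $1, $2, $3 }' "${1:-/etc/fstab}"
}

nfs_fstab_entries
```

Shares that are currently mounted (rather than just configured) can be listed with `findmnt -t nfs,nfs4`.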

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-06-19 16:28 EDT-------
Hello! Thank you very much for your replies!

(In reply to comment #16)
> Hi,
>
> Thank you for the report!
>
> Do you mind testing and verify if you run into the same issue with the
> latest Ubuntu kernel?
> It should be 6.8.0-35.35

I upgraded all packages on the server and I'm currently on 6.8.0-35-generic.
I also tore down all NFS (up to the point of uninstalling packages).
I have since tried running sosreport and I'm no longer encountering this issue. The reports get generated with no issues.
No dumps are generated.

(In reply to comment #17)
> We would also like to look at the latest sosreport command that was run
> before crashing, can you share the latest /tmp/sos.<randomname> so that we
> can inspect what is there?
>
> Thank you.
Since I can no longer reproduce this I can dig around to see if I captured this information in any earlier instance of this issue.
Is there any value in providing this information now?

(In reply to comment #18)
> Would you please retry (like my colleagues above already mentioned) with the
> latest kernel (means after full-upgrade)? Even if I cannot recreate with the
> GA kernel on my system(s).
>

> Is this really happening for you on an LPAR or in a zVM guest? (since
> dbginfo also incl. zvm data)?

It is an LPAR, not a zVM guest.

(In reply to comment #19)
> On top I've noticed that there are probably nfs shares active in your system
> (saw that in fstab).
> There is an open issue with nfs.
> Would you mind tearing down (and removing nfs) temporarily and for test
> reasons and run sosreport then again?

Removed :)

Thanks!

Revision history for this message
Frank Heimes (fheimes) wrote :

Hello, first of all I'm glad to read that this is now solved for you.
I'm still wondering, and a bit curious, what initially caused this in your environment...

Anyway, happy that the upgrade to the latest update levels has solved this (and since this is recommended to do anyway, to have a supported system, I think we are fine).

With that I'm closing this ticket.

Please feel free to open any new ticket in case this (or anything else) happens to your system(s) in future.

Changed in sosreport (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in ubuntu-z-systems:
status: New → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-06-20 10:24 EDT-------
(In reply to comment #21)
> Please feel free to open any new ticket in case this (or anything else)
> happens to your system(s) in future.
Thank you, I appreciate your support and clear responses. If I ever hit this again, will do so.
Have a nice one!

------- Comment From <email address hidden> 2024-06-20 12:21 EDT-------
(In reply to comment #19)
> On top I've noticed that there are probably nfs shares active in your system
> (saw that in fstab).
> There is an open issue with nfs.
> Would you mind tearing down (and removing nfs) temporarily and for test
> reasons and run sosreport then again?

An additional question: could you point us to the ticket in which this NFS defect is being tracked?
Thanks!

Revision history for this message
Frank Heimes (fheimes) wrote :

Yes, sure the NFS issue is reported here:
https://bugs.launchpad.net/bugs/2060217

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-06-24 11:21 EDT-------
(In reply to comment #24)
> Yes, sure the NFS issue is reported here:
> https://bugs.launchpad.net/bugs/2060217

Thank you!

bugproxy (bugproxy)
tags: added: targetmilestone-inin2404
removed: targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-06-24 12:21 EDT-------
Thanks everyone for your work.

Closing the bug: changing status to "CLOSED"

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-07-01 18:35 EDT-------
(In reply to comment #26)
> Problem could be fixed by upgrading Ubuntu 24.04 to the latest level (incl.
> kernel) via update, full-upgrade and reboot.

Good day! Back in Comment 6, we mentioned there was another server seeing this issue, on the same distro; the only difference was that it is a zVM guest.

The issue was still happening on it (sosreport causing the host to crash)
After updating all available packages and being on the latest kernel, the issue was still happening.

It wasn't until the NFS was removed (completely uninstalled) that a sosreport was able to complete.

This means that this issue is not resolved, rather we have a workaround.
Wanted to point that out in case we want to reopen this ticket or link it to the NFS defect mentioned in Comment 24. Let me know what would be the best route.

Thank you!

Revision history for this message
Frank Heimes (fheimes) wrote :

So the NFS problem that was reported in LP#2060217 is already addressed and is fixed in the Ubuntu Kernel 6.8.0-38.
But kernel 6.8.0-38 is not yet fully rolled out - it's currently in the "-proposed" pocket of the Ubuntu archive, but can already be tested and used.

However, the way to install from "-proposed" changed starting with 24.04. This is the 'new way':

$ sudo add-apt-repository -y "deb http://ports.ubuntu.com/ubuntu-ports/ $(lsb_release -sc)-proposed main universe"
$ sudo apt update
$ apt-cache policy linux-generic
linux-generic:
  Installed: 6.8.0-36.36
  Candidate: 6.8.0-36.36
  Version table:
     6.8.0-38.38 100
        100 http://ports.ubuntu.com/ubuntu-ports noble-proposed/main s390x Packages
 *** 6.8.0-36.36 500
        500 http://ports.ubuntu.com/ubuntu-ports noble-updates/main s390x Packages
        500 http://ports.ubuntu.com/ubuntu-ports noble-security/main s390x Packages
        100 /var/lib/dpkg/status
     6.8.0-31.31 500
        500 http://ports.ubuntu.com/ubuntu-ports noble/main s390x Packages
$ sudo apt -t=noble-proposed install linux-generic # will install 6.8.0-38 from noble-proposed

Testing this kernel (with nfs enabled) would show if the fix for nfs in LP#2060217 is also fixing the situation here.
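
In the `apt-cache policy` output above, the candidate stays at 6.8.0-36.36 because the noble-proposed entry has priority 100, which is why the explicit target release is needed. To check the installed and candidate versions at a glance, the policy output can be summarised as sketched here; the helper name `policy_versions` is hypothetical and it assumes the "Installed:" / "Candidate:" layout shown above.

```shell
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin and
# print the installed and candidate versions on one line.
policy_versions() {
    awk '$1 == "Installed:" { inst = $2 }
         $1 == "Candidate:" { cand = $2 }
         END { print "installed=" inst, "candidate=" cand }'
}

# Example (assumes apt-cache is available on the system):
# apt-cache policy linux-generic | policy_versions
```

After installing from noble-proposed and rebooting, `uname -r` should report a 6.8.0-38 kernel if the install worked as described.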

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-07-03 13:42 EDT-------
Thank you! Will test and update results here :)
