[Ubuntu 24.04] FW1060.00 (NH1060_026): running sosreport leads to a kernel OOPS crash
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Fix Committed | High | Patricia Domingues |
linux (Ubuntu) | Invalid | High | Canonical Kernel Team |
linux (Ubuntu Noble) | Fix Committed | High | Unassigned |
sosreport (Ubuntu) | Invalid | High | Unassigned |
sosreport (Ubuntu Noble) | Invalid | Undecided | Unassigned |
Bug Description
SRU Justification:
[Impact]
* When the sosreport command is executed, a kernel OOPS occurs and the system crashes.
The system/LPAR then reboots (the default behaviour, although this depends on the configuration).
[Fix]
* Upstream commit e0011bca603c101
[Test Case]
* Have an Ubuntu Server 24.04 LTS installation on ppc64el.
* Option one is simply to run sosreport on the system;
the crash is seen when sosreport starts to capture the dump.
* Option two (without sosreport):
* CONFIG_NFSD=m (or y) must be set.
* Mount nfsd, if not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command.
* The kernel oops will happen and the logs will show:
...
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0000000016ff114
Oops: Kernel access of bad area, sig: 11 [#1]
...
* On a system running a kernel that includes the above patch,
no oops occurs and the sosreport command executes normally.
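The two prerequisites of the second option can be checked from a shell before attempting the reproduction. The following is a minimal sketch, assuming the kernel build configuration is installed as `/boot/config-$(uname -r)` (the usual location on Ubuntu) and that the nfsd mount appears in `/proc/mounts` with type `nfsd` on `/proc/fs/nfsd`; the function names `nfsd_configured` and `nfsd_mounted` are hypothetical. It only inspects the kernel configuration and the mount table and does not read any nfsd statistics.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel config file has CONFIG_NFSD
# built in (=y) or built as a module (=m).
nfsd_configured() {
    config_file=$1
    grep -Eq '^CONFIG_NFSD=(y|m)$' "$config_file"
}

# Hypothetical helper: succeeds if the given mounts table (normally /proc/mounts)
# lists a filesystem of type "nfsd" mounted on /proc/fs/nfsd.
nfsd_mounted() {
    mounts_file=$1
    awk '$2 == "/proc/fs/nfsd" && $3 == "nfsd" { found = 1 } END { exit !found }' "$mounts_file"
}
```

For example, `nfsd_configured "/boot/config-$(uname -r)" && nfsd_mounted /proc/mounts && echo "prerequisites met"` prints the message only when both checks pass. Taking the file paths as arguments keeps the functions usable against saved copies of the config and mount table.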
[Regression Potential]
* As with any code modification, there is some risk of regression,
here because the mutex handling in nfsd is modified.
* However, the changes are small and easy to trace.
* In addition, the commit has already been reviewed and accepted upstream.
* The modifications were made by the NFSD maintainer and were also tested by IBM.
[Other]
* The fix was accepted upstream in kernel v6.10-rc7,
so Oracular (with a planned kernel >= 6.10) is not affected.
== Comment: #0 - Tasmiya Nalatwad <email address hidden> - 2024-05-28 04:35:50 ==
--- Description ---
When the sosreport command is executed, a kernel OOPS crash occurs and the LPAR reboots. As kdump was enabled, the dump was captured.
Note: The bug looks similar to Bug 206504, which is seen on z LPARs.
--- Lpar Details ---
1. PowerVM
2. FW: FW1060.00 (NH1060_026)
3. OS: Ubuntu 24.04
4. Kernel: 6.8.0-31-generic
5. Mem (free -mh): 47Gi
6. cpus: 40
--- Steps to reproduce ---
1. Run the sosreport command on the LPAR; the crash is seen when sosreport starts to capture the dump.
--- Traces ---
root@ubuntulp2h
Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'.
Redirecting to 'sos report '
sosreport (version 4.5.6)
This command will collect system configuration and diagnostic
information from this Ubuntu system.
For more information on Canonical visit:
Community Website : https:/
Commercial Support : https:/
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report for []:
Setting up archive ...
Setting up plugins ...
[plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing: af_packet_diag, unix_diag, netlink_diag, udp_diag, inet_diag, tcp_diag, xsk_diag. Use '--allow-
Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment.
[plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter.
[plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.
Running plugins. Please wait ...
Starting 21/75 firewall_tables [Running: cloud_init ebpf filesys firewall_tables]
[ 1057.076626] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
[ 1057.076645] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 1057.076650] Faulting instruction address: 0xc0000000016ff114
[ 1057.076655] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1057.076659] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 1057.076665] Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc rdma_ucm ib_uverbs qrtr rdma_cm iw_cm ib_cm ib_core cfg80211 binfmt_misc kvm_hv kvm vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace nf_tables nvme_fabrics dm_multipath nvme_core nvme_auth sunrpc nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 nx_compress_pseries nx_compress ibmvscsi 842_decompress ibmveth pseries_rng poly1305_p10_crypto chacha_p10_crypto libchacha crct10dif_vpmsum crc32c_vpmsum aes_gcm_p10_crypto
[ 1057.076731] CPU: 25 PID: 6109 Comm: sosreport Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu
[ 1057.076737] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_026) hv:phyp pSeries
[ 1057.076743] NIP: c0000000016ff114 LR: c0000000016ff108 CTR: c0000000016ff0e0
[ 1057.076747] REGS: c000000067e63630 TRAP: 0300 Not tainted (6.8.0-31-generic)
[ 1057.076752] MSR: 8000000000009033 <SF,EE,
[ 1057.076761] CFAR: c0000000016fb6c8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
[ 1057.076761] GPR00: 0000000000000000 c000000067e638d0 c000000002254800 0000000000000000
[ 1057.076761] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1057.076761] GPR08: 0000000000000000 0000000000000000 c000000057a07980 c008000005d39538
[ 1057.076761] GPR12: c0000000016ff0e0 c000000c1bc8ff00 0000000000000000 0000000000000000
[ 1057.076761] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1057.076761] GPR20: c00000006751a628 0000000000000000 0000000000000000 0000000000000000
[ 1057.076761] GPR24: 0000000000000000 c00000006751a618 0000000000000000 c000000067e63a70
[ 1057.076761] GPR28: c000000067e63a98 0000000000000000 c00000006b4d9188 0000000000000000
[ 1057.076809] NIP [c0000000016ff114] mutex_lock+
[ 1057.076816] LR [c0000000016ff108] mutex_lock+
[ 1057.076821] Call Trace:
[ 1057.076823] [c000000067e638d0] [c0000000016ff108] mutex_lock+
[ 1057.076829] [c000000067e63900] [c008000005d2e480] svc_pool_
[ 1057.076866] [c000000067e63970] [c0000000007196a0] seq_read_
[ 1057.076871] [c000000067e63a40] [c000000000719d00] seq_read+
[ 1057.076875] [c000000067e63ae0] [c0000000006c8254] vfs_read+0xe4/0x3e0
[ 1057.076881] [c000000067e63b90] [c0000000006c94a0] ksys_read+
[ 1057.076886] [c000000067e63be0] [c000000000033248] system_
[ 1057.076892] [c000000067e63e50] [c00000000000d05c] system_
[ 1057.076899] --- interrupt: 3000 at 0x689080b5b504
[ 1057.076903] NIP: 0000689080b5b504 LR: 0000689080b5b504 CTR: 0000000000000000
[ 1057.076907] REGS: c000000067e63e80 TRAP: 3000 Not tainted (6.8.0-31-generic)
[ 1057.076911] MSR: 800000000280f033 <SF,VEC,
[ 1057.076922] IRQMASK: 0
[ 1057.076922] GPR00: 0000000000000003 000068907600da50 0000689080c96d00 0000000000000008
[ 1057.076922] GPR04: 000068905c014660 0000000000010000 000068907ca613c8 00006890760168e0
[ 1057.076922] GPR08: 000068907600f228 0000000000000000 0000000000000000 0000000000000000
[ 1057.076922] GPR12: 0000000000000000 00006890760168e0 0000000000000001 0000000000000000
[ 1057.076922] GPR16: 000068907c89bb50 000068907ff10968 000068907ff10978 0000689080372f7a
[ 1057.076922] GPR20: 0000689080372f78 000068907ff10938 000068907ff108f0 0000000010493180
[ 1057.076922] GPR24: 000068905c014660 0000000000000008 0000000000000000 0000000000000000
[ 1057.076922] GPR28: 000068905c014660 0000000000010000 0000000000000008 000068907600da50
[ 1057.076965] NIP [0000689080b5b504] 0x689080b5b504
[ 1057.076969] LR [0000689080b5b504] 0x689080b5b504
[ 1057.076972] --- interrupt: 3000
[ 1057.076975] Code: 38425720 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ffd1 4bffc575 60000000 39200000 e94d0908 <7d00f8a8> 7c284800 40c20010 7d40f9ad
[ 1057.076990] ---[ end trace 0000000000000000 ]---
== Comment: #1 - Tasmiya Nalatwad <email address hidden> - 2024-05-28 04:39:47 ==
Placed the dump file and dmesg file on the junebug server.
ssh <email address hidden>
Location of the dump file: /home/dump/
== Comment: #5 - Sourabh Jain <email address hidden> - 2024-05-29 09:23:29 ==
Hello Team,
Here is my observation on this issue:
The kernel crash is due to sos trying to read data from the following procfs file:
/proc/fs/
This issue is also reproducible with current upstream kernel 6.10-rc1.
So there is nothing wrong with the sos tool; it is a kernel bug.
Here is the first bad kernel commit, which introduced this issue:
7b207ccd9833 svc: don't hold reference for poolstats, only mutex.
Here are the steps to reproduce this issue without the sos tool:
Requirements:
1. Kernel must have "7b207ccd9833 svc: don't hold reference for poolstats, only mutex." commit
2. CONFIG_NFSD=m must be enabled
3. Mount nfsd, if not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command
Run the command below to reproduce the issue:
$ cat /proc/fs/
NOTE: the above command will crash the kernel.
Thanks,
Sourabh Jain
== Comment: #9 - Sourabh Jain <email address hidden> - 2024-06-17 08:57:19 ==
Hello Team,
The NFSD maintainer has provided the fix.
https://<email address hidden>/
Feel free to try the above fix.
Note: the fix is for Linux kernel and not for sosreport tool.
Thanks,
Sourabh Jain
== Comment: #10 - Sourabh Jain <email address hidden> - 2024-06-17 22:07:11 ==
Hello Team,
The fix has been applied to the nfsd-next kernel and is likely to reach the mainline kernel in the next rc.
https://<email address hidden>/
Thanks,
Sourabh Jain
== Comment: #14 - Tasmiya Nalatwad <email address hidden> - 2024-06-25 03:38:16 ==
Team, I have tested the fix on custom kernel "6.9.0-
---- uname ----
Linux ubuntulp2host 6.9.0-rc7nfsd-fix+ #2 SMP Tue Jun 25 06:49:48 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux
1. sosreport is generated as expected
------------------- logs -------
Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'.
Redirecting to 'sos report '
sosreport (version 4.5.6)
This command will collect system configuration and diagnostic
information from this Ubuntu system.
For more information on Canonical visit:
Community Website : https:/
Commercial Support : https:/
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report for []:
Setting up archive ...
Setting up plugins ...
[plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing: unix_diag, xsk_diag, af_packet_diag, tcp_diag, udp_diag, netlink_diag, inet_diag. Use '--allow-
Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment.
[plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter.
[plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.
Running plugins. Please wait ...
Finishing plugins [Running: logs]
Finished running plugins
Creating compressed archive...
Your sosreport has been generated and saved in:
/tmp/sosreport
Size 5.99MiB
Owner root
sha256 192c04e45142382
Please send this file to your support representative.
2. The command mentioned by Sourabh in the comments above no longer causes a crash/OOPS:
cat /proc/fs/
# pool packets-arrived sockets-enqueued threads-woken threads-timedout
0 0 2 0 0
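The statistics output above is a header line starting with `#` that names the columns, followed by one row per thread pool (here a single pool, `0`). A minimal sketch for labelling each value with its column name, assuming that layout; the function name `print_pool_stats` is hypothetical, and it reads the text from standard input so that it can be applied to a saved copy of the output rather than the live file.

```shell
#!/bin/sh
# Hypothetical helper: read pool statistics text on stdin (a "#" header line
# naming the columns, then one row per pool) and print one labelled line per pool.
print_pool_stats() {
    awk '
        /^#/ {
            # Remember the column names, skipping the leading "#" field.
            for (i = 2; i <= NF; i++) name[i - 1] = $i
            ncols = NF - 1
            next
        }
        NF > 0 {
            # Column 1 is the pool id; label the remaining columns by header name.
            line = "pool " $1 ":"
            for (i = 2; i <= NF && i <= ncols; i++) line = line " " name[i] "=" $i
            print line
        }
    '
}
```

Fed the two lines shown above, it prints `pool 0: packets-arrived=0 sockets-enqueued=2 threads-woken=0 threads-timedout=0`. Columns are matched by position against the header, so extra or missing columns in other kernel versions are labelled according to whatever header that kernel prints.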
affects: linux (Ubuntu) → sosreport (Ubuntu)
Changed in sosreport (Ubuntu): assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → nobody
Changed in ubuntu-power-systems: assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
tags: added: targetmilestone-inin2404; removed: targetmilestone-inin---
Changed in ubuntu-power-systems: importance: Undecided → High
Changed in linux (Ubuntu): importance: Undecided → High
Changed in sosreport (Ubuntu): importance: Undecided → High
Changed in ubuntu-power-systems: assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Patricia Domingues (patriciasd)
description: updated
Changed in linux (Ubuntu): assignee: nobody → Patricia Domingues (patriciasd)
Changed in linux (Ubuntu Noble): importance: Undecided → High; status: New → Fix Committed
Changed in ubuntu-power-systems: status: In Progress → Fix Committed
description: updated
Changed in linux (Ubuntu): status: In Progress → Invalid
Changed in sosreport (Ubuntu Noble): status: New → Invalid
------- Comment From <email address hidden> 2024-06-25 05:18 EDT-------
Please integrate this commit into Ubuntu 24.04.
The fix has been applied to the nfsd-next kernel and is likely to reach the mainline kernel in the next rc.
https://<email address hidden>/