Activity log for bug #2070358

Date Who What changed Old value New value Message
2024-06-25 09:19:45 bugproxy bug added bug
2024-06-25 09:19:46 bugproxy tags architecture-ppc64le bugnameltc-206751 severity-high targetmilestone-inin---
2024-06-25 09:19:48 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2024-06-25 09:19:54 bugproxy affects ubuntu linux (Ubuntu)
2024-06-25 09:24:50 Frank Heimes affects linux (Ubuntu) sosreport (Ubuntu)
2024-06-25 09:25:11 Frank Heimes bug task added linux (Ubuntu)
2024-06-25 09:25:29 Frank Heimes bug task added ubuntu-power-systems
2024-06-25 09:25:42 Frank Heimes sosreport (Ubuntu): assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2024-06-25 09:25:49 Frank Heimes ubuntu-power-systems: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2024-06-25 09:28:12 bugproxy tags architecture-ppc64le bugnameltc-206751 severity-high targetmilestone-inin--- architecture-ppc64le bugnameltc-206751 severity-high targetmilestone-inin2404
2024-06-25 09:56:04 Frank Heimes ubuntu-power-systems: importance Undecided High
2024-06-25 09:56:10 Frank Heimes linux (Ubuntu): importance Undecided High
2024-06-25 09:56:12 Frank Heimes sosreport (Ubuntu): importance Undecided High
2024-07-01 10:39:41 bugproxy attachment added Whole Console logs captured during performing the below steps https://bugs.launchpad.net/bugs/2070358/+attachment/5793768/+files/Update_apt_console_logs
2024-07-01 17:27:21 Frank Heimes sosreport (Ubuntu): status New Invalid
2024-07-12 15:26:14 Patricia Domingues ubuntu-power-systems: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) Patricia Domingues (patriciasd)
2024-07-12 15:45:03 Patricia Domingues description == Comment: #0 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:35:50 == --- Description --- When the sosreport command is executed, a kernel OOPS crash happens and the LPAR reboots. As kdump was enabled, the dump was captured. Note: the bug looks similar to Bug 206504, which is seen on z LPARs. --- Lpar Details --- 1. PowerVM 2. FW: FW1060.00 (NH1060_026) 3. OS: Ubuntu 24.04 4. Kernel: 6.8.0-31-generic 5. Mem (free -mh): 47Gi 6. cpus: 40 --- Steps to reproduce --- 1. run sosreport command on the lpar and the crash is seen when the sosreport is starting to capture dump. --- Traces --- root@ubuntulp2host:~# sosreport Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit: Community Website : https://www.ubuntu.com/ Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []: Setting up archive ... Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: af_packet_diag, unix_diag, netlink_diag, udp_diag, inet_diag, tcp_diag, xsk_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter. Running plugins. Please wait ... Starting 21/75 firewall_tables [Running: cloud_init ebpf filesys firewall_tables] [ 1057.076626] Kernel attempted to read user page (0) - exploit attempt? 
(uid: 0) [ 1057.076645] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 1057.076650] Faulting instruction address: 0xc0000000016ff114 [ 1057.076655] Oops: Kernel access of bad area, sig: 11 [#1] [ 1057.076659] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 1057.076665] Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc rdma_ucm ib_uverbs qrtr rdma_cm iw_cm ib_cm ib_core cfg80211 binfmt_misc kvm_hv kvm vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace nf_tables nvme_fabrics dm_multipath nvme_core nvme_auth sunrpc nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 nx_compress_pseries nx_compress ibmvscsi 842_decompress ibmveth pseries_rng poly1305_p10_crypto chacha_p10_crypto libchacha crct10dif_vpmsum crc32c_vpmsum aes_gcm_p10_crypto [ 1057.076731] CPU: 25 PID: 6109 Comm: sosreport Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu [ 1057.076737] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_026) hv:phyp pSeries [ 1057.076743] NIP: c0000000016ff114 LR: c0000000016ff108 CTR: c0000000016ff0e0 [ 1057.076747] REGS: c000000067e63630 TRAP: 0300 Not tainted (6.8.0-31-generic) [ 1057.076752] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24044400 XER: 2004008c [ 1057.076761] CFAR: c0000000016fb6c8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 [ 1057.076761] GPR00: 0000000000000000 c000000067e638d0 c000000002254800 0000000000000000 [ 1057.076761] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR08: 0000000000000000 0000000000000000 c000000057a07980 c008000005d39538 [ 1057.076761] GPR12: c0000000016ff0e0 c000000c1bc8ff00 0000000000000000 0000000000000000 [ 1057.076761] GPR16: 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR20: c00000006751a628 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR24: 0000000000000000 c00000006751a618 0000000000000000 c000000067e63a70 [ 1057.076761] GPR28: c000000067e63a98 0000000000000000 c00000006b4d9188 0000000000000000 [ 1057.076809] NIP [c0000000016ff114] mutex_lock+0x34/0x98 [ 1057.076816] LR [c0000000016ff108] mutex_lock+0x28/0x98 [ 1057.076821] Call Trace: [ 1057.076823] [c000000067e638d0] [c0000000016ff108] mutex_lock+0x28/0x98 (unreliable) [ 1057.076829] [c000000067e63900] [c008000005d2e480] svc_pool_stats_start+0x48/0xf8 [sunrpc] [ 1057.076866] [c000000067e63970] [c0000000007196a0] seq_read_iter+0x16c/0x6a4 [ 1057.076871] [c000000067e63a40] [c000000000719d00] seq_read+0x128/0x1a8 [ 1057.076875] [c000000067e63ae0] [c0000000006c8254] vfs_read+0xe4/0x3e0 [ 1057.076881] [c000000067e63b90] [c0000000006c94a0] ksys_read+0x90/0x168 [ 1057.076886] [c000000067e63be0] [c000000000033248] system_call_exception+0xf8/0x290 [ 1057.076892] [c000000067e63e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec [ 1057.076899] --- interrupt: 3000 at 0x689080b5b504 [ 1057.076903] NIP: 0000689080b5b504 LR: 0000689080b5b504 CTR: 0000000000000000 [ 1057.076907] REGS: c000000067e63e80 TRAP: 3000 Not tainted (6.8.0-31-generic) [ 1057.076911] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 42044402 XER: 00000000 [ 1057.076922] IRQMASK: 0 [ 1057.076922] GPR00: 0000000000000003 000068907600da50 0000689080c96d00 0000000000000008 [ 1057.076922] GPR04: 000068905c014660 0000000000010000 000068907ca613c8 00006890760168e0 [ 1057.076922] GPR08: 000068907600f228 0000000000000000 0000000000000000 0000000000000000 [ 1057.076922] GPR12: 0000000000000000 00006890760168e0 0000000000000001 0000000000000000 [ 1057.076922] GPR16: 000068907c89bb50 000068907ff10968 000068907ff10978 0000689080372f7a [ 1057.076922] GPR20: 0000689080372f78 000068907ff10938 000068907ff108f0 
0000000010493180 [ 1057.076922] GPR24: 000068905c014660 0000000000000008 0000000000000000 0000000000000000 [ 1057.076922] GPR28: 000068905c014660 0000000000010000 0000000000000008 000068907600da50 [ 1057.076965] NIP [0000689080b5b504] 0x689080b5b504 [ 1057.076969] LR [0000689080b5b504] 0x689080b5b504 [ 1057.076972] --- interrupt: 3000 [ 1057.076975] Code: 38425720 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ffd1 4bffc575 60000000 39200000 e94d0908 <7d00f8a8> 7c284800 40c20010 7d40f9ad [ 1057.076990] ---[ end trace 0000000000000000 ]--- == Comment: #1 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:39:47 == Placed the dump file and dmesg file on the junebug server (ssh dump@junebug1.isst.aus.stglabs.ibm.com). The dump file is located at: /home/dump/dumps/206751 == Comment: #5 - Sourabh Jain <sjain014@in.ibm.com> - 2024-05-29 09:23:29 == Hello Team, Here is my observation on this issue: The kernel crash is caused by sos trying to read data from the following nfsd file: /proc/fs/nfsd/pool_stats This issue is also reproducible with the current upstream kernel 6.10-rc1, so there is nothing wrong with the sos tool; it is a kernel bug. Here is the first bad kernel commit, which introduced this issue: 7b207ccd9833 svc: don't hold reference for poolstats, only mutex. Here are the steps to reproduce this issue without the sos tool: Requirements: 1. The kernel must have the "7b207ccd9833 svc: don't hold reference for poolstats, only mutex." commit 2. CONFIG_NFSD=m must be enabled 3. nfsd must be mounted; if it is not, mount it using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command Run the command below to reproduce the issue: $ cat /proc/fs/nfsd/pool_stats NOTE: the above command will crash the kernel. Thanks, Sourabh Jain == Comment: #9 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 08:57:19 == Hello Team, The NFSD maintainer has provided the fix. https://lore.kernel.org/all/20240617-nfsd-next-v1-1-5833b297015a@kernel.org/ Feel free to try the above fix.
Note: the fix is for Linux kernel and not for sosreport tool. Thanks, Sourabh Jain == Comment: #10 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 22:07:11 == Hello Team, Fix is applied to nfsd-next kernel. Likely to hit mainline kernel in next rc. https://lore.kernel.org/all/ZnBGbmIQy52IDC9L@tissot.1015granger.net/ Thanks, Sourabh Jain == Comment: #14 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-06-25 03:38:16 == Team, I have tested the fix on custom kernel "6.9.0-rc7nfsd-fix+" and the issue is not reproducible. ---- uname ---- Linux ubuntulp2host 6.9.0-rc7nfsd-fix+ #2 SMP Tue Jun 25 06:49:48 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux 1. sosreport is generated as expected ------------------- logs --------------------------- Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit: Community Website : https://www.ubuntu.com/ Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []: Setting up archive ... Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. 
[plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: unix_diag, xsk_diag, af_packet_diag, tcp_diag, udp_diag, netlink_diag, inet_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter. Running plugins. Please wait ... Finishing plugins [Running: logs] Finished running plugins Creating compressed archive... Your sosreport has been generated and saved in: /tmp/sosreport-ubuntulp2host-2024-06-25-cussrcx.tar.xz Size 5.99MiB Owner root sha256 192c04e45142382038adb223d6dc4aa95edc8edf5d37a576cdd2912e71cdd98b Please send this file to your support representative. 2. As Sourabh mentioned in the comments above, the command below no longer causes a crash/OOPS.
cat /proc/fs/nfsd/pool_stats
# pool packets-arrived sockets-enqueued threads-woken threads-timedout
0 0 2 0 0
SRU Justification:
[Impact]
 * When the sosreport command is executed, a kernel oops occurs and the system crashes; depending on the configuration (including the default one), the system/LPAR then reboots.
[Fix]
 * e0011bca603c101f2a3c007bdb77f7006fa78fb1 e0011bca603c "nfsd: initialise nfsd_info.mutex early"
[Test Case]
 * Have an Ubuntu Server 24.04 LTS installation on ppc64el.
 * One option is simply running sosreport on the system; the crash is seen once sosreport starts collecting data.
 * A second option (without sosreport) is:
   * CONFIG_NFSD=m (or y) must be set
   * mount nfsd if it is not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command
   * read the pool statistics using "$ cat /proc/fs/nfsd/pool_stats"
 * The kernel oops will happen and the logs will show:
   ...
   BUG: Kernel NULL pointer dereference on read at 0x00000000
   Faulting instruction address: 0xc0000000016ff114
   Oops: Kernel access of bad area, sig: 11 [#1]
   ...
 * On a system running a kernel that includes the above patch, no oops occurs and the sosreport command executes normally.
[Regression Potential]
 * As with any code modification there is some risk of regression, here because the mutex handling in nfsd is modified.
 * However, the changes are easy to trace.
 * In addition, the commit has already been reviewed and accepted upstream.
 * The modifications were done by the NFSD maintainer and also tested by IBM.
[Other]
 * The fix was accepted upstream in kernel v6.10-rc7, hence Oracular is not affected.
== Comment: #0 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:35:50 == --- Description --- When the sosreport command is executed, a kernel OOPS crash happens and the LPAR reboots. As kdump was enabled, the dump was captured. Note: the bug looks similar to Bug 206504, which is seen on z LPARs. --- Lpar Details --- 1. PowerVM 2. FW: FW1060.00 (NH1060_026) 3. OS: Ubuntu 24.04 4. Kernel: 6.8.0-31-generic 5. Mem (free -mh): 47Gi 6.
cpus: 40 --- Steps to reproduce --- 1. run sosreport command on the lpar and the crash is seen when the sosreport is starting to capture dump. --- Traces --- root@ubuntulp2host:~# sosreport Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit:         Community Website : https://www.ubuntu.com/         Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []:  Setting up archive ...  Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. 
[plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: af_packet_diag, unix_diag, netlink_diag, udp_diag, inet_diag, tcp_diag, xsk_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.  Running plugins. Please wait ...   Starting 21/75 firewall_tables [Running: cloud_init ebpf filesys firewall_tables] [ 1057.076626] Kernel attempted to read user page (0) - exploit attempt? 
(uid: 0) [ 1057.076645] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 1057.076650] Faulting instruction address: 0xc0000000016ff114 [ 1057.076655] Oops: Kernel access of bad area, sig: 11 [#1] [ 1057.076659] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 1057.076665] Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc rdma_ucm ib_uverbs qrtr rdma_cm iw_cm ib_cm ib_core cfg80211 binfmt_misc kvm_hv kvm vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace nf_tables nvme_fabrics dm_multipath nvme_core nvme_auth sunrpc nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 nx_compress_pseries nx_compress ibmvscsi 842_decompress ibmveth pseries_rng poly1305_p10_crypto chacha_p10_crypto libchacha crct10dif_vpmsum crc32c_vpmsum aes_gcm_p10_crypto [ 1057.076731] CPU: 25 PID: 6109 Comm: sosreport Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu [ 1057.076737] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_026) hv:phyp pSeries [ 1057.076743] NIP: c0000000016ff114 LR: c0000000016ff108 CTR: c0000000016ff0e0 [ 1057.076747] REGS: c000000067e63630 TRAP: 0300 Not tainted (6.8.0-31-generic) [ 1057.076752] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24044400 XER: 2004008c [ 1057.076761] CFAR: c0000000016fb6c8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 [ 1057.076761] GPR00: 0000000000000000 c000000067e638d0 c000000002254800 0000000000000000 [ 1057.076761] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR08: 0000000000000000 0000000000000000 c000000057a07980 c008000005d39538 [ 1057.076761] GPR12: c0000000016ff0e0 c000000c1bc8ff00 0000000000000000 0000000000000000 [ 1057.076761] GPR16: 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR20: c00000006751a628 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR24: 0000000000000000 c00000006751a618 0000000000000000 c000000067e63a70 [ 1057.076761] GPR28: c000000067e63a98 0000000000000000 c00000006b4d9188 0000000000000000 [ 1057.076809] NIP [c0000000016ff114] mutex_lock+0x34/0x98 [ 1057.076816] LR [c0000000016ff108] mutex_lock+0x28/0x98 [ 1057.076821] Call Trace: [ 1057.076823] [c000000067e638d0] [c0000000016ff108] mutex_lock+0x28/0x98 (unreliable) [ 1057.076829] [c000000067e63900] [c008000005d2e480] svc_pool_stats_start+0x48/0xf8 [sunrpc] [ 1057.076866] [c000000067e63970] [c0000000007196a0] seq_read_iter+0x16c/0x6a4 [ 1057.076871] [c000000067e63a40] [c000000000719d00] seq_read+0x128/0x1a8 [ 1057.076875] [c000000067e63ae0] [c0000000006c8254] vfs_read+0xe4/0x3e0 [ 1057.076881] [c000000067e63b90] [c0000000006c94a0] ksys_read+0x90/0x168 [ 1057.076886] [c000000067e63be0] [c000000000033248] system_call_exception+0xf8/0x290 [ 1057.076892] [c000000067e63e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec [ 1057.076899] --- interrupt: 3000 at 0x689080b5b504 [ 1057.076903] NIP: 0000689080b5b504 LR: 0000689080b5b504 CTR: 0000000000000000 [ 1057.076907] REGS: c000000067e63e80 TRAP: 3000 Not tainted (6.8.0-31-generic) [ 1057.076911] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 42044402 XER: 00000000 [ 1057.076922] IRQMASK: 0 [ 1057.076922] GPR00: 0000000000000003 000068907600da50 0000689080c96d00 0000000000000008 [ 1057.076922] GPR04: 000068905c014660 0000000000010000 000068907ca613c8 00006890760168e0 [ 1057.076922] GPR08: 000068907600f228 0000000000000000 0000000000000000 0000000000000000 [ 1057.076922] GPR12: 0000000000000000 00006890760168e0 0000000000000001 0000000000000000 [ 1057.076922] GPR16: 000068907c89bb50 000068907ff10968 000068907ff10978 0000689080372f7a [ 1057.076922] GPR20: 0000689080372f78 000068907ff10938 000068907ff108f0 
0000000010493180 [ 1057.076922] GPR24: 000068905c014660 0000000000000008 0000000000000000 0000000000000000 [ 1057.076922] GPR28: 000068905c014660 0000000000010000 0000000000000008 000068907600da50 [ 1057.076965] NIP [0000689080b5b504] 0x689080b5b504 [ 1057.076969] LR [0000689080b5b504] 0x689080b5b504 [ 1057.076972] --- interrupt: 3000 [ 1057.076975] Code: 38425720 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ffd1 4bffc575 60000000 39200000 e94d0908 <7d00f8a8> 7c284800 40c20010 7d40f9ad [ 1057.076990] ---[ end trace 0000000000000000 ]--- == Comment: #1 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:39:47 == Placed the dump file and dmesg file on the junebug server (ssh dump@junebug1.isst.aus.stglabs.ibm.com). The dump file is located at: /home/dump/dumps/206751 == Comment: #5 - Sourabh Jain <sjain014@in.ibm.com> - 2024-05-29 09:23:29 == Hello Team, Here is my observation on this issue: The kernel crash is caused by sos trying to read data from the following nfsd file: /proc/fs/nfsd/pool_stats This issue is also reproducible with the current upstream kernel 6.10-rc1, so there is nothing wrong with the sos tool; it is a kernel bug. Here is the first bad kernel commit, which introduced this issue: 7b207ccd9833 svc: don't hold reference for poolstats, only mutex. Here are the steps to reproduce this issue without the sos tool: Requirements: 1. The kernel must have the "7b207ccd9833 svc: don't hold reference for poolstats, only mutex." commit 2. CONFIG_NFSD=m must be enabled 3. nfsd must be mounted; if it is not, mount it using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command Run the command below to reproduce the issue: $ cat /proc/fs/nfsd/pool_stats NOTE: the above command will crash the kernel. Thanks, Sourabh Jain == Comment: #9 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 08:57:19 == Hello Team, The NFSD maintainer has provided the fix. https://lore.kernel.org/all/20240617-nfsd-next-v1-1-5833b297015a@kernel.org/ Feel free to try the above fix.
Note: the fix is for Linux kernel and not for sosreport tool. Thanks, Sourabh Jain == Comment: #10 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 22:07:11 == Hello Team, Fix is applied to nfsd-next kernel. Likely to hit mainline kernel in next rc. https://lore.kernel.org/all/ZnBGbmIQy52IDC9L@tissot.1015granger.net/ Thanks, Sourabh Jain == Comment: #14 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-06-25 03:38:16 == Team, I have tested the fix on custom kernel "6.9.0-rc7nfsd-fix+" and the issue is not reproducible. ---- uname ---- Linux ubuntulp2host 6.9.0-rc7nfsd-fix+ #2 SMP Tue Jun 25 06:49:48 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux 1. sosreport is generated as expected ------------------- logs --------------------------- Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit:         Community Website : https://www.ubuntu.com/         Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []:  Setting up archive ...  Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. 
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: unix_diag, xsk_diag, af_packet_diag, tcp_diag, udp_diag, netlink_diag, inet_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.  Running plugins. Please wait ...   Finishing plugins [Running: logs]   Finished running plugins Creating compressed archive... 
Your sosreport has been generated and saved in: /tmp/sosreport-ubuntulp2host-2024-06-25-cussrcx.tar.xz Size 5.99MiB Owner root sha256 192c04e45142382038adb223d6dc4aa95edc8edf5d37a576cdd2912e71cdd98b Please send this file to your support representative. 2. As Sourabh mentioned in the comments above, the command below no longer causes a crash/OOPS.
cat /proc/fs/nfsd/pool_stats
# pool packets-arrived sockets-enqueued threads-woken threads-timedout
0 0 2 0 0
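The reproduction requirements from Comment #5 can be partly checked without touching pool_stats. The following is a minimal POSIX shell sketch; the function names are hypothetical, and it assumes the running kernel's build config is available as /boot/config-$(uname -r) (as on Ubuntu) and that mounts are listed in /proc/mounts. It cannot tell whether commit 7b207ccd9833 (or the fix e0011bca603c) is present, and it deliberately never reads /proc/fs/nfsd/pool_stats, because on an affected kernel that read triggers the oops.

```shell
#!/bin/sh
# Hypothetical helper (not part of sos or Ubuntu): checks the two
# non-destructive preconditions from the reproduction steps.
# It never reads /proc/fs/nfsd/pool_stats, which crashes affected kernels.

# Succeeds if the given kernel config file has CONFIG_NFSD=y or CONFIG_NFSD=m.
nfsd_config_enabled() {
    grep -Eq '^CONFIG_NFSD=(y|m)$' "$1"
}

# Succeeds if the given mounts file (e.g. /proc/mounts) lists an nfsd mount
# at /proc/fs/nfsd. Fields: device mountpoint fstype options dump pass.
nfsd_fs_mounted() {
    awk '$2 == "/proc/fs/nfsd" && $3 == "nfsd" { found = 1 } END { exit !found }' "$1"
}

# Prints one summary line, e.g. "config_nfsd=yes nfsd_mounted=no".
check_nfsd_prereqs() {
    if nfsd_config_enabled "$1"; then cfg=yes; else cfg=no; fi
    if nfsd_fs_mounted "$2"; then mnt=yes; else mnt=no; fi
    echo "config_nfsd=$cfg nfsd_mounted=$mnt"
}
```

Usage (assumed paths): check_nfsd_prereqs "/boot/config-$(uname -r)" /proc/mounts. Both files are passed as arguments so the checks can also be run against saved copies taken from another system.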
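The pool_stats output captured in Comment #14 consists of a '#'-prefixed header naming the columns, followed by data rows (presumably one per pool; the sample has a single pool, 0). As a sketch only, the hypothetical helper below pairs each value with its column name using awk. It parses saved text rather than the live file, so the capture should be taken on a kernel that includes the fix.

```shell
#!/bin/sh
# Hypothetical helper: turns saved pool_stats text into name=value pairs,
# one output line per data row. It reads a file argument, not the live
# /proc/fs/nfsd/pool_stats, which must not be read on an affected kernel.
parse_pool_stats() {
    awk '
        # Header line such as "# pool packets-arrived ...": store column names.
        /^#/ {
            ncol = NF - 1
            for (i = 2; i <= NF; i++) name[i - 1] = $i
            next
        }
        # Data line: pair each value with the column name at the same position.
        NF > 0 {
            line = ""
            sep = ""
            for (i = 1; i <= NF && i <= ncol; i++) {
                line = line sep name[i] "=" $i
                sep = " "
            }
            print line
        }
    ' "$1"
}
```

Fed the sample from Comment #14 (header plus the row "0 0 2 0 0"), it prints pool=0 packets-arrived=0 sockets-enqueued=2 threads-woken=0 threads-timedout=0.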
2024-07-12 16:38:35 Frank Heimes linux (Ubuntu): assignee Patricia Domingues (patriciasd)
2024-07-15 05:25:22 Frank Heimes ubuntu-power-systems: status New In Progress
2024-07-15 05:25:26 Frank Heimes linux (Ubuntu): status New In Progress
2024-07-15 05:25:43 Frank Heimes linux (Ubuntu): assignee Patricia Domingues (patriciasd) Canonical Kernel Team (canonical-kernel-team)
2024-07-19 09:26:42 Stefan Bader nominated for series Ubuntu Noble
2024-07-19 09:26:42 Stefan Bader bug task added linux (Ubuntu Noble)
2024-07-19 09:26:42 Stefan Bader bug task added sosreport (Ubuntu Noble)
2024-07-19 09:27:05 Stefan Bader linux (Ubuntu Noble): importance Undecided High
2024-07-19 09:27:05 Stefan Bader linux (Ubuntu Noble): status New Fix Committed
2024-07-19 09:54:59 Frank Heimes ubuntu-power-systems: status In Progress Fix Committed
2024-07-19 09:55:50 Frank Heimes description SRU Justification:
[Impact]
 * When the sosreport command is executed, a kernel oops occurs and the system crashes; depending on the configuration (including the default one), the system/LPAR then reboots.
[Fix]
 * e0011bca603c101f2a3c007bdb77f7006fa78fb1 e0011bca603c "nfsd: initialise nfsd_info.mutex early"
[Test Case]
 * Have an Ubuntu Server 24.04 LTS installation on ppc64el.
 * One option is simply running sosreport on the system; the crash is seen once sosreport starts collecting data.
 * A second option (without sosreport) is:
   * CONFIG_NFSD=m (or y) must be set
   * mount nfsd if it is not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command
   * read the pool statistics using "$ cat /proc/fs/nfsd/pool_stats"
 * The kernel oops will happen and the logs will show:
   ...
   BUG: Kernel NULL pointer dereference on read at 0x00000000
   Faulting instruction address: 0xc0000000016ff114
   Oops: Kernel access of bad area, sig: 11 [#1]
   ...
 * On a system running a kernel that includes the above patch, no oops occurs and the sosreport command executes normally.
[Regression Potential]
 * As with any code modification there is some risk of regression, here because the mutex handling in nfsd is modified.
 * However, the changes are easy to trace.
 * In addition, the commit has already been reviewed and accepted upstream.
 * The modifications were done by the NFSD maintainer and also tested by IBM.
[Other]
 * The fix was accepted upstream in kernel v6.10-rc7, hence Oracular is not affected.
== Comment: #0 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:35:50 == --- Description --- When the sosreport command is executed, a kernel OOPS crash happens and the LPAR reboots. As kdump was enabled, the dump was captured. Note: the bug looks similar to Bug 206504, which is seen on z LPARs. --- Lpar Details --- 1. PowerVM 2. FW: FW1060.00 (NH1060_026) 3. OS: Ubuntu 24.04 4. Kernel: 6.8.0-31-generic 5. Mem (free -mh): 47Gi 6. cpus: 40 --- Steps to reproduce --- 1.
run sosreport command on the lpar and the crash is seen when the sosreport is starting to capture dump. --- Traces --- root@ubuntulp2host:~# sosreport Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit:         Community Website : https://www.ubuntu.com/         Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []:  Setting up archive ...  Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. 
[plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: af_packet_diag, unix_diag, netlink_diag, udp_diag, inet_diag, tcp_diag, xsk_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.  Running plugins. Please wait ...   Starting 21/75 firewall_tables [Running: cloud_init ebpf filesys firewall_tables] [ 1057.076626] Kernel attempted to read user page (0) - exploit attempt? 
(uid: 0) [ 1057.076645] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 1057.076650] Faulting instruction address: 0xc0000000016ff114 [ 1057.076655] Oops: Kernel access of bad area, sig: 11 [#1] [ 1057.076659] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 1057.076665] Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc rdma_ucm ib_uverbs qrtr rdma_cm iw_cm ib_cm ib_core cfg80211 binfmt_misc kvm_hv kvm vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace nf_tables nvme_fabrics dm_multipath nvme_core nvme_auth sunrpc nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 nx_compress_pseries nx_compress ibmvscsi 842_decompress ibmveth pseries_rng poly1305_p10_crypto chacha_p10_crypto libchacha crct10dif_vpmsum crc32c_vpmsum aes_gcm_p10_crypto [ 1057.076731] CPU: 25 PID: 6109 Comm: sosreport Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu [ 1057.076737] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_026) hv:phyp pSeries [ 1057.076743] NIP: c0000000016ff114 LR: c0000000016ff108 CTR: c0000000016ff0e0 [ 1057.076747] REGS: c000000067e63630 TRAP: 0300 Not tainted (6.8.0-31-generic) [ 1057.076752] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24044400 XER: 2004008c [ 1057.076761] CFAR: c0000000016fb6c8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 [ 1057.076761] GPR00: 0000000000000000 c000000067e638d0 c000000002254800 0000000000000000 [ 1057.076761] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR08: 0000000000000000 0000000000000000 c000000057a07980 c008000005d39538 [ 1057.076761] GPR12: c0000000016ff0e0 c000000c1bc8ff00 0000000000000000 0000000000000000 [ 1057.076761] GPR16: 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR20: c00000006751a628 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR24: 0000000000000000 c00000006751a618 0000000000000000 c000000067e63a70 [ 1057.076761] GPR28: c000000067e63a98 0000000000000000 c00000006b4d9188 0000000000000000 [ 1057.076809] NIP [c0000000016ff114] mutex_lock+0x34/0x98 [ 1057.076816] LR [c0000000016ff108] mutex_lock+0x28/0x98 [ 1057.076821] Call Trace: [ 1057.076823] [c000000067e638d0] [c0000000016ff108] mutex_lock+0x28/0x98 (unreliable) [ 1057.076829] [c000000067e63900] [c008000005d2e480] svc_pool_stats_start+0x48/0xf8 [sunrpc] [ 1057.076866] [c000000067e63970] [c0000000007196a0] seq_read_iter+0x16c/0x6a4 [ 1057.076871] [c000000067e63a40] [c000000000719d00] seq_read+0x128/0x1a8 [ 1057.076875] [c000000067e63ae0] [c0000000006c8254] vfs_read+0xe4/0x3e0 [ 1057.076881] [c000000067e63b90] [c0000000006c94a0] ksys_read+0x90/0x168 [ 1057.076886] [c000000067e63be0] [c000000000033248] system_call_exception+0xf8/0x290 [ 1057.076892] [c000000067e63e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec [ 1057.076899] --- interrupt: 3000 at 0x689080b5b504 [ 1057.076903] NIP: 0000689080b5b504 LR: 0000689080b5b504 CTR: 0000000000000000 [ 1057.076907] REGS: c000000067e63e80 TRAP: 3000 Not tainted (6.8.0-31-generic) [ 1057.076911] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 42044402 XER: 00000000 [ 1057.076922] IRQMASK: 0 [ 1057.076922] GPR00: 0000000000000003 000068907600da50 0000689080c96d00 0000000000000008 [ 1057.076922] GPR04: 000068905c014660 0000000000010000 000068907ca613c8 00006890760168e0 [ 1057.076922] GPR08: 000068907600f228 0000000000000000 0000000000000000 0000000000000000 [ 1057.076922] GPR12: 0000000000000000 00006890760168e0 0000000000000001 0000000000000000 [ 1057.076922] GPR16: 000068907c89bb50 000068907ff10968 000068907ff10978 0000689080372f7a [ 1057.076922] GPR20: 0000689080372f78 000068907ff10938 000068907ff108f0 
0000000010493180 [ 1057.076922] GPR24: 000068905c014660 0000000000000008 0000000000000000 0000000000000000 [ 1057.076922] GPR28: 000068905c014660 0000000000010000 0000000000000008 000068907600da50 [ 1057.076965] NIP [0000689080b5b504] 0x689080b5b504 [ 1057.076969] LR [0000689080b5b504] 0x689080b5b504 [ 1057.076972] --- interrupt: 3000 [ 1057.076975] Code: 38425720 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ffd1 4bffc575 60000000 39200000 e94d0908 <7d00f8a8> 7c284800 40c20010 7d40f9ad [ 1057.076990] ---[ end trace 0000000000000000 ]--- == Comment: #1 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:39:47 == Placed the dump file and dmesg file on the junebug server: ssh dump@junebug1.isst.aus.stglabs.ibm.com The dump file is located at: /home/dump/dumps/206751 == Comment: #5 - Sourabh Jain <sjain014@in.ibm.com> - 2024-05-29 09:23:29 == Hello Team, Here is my observation on this issue: The kernel crash is caused by sos reading the following file: /proc/fs/nfsd/pool_stats This issue is also reproducible with the current upstream kernel 6.10-rc1. So there is nothing wrong with the sos tool; it is a kernel bug. The first bad kernel commit, which introduced this issue, is: 7b207ccd9833 svc: don't hold reference for poolstats, only mutex. Here are the steps to reproduce this issue without the sos tool: Requirements: 1. The kernel must have the "7b207ccd9833 svc: don't hold reference for poolstats, only mutex." commit 2. CONFIG_NFSD=m must be enabled 3. mount nfsd, if not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command Run the command below to reproduce the issue: $ cat /proc/fs/nfsd/pool_stats NOTE: the above command will crash the kernel. Thanks, Sourabh Jain == Comment: #9 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 08:57:19 == Hello Team, The NFSD maintainer has provided a fix: https://lore.kernel.org/all/20240617-nfsd-next-v1-1-5833b297015a@kernel.org/ Feel free to try the above fix.
Note: the fix is for the Linux kernel, not for the sosreport tool. Thanks, Sourabh Jain == Comment: #10 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 22:07:11 == Hello Team, The fix has been applied to the nfsd-next branch and is likely to reach the mainline kernel in the next rc. https://lore.kernel.org/all/ZnBGbmIQy52IDC9L@tissot.1015granger.net/ Thanks, Sourabh Jain == Comment: #14 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-06-25 03:38:16 == Team, I have tested the fix on the custom kernel "6.9.0-rc7nfsd-fix+" and the issue is not reproducible. ---- uname ---- Linux ubuntulp2host 6.9.0-rc7nfsd-fix+ #2 SMP Tue Jun 25 06:49:48 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux 1. The sosreport is generated as expected: ------------------- logs --------------------------- Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit: Community Website : https://www.ubuntu.com/ Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []: Setting up archive ... Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: unix_diag, xsk_diag, af_packet_diag, tcp_diag, udp_diag, netlink_diag, inet_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.  Running plugins. Please wait ...   Finishing plugins [Running: logs]   Finished running plugins Creating compressed archive... 
Your sosreport has been generated and saved in: /tmp/sosreport-ubuntulp2host-2024-06-25-cussrcx.tar.xz Size 5.99MiB Owner root sha256 192c04e45142382038adb223d6dc4aa95edc8edf5d37a576cdd2912e71cdd98b Please send this file to your support representative. 2. The command mentioned by Sourabh in the comments above no longer causes a crash/OOPS: cat /proc/fs/nfsd/pool_stats # pool packets-arrived sockets-enqueued threads-woken threads-timedout 0 0 2 0 0 SRU Justification: [Impact] * When the sosreport command is executed, a kernel OOPS occurs and the system crashes; depending on the configuration (and by default), the system/LPAR then reboots. [Fix] * e0011bca603c101f2a3c007bdb77f7006fa78fb1 e0011bca603c "nfsd: initialise nfsd_info.mutex early" [Test Case] * Have an Ubuntu Server 24.04 LTS installation on ppc64el. * One option is to simply run sosreport on the system; the crash is seen when sosreport starts to capture the dump. * A second option (without sosreport) is: * CONFIG_NFSD=m (or y) must be set * mount nfsd, if not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command * read the pool statistics using the "$ cat /proc/fs/nfsd/pool_stats" command * The kernel oops will happen and the logs will show: ... BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0000000016ff114 Oops: Kernel access of bad area, sig: 11 [#1] ... * On a system running a kernel that includes the above patch, no oops occurs and the sosreport command executes normally. [Regression Potential] * There is a certain risk of regression, as with any code modification, here in particular because the mutex handling in nfsd is modified. * But the changes are quite traceable. * In addition, the commit has already been reviewed and accepted upstream. * The modifications were done by the NFSD maintainer and also tested by IBM. [Other] * The fix/commit was accepted upstream with kernel v6.10-rc7, hence Oracular (with a planned kernel of >=6.10) is not affected.
== Comment: #0 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:35:50 == --- Description --- When the sosreport command is executed, a kernel OOPS occurs and the LPAR reboots. As kdump was enabled, the dump was captured. Note: The bug looks similar to Bug 206504, which is seen on Z LPARs. --- Lpar Details --- 1. PowerVM 2. FW: FW1060.00 (NH1060_026) 3. OS: Ubuntu 24.04 4. Kernel: 6.8.0-31-generic 5. Mem (free -mh): 47Gi 6. cpus: 40 --- Steps to reproduce --- 1. Run the sosreport command on the LPAR; the crash is seen when sosreport starts to capture the dump. --- Traces --- root@ubuntulp2host:~# sosreport Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit: Community Website : https://www.ubuntu.com/ Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []: Setting up archive ... Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables, ip6table_filter. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: af_packet_diag, unix_diag, netlink_diag, udp_diag, inet_diag, tcp_diag, xsk_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.  Running plugins. Please wait ...   Starting 21/75 firewall_tables [Running: cloud_init ebpf filesys firewall_tables] [ 1057.076626] Kernel attempted to read user page (0) - exploit attempt? 
(uid: 0) [ 1057.076645] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 1057.076650] Faulting instruction address: 0xc0000000016ff114 [ 1057.076655] Oops: Kernel access of bad area, sig: 11 [#1] [ 1057.076659] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 1057.076665] Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc rdma_ucm ib_uverbs qrtr rdma_cm iw_cm ib_cm ib_core cfg80211 binfmt_misc kvm_hv kvm vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace nf_tables nvme_fabrics dm_multipath nvme_core nvme_auth sunrpc nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 nx_compress_pseries nx_compress ibmvscsi 842_decompress ibmveth pseries_rng poly1305_p10_crypto chacha_p10_crypto libchacha crct10dif_vpmsum crc32c_vpmsum aes_gcm_p10_crypto [ 1057.076731] CPU: 25 PID: 6109 Comm: sosreport Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu [ 1057.076737] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_026) hv:phyp pSeries [ 1057.076743] NIP: c0000000016ff114 LR: c0000000016ff108 CTR: c0000000016ff0e0 [ 1057.076747] REGS: c000000067e63630 TRAP: 0300 Not tainted (6.8.0-31-generic) [ 1057.076752] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24044400 XER: 2004008c [ 1057.076761] CFAR: c0000000016fb6c8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 [ 1057.076761] GPR00: 0000000000000000 c000000067e638d0 c000000002254800 0000000000000000 [ 1057.076761] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR08: 0000000000000000 0000000000000000 c000000057a07980 c008000005d39538 [ 1057.076761] GPR12: c0000000016ff0e0 c000000c1bc8ff00 0000000000000000 0000000000000000 [ 1057.076761] GPR16: 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR20: c00000006751a628 0000000000000000 0000000000000000 0000000000000000 [ 1057.076761] GPR24: 0000000000000000 c00000006751a618 0000000000000000 c000000067e63a70 [ 1057.076761] GPR28: c000000067e63a98 0000000000000000 c00000006b4d9188 0000000000000000 [ 1057.076809] NIP [c0000000016ff114] mutex_lock+0x34/0x98 [ 1057.076816] LR [c0000000016ff108] mutex_lock+0x28/0x98 [ 1057.076821] Call Trace: [ 1057.076823] [c000000067e638d0] [c0000000016ff108] mutex_lock+0x28/0x98 (unreliable) [ 1057.076829] [c000000067e63900] [c008000005d2e480] svc_pool_stats_start+0x48/0xf8 [sunrpc] [ 1057.076866] [c000000067e63970] [c0000000007196a0] seq_read_iter+0x16c/0x6a4 [ 1057.076871] [c000000067e63a40] [c000000000719d00] seq_read+0x128/0x1a8 [ 1057.076875] [c000000067e63ae0] [c0000000006c8254] vfs_read+0xe4/0x3e0 [ 1057.076881] [c000000067e63b90] [c0000000006c94a0] ksys_read+0x90/0x168 [ 1057.076886] [c000000067e63be0] [c000000000033248] system_call_exception+0xf8/0x290 [ 1057.076892] [c000000067e63e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec [ 1057.076899] --- interrupt: 3000 at 0x689080b5b504 [ 1057.076903] NIP: 0000689080b5b504 LR: 0000689080b5b504 CTR: 0000000000000000 [ 1057.076907] REGS: c000000067e63e80 TRAP: 3000 Not tainted (6.8.0-31-generic) [ 1057.076911] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 42044402 XER: 00000000 [ 1057.076922] IRQMASK: 0 [ 1057.076922] GPR00: 0000000000000003 000068907600da50 0000689080c96d00 0000000000000008 [ 1057.076922] GPR04: 000068905c014660 0000000000010000 000068907ca613c8 00006890760168e0 [ 1057.076922] GPR08: 000068907600f228 0000000000000000 0000000000000000 0000000000000000 [ 1057.076922] GPR12: 0000000000000000 00006890760168e0 0000000000000001 0000000000000000 [ 1057.076922] GPR16: 000068907c89bb50 000068907ff10968 000068907ff10978 0000689080372f7a [ 1057.076922] GPR20: 0000689080372f78 000068907ff10938 000068907ff108f0 
0000000010493180 [ 1057.076922] GPR24: 000068905c014660 0000000000000008 0000000000000000 0000000000000000 [ 1057.076922] GPR28: 000068905c014660 0000000000010000 0000000000000008 000068907600da50 [ 1057.076965] NIP [0000689080b5b504] 0x689080b5b504 [ 1057.076969] LR [0000689080b5b504] 0x689080b5b504 [ 1057.076972] --- interrupt: 3000 [ 1057.076975] Code: 38425720 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ffd1 4bffc575 60000000 39200000 e94d0908 <7d00f8a8> 7c284800 40c20010 7d40f9ad [ 1057.076990] ---[ end trace 0000000000000000 ]--- == Comment: #1 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-05-28 04:39:47 == Placed the dump file and dmesg file on the junebug server: ssh dump@junebug1.isst.aus.stglabs.ibm.com The dump file is located at: /home/dump/dumps/206751 == Comment: #5 - Sourabh Jain <sjain014@in.ibm.com> - 2024-05-29 09:23:29 == Hello Team, Here is my observation on this issue: The kernel crash is caused by sos reading the following file: /proc/fs/nfsd/pool_stats This issue is also reproducible with the current upstream kernel 6.10-rc1. So there is nothing wrong with the sos tool; it is a kernel bug. The first bad kernel commit, which introduced this issue, is: 7b207ccd9833 svc: don't hold reference for poolstats, only mutex. Here are the steps to reproduce this issue without the sos tool: Requirements: 1. The kernel must have the "7b207ccd9833 svc: don't hold reference for poolstats, only mutex." commit 2. CONFIG_NFSD=m must be enabled 3. mount nfsd, if not already mounted, using the "$ mount -t nfsd nfsd /proc/fs/nfsd" command Run the command below to reproduce the issue: $ cat /proc/fs/nfsd/pool_stats NOTE: the above command will crash the kernel. Thanks, Sourabh Jain == Comment: #9 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 08:57:19 == Hello Team, The NFSD maintainer has provided a fix: https://lore.kernel.org/all/20240617-nfsd-next-v1-1-5833b297015a@kernel.org/ Feel free to try the above fix.
Note: the fix is for the Linux kernel, not for the sosreport tool. Thanks, Sourabh Jain == Comment: #10 - Sourabh Jain <sjain014@in.ibm.com> - 2024-06-17 22:07:11 == Hello Team, The fix has been applied to the nfsd-next branch and is likely to reach the mainline kernel in the next rc. https://lore.kernel.org/all/ZnBGbmIQy52IDC9L@tissot.1015granger.net/ Thanks, Sourabh Jain == Comment: #14 - Tasmiya Nalatwad <Tasmiya.Nalatwad@ibm.com> - 2024-06-25 03:38:16 == Team, I have tested the fix on the custom kernel "6.9.0-rc7nfsd-fix+" and the issue is not reproducible. ---- uname ---- Linux ubuntulp2host 6.9.0-rc7nfsd-fix+ #2 SMP Tue Jun 25 06:49:48 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux 1. The sosreport is generated as expected: ------------------- logs --------------------------- Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'. Redirecting to 'sos report ' sosreport (version 4.5.6) This command will collect system configuration and diagnostic information from this Ubuntu system. For more information on Canonical visit: Community Website : https://www.ubuntu.com/ Commercial Support : https://www.canonical.com The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Press ENTER to continue, or CTRL-C to quit. Optionally, please enter the case id that you are generating this report for []: Setting up archive ... Setting up plugins ... [plugin:lxd] skipped command 'lxc image list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc network list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc profile list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:lxd] skipped command 'lxc storage list': required kmods missing: ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw. [plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection. [plugin:networking] skipped command 'ss -peaonmi': required kmods missing: unix_diag, xsk_diag, af_packet_diag, tcp_diag, udp_diag, netlink_diag, inet_diag. Use '--allow-system-changes' to enable collection. Not all environment variables set. Source the environment file for the user intended to connect to the OpenStack environment. [plugin:ufw] skipped command 'ufw status numbered': required kmods missing: bpfilter, iptable_filter. [plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter, iptable_filter.  Running plugins. Please wait ...   Finishing plugins [Running: logs]   Finished running plugins Creating compressed archive... 
Your sosreport has been generated and saved in: /tmp/sosreport-ubuntulp2host-2024-06-25-cussrcx.tar.xz Size 5.99MiB Owner root sha256 192c04e45142382038adb223d6dc4aa95edc8edf5d37a576cdd2912e71cdd98b Please send this file to your support representative. 2. The command mentioned by Sourabh in the comments above no longer causes a crash/OOPS: cat /proc/fs/nfsd/pool_stats # pool packets-arrived sockets-enqueued threads-woken threads-timedout 0 0 2 0 0
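For scripted verification of a fixed kernel, the checks from the test case (is nfsd mounted at /proc/fs/nfsd, and does pool_stats return a header plus one row per pool) can be expressed as a minimal Python sketch. It assumes the layout seen in the output above: a header line starting with "#" that names the columns, followed by whitespace-separated integer rows. The function names nfsd_is_mounted and parse_pool_stats are hypothetical and are not part of sos or the kernel. The sketch deliberately works on text passed in rather than opening /proc/fs/nfsd/pool_stats itself, because on an affected kernel (one with 7b207ccd9833 but without e0011bca603c) reading that file can trigger the oops described above.

```python
from typing import Dict, List


def nfsd_is_mounted(mounts_text: str) -> bool:
    """Return True if a /proc/mounts-style listing has an nfsd mount at /proc/fs/nfsd."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == "/proc/fs/nfsd" and fields[2] == "nfsd":
            return True
    return False


def parse_pool_stats(text: str) -> List[Dict[str, int]]:
    """Parse captured pool_stats text into one dict per pool row.

    Assumes (hypothetically) the layout shown in this bug: a '#' header
    naming the columns, then whitespace-separated integer rows.
    """
    columns: List[str] = []
    rows: List[Dict[str, int]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: strip the leading '#' and keep the column names.
            columns = line.lstrip("#").split()
            continue
        values = [int(v) for v in line.split()]
        if len(values) != len(columns):
            raise ValueError(f"row has {len(values)} fields, header has {len(columns)}")
        rows.append(dict(zip(columns, values)))
    return rows


if __name__ == "__main__":
    # Text as captured in comment #14 on the fixed kernel.
    sample = (
        "# pool packets-arrived sockets-enqueued threads-woken threads-timedout\n"
        "0 0 2 0 0\n"
    )
    print(parse_pool_stats(sample))
    # Illustrative /proc/mounts line (hypothetical sample, not from this bug).
    print(nfsd_is_mounted("nfsd /proc/fs/nfsd nfsd rw,relatime 0 0\n"))
```

Applied to the two-line output from comment #14, parse_pool_stats returns a single entry for pool 0 with sockets-enqueued equal to 2 and all other counters 0. Keeping the parser separate from the file read makes it usable on archived sosreport data as well as on a live, patched system.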
2024-07-19 09:56:08 Frank Heimes linux (Ubuntu): status In Progress Invalid
2024-07-19 09:56:23 Frank Heimes sosreport (Ubuntu Noble): status New Invalid
2024-08-08 15:13:21 Ubuntu Kernel Bot tags architecture-ppc64le bugnameltc-206751 severity-high targetmilestone-inin2404 architecture-ppc64le bugnameltc-206751 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-needed-noble-linux