Activity log for bug #1968096

Date Who What changed Old value New value Message
2022-04-06 22:19:15 bugproxy bug added bug
2022-04-06 22:19:17 bugproxy tags architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin---
2022-04-06 22:19:18 bugproxy ubuntu: assignee Skipper Bug Screeners (skipper-screen-team)
2022-04-06 22:19:21 bugproxy affects ubuntu linux (Ubuntu)
2022-04-07 05:13:56 Frank Heimes bug task added ubuntu-z-systems
2022-04-07 05:14:13 Frank Heimes ubuntu-z-systems: assignee Skipper Bug Screeners (skipper-screen-team)
2022-04-07 05:14:37 Frank Heimes ubuntu-z-systems: importance Undecided High
2022-04-07 06:24:25 Frank Heimes ubuntu-z-systems: status New Incomplete
2022-05-16 13:57:41 Frank Heimes ubuntu-z-systems: status Incomplete In Progress
2022-05-16 13:57:48 Frank Heimes linux (Ubuntu): assignee Skipper Bug Screeners (skipper-screen-team) Frank Heimes (fheimes)
2022-05-16 13:57:57 Frank Heimes nominated for series Ubuntu Impish
2022-05-16 13:57:57 Frank Heimes bug task added linux (Ubuntu Impish)
2022-05-16 13:57:57 Frank Heimes nominated for series Ubuntu Jammy
2022-05-16 13:57:57 Frank Heimes bug task added linux (Ubuntu Jammy)
2022-05-16 13:57:57 Frank Heimes nominated for series Ubuntu Focal
2022-05-16 13:57:57 Frank Heimes bug task added linux (Ubuntu Focal)
2022-05-16 16:45:58 Frank Heimes bug added subscriber Pedro Principeza
2022-05-16 16:47:04 Frank Heimes bug added subscriber Marcelo Cerri
2022-05-16 16:48:28 Frank Heimes description

Old value:

State the component where the bug is occurring: kernel

Indicate the nature of the problem by answering the questions below:
- Is this problem reproducible? No. Steps unknown, but we have seen this before.
- Is the system sitting at a debugger (kdb, or xmon)? No
- Is the system hung? No, dumped and rebooted.
- Are there any custom patches installed? Yes. On base system level (CloudAppliance) we are still running with the zfpc_proc module loaded. There have been no recent changes in the module, and it is running absolutely stable in HA (same kernel and userspace, Ubuntu 20.04 LTS).
- Is there any special hardware that may be relevant to this problem? Yes. We are running with mlx (cloud network adapters) installed.
- Is access information for the machine the problem was found on available? Yes
- Is the bug occurring in a userspace application? No
- Was a stack trace produced? Yes. This is what is mentioned in the first comment by @Boris Barth.
- Did the system produce an Oops message on the console? Yes

[556585.270902] illegal operation: 0001 ilc:1 [#10] SMP
[556585.270905] Modules linked in: vhost_net macvtap macvlan tap rpcsec_gss_krb5 auth_rpcgss nfsv3 nfs_acl nfs lockd grace fscache veth xt_statistic ipt_REJECT nf_reject_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_mark sunrpc nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_set ip_set_hash_net ip_set_hash_ip ip_set tcp_diag inet_diag xt_comment xt_nat cls_cgroup sch_htb act_gact sch_multiq act_mirred act_pedit act_tunnel_key cls_flower act_police cls_u32 vxlan ip6_udp_tunnel udp_tunnel dummy nf_tables ebtable_filter ebtables xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key sch_ingress mlx5_ib ib_uverbs ib_core mlx5_core tls mlxfw ptp pps_core dm_integrity async_xor async_tx dm_bufio bonding xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat br_netfilter bridge vhost_vsock vmw_vsock_virtio_transport_common vhost vsock 8021q garp mrp stp llc xt_multiport xt_tcpudp qeth_l2 lcs ctcm fsm dasd_fba_mod aufs overlay scsi_dh_rdac
[556585.270923]  scsi_dh_emc s390_trng xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter sch_fq_codel zFPC_proc(OE) zFPC_diag(OE) vfio_ap vfio_mdev drm vfio_iommu_type1 drm_panel_orientation_quirks i2c_core ip_tables x_tables scsi_dh_alua pkey zcrypt ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common chsc_sch qeth ccwgroup eadm_sch vfio_ccw mdev vfio btrfs libcrc32c crc32_vx_s390 xor zstd_compress raid6_pq dm_crypt virtio_blk dm_service_time dm_multipath zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod zlib_deflate [last unloaded: tls]
[556585.270945] CPU: 28 PID: 217741 Comm: worker Kdump: loaded Tainted: G D OE 5.4.0-90-generic #101-Ubuntu
[556585.270947] Hardware name: IBM 8562 GT2 A00 (LPAR)
[556585.270948] Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
[556585.270951]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[556585.270953] Krnl GPRS: 0000000000000000 0000000000000000 000003e010ebbcf8 00000071c45e1ec0
[556585.270954]            0000000000000000 0000002816f7b18c 00000078dd36a4a0 000000713a62f718
[556585.270955]            0000000000000000 000003e010ebbcf8 0000000000000068 00000071c45e1ec0
[556585.270957]            0000006090a12200 0000000000000c40 000003ff80d6fb54 000003e010ebbbf0
[556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                          >0000000000000002: 0000 illegal
                           0000000000000004: 0000 illegal
                           0000000000000006: 0000 illegal
                           0000000000000008: 0000 illegal
                           000000000000000a: 0000 illegal
                           000000000000000c: 0000 illegal
                           000000000000000e: 0000 illegal
[556585.270967] Call Trace:
[556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
[556585.270993]  [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
[556585.271004]  [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
[556585.271014]  [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
[556585.271016]  [<00000028165944a8>] new_sync_write+0x118/0x1b0
[556585.271017]  [<0000002816594ee0>] vfs_write+0xb0/0x1b0
[556585.271019]  [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
[556585.271021]  [<0000002816bb26b2>] system_call+0x2a6/0x2c8

- Was a system dump produced, i.e. kdump, netdump, or LKCD? Yes. That is the kdump the stack trace comes from.

Enter data below to accurately describe the problem:
- Problem description: Null pointer issue in NFS code running Ubuntu 18.04 with HWE kernel 5.4 on IBM Z
- Enter uname -a output:
  @lon1-qz1-sr4-rk101-s04> uname -a
  Linux lon1-qz1-sr4-rk101-s04 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 19:59:45 UTC 2021 s390x s390x s390x GNU/Linux
- Enter failing machine type and model (i.e. p520 9111-520 lpar, x336 47U-8637):
  Manufacturer: IBM
  Type: 8562
  Model: A00 GT2
  Model Capacity: A00 00000000
  Capacity Adj. Ind.: 100
  LPAR CPUs Total: 16
  LPAR CPUs Configured: 16
  LPAR CPUs Standby: 0
  LPAR CPUs Reserved: 0
  LPAR CPUs Dedicated: 0
  LPAR CPUs Shared: 16
  LPAR CPUs G-MTID: 0
  LPAR CPUs S-MTID: 1
  LPAR CPUs PS-MTID: 1
- Enter primary and backup contact information (name/email):
  Prabhat Ranjan pranjank@in.ibm.com
  Christoph Schlameuß schlameuss@de.ibm.com
- Detail the configuration of the additional hardware
- Enter common userspace tool name: N/A
- Enter name of userspace RPM: N/A
- If failing tool is obtained from project website vs RPM install, what is the version/release/mod. If from the project's CVS, what is the branch tag and date of checkout (put "na" if not applicable)? N/A
- Is the failing userspace tool 32-bit, 64-bit, or both? N/A
- Describe how unresponsive the system is. What steps have you taken to reclaim the system: kernel oops was detected and automatically dumped and restarted
- Is a debugger configured (xmon or kdb enabled)? No
- Enter Oops message from console: same Oops message as quoted above
- Detail the steps to reproduce this problem: unknown
- Was the system configured to capture a system dump? Yes

New value:

SRU Justification:
==================

[Impact]

 * The kernel crashed under load with a null pointer issue in NFS code:

[556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                          >0000000000000002: 0000 illegal
                           0000000000000004: 0000 illegal
                           0000000000000006: 0000 illegal
                           0000000000000008: 0000 illegal
                           000000000000000a: 0000 illegal
                           000000000000000c: 0000 illegal
                           000000000000000e: 0000 illegal
[556585.270967] Call Trace:
[556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
[556585.270993]  [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
[556585.271004]  [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
[556585.271014]  [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
[556585.271016]  [<00000028165944a8>] new_sync_write+0x118/0x1b0
[556585.271017]  [<0000002816594ee0>] vfs_write+0xb0/0x1b0
[556585.271019]  [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
[556585.271021]  [<0000002816bb26b2>] system_call+0x2a6/0x2c8

 * Several dumps were generated and shared with Canonical.
 * Analysis (done by the kernel team and SEG) points to refcount leaks that are already fixed in the following commit:

[Fix]

 * ca05cbae2a0468e5d78e9b4605936a8bf5da328b ca05cbae2a04 "NFS: Fix up nfs_ctx_key_to_expire()"

[Test Case]

 * There is unfortunately no reproducer or trigger available for this issue.
 * It just happens now and then under higher load.
 * Patched kernels (focal 5.4 and bionic 5.4-hwe) were created and ran for more than a week in a special staging environment (at IBM) without further crashes.
 * Hence the test and verification will be done by the IBM Z team.

[Where problems could occur]

 * The inode handling can become broken if the changes to the pointers are erroneous.
 * Problems with the authentication and/or the credentials could occur due to the modifications in put_rpccred, rpc_cred and rpc_auth.
 * The expiration of the cached credentials could be harmed as well, due to the changes in nfs_ctx_key_to_expire.
 * The different pointer arithmetic may cause further issues, such as wrong or null pointer references.
 * On the positive side, the original commit was brought upstream by NFS experts.
 * A patched test kernel sustained day-long runs under load in a staging and test environment.
 * The author of the upstream commit/patch is well known in the NFS area.

[Other]

 * The Salesforce Case Number 00334334 is associated with this bug.
 * Commit ca05cbae2a04 was accepted upstream with 5.16-rc1.
 * But commit ca05cbae2a04 was unfortunately not tagged as stable, hence it was not picked automatically.
 * Since kinetic's (22.10) target kernel is 5.18, it will have the patch included, hence there is no dedicated patch request for kinetic.
__________

State the component where the bug is occurring: kernel

Indicate the nature of the problem by answering the questions below:
- Is this problem reproducible? No. Steps unknown, but we have seen this before.
- Is the system sitting at a debugger (kdb, or xmon)? No
- Is the system hung? No, dumped and rebooted.
- Are there any custom patches installed? Yes. On base system level (CloudAppliance) we are still running with the zfpc_proc module loaded. There have been no recent changes in the module, and it is running absolutely stable in HA (same kernel and userspace, Ubuntu 20.04 LTS).
- Is there any special hardware that may be relevant to this problem? Yes. We are running with mlx (cloud network adapters) installed.
- Is access information for the machine the problem was found on available? Yes
- Is the bug occurring in a userspace application? No
- Was a stack trace produced? Yes. This is what is mentioned in the first comment by @Boris Barth.
- Did the system produce an Oops message on the console? Yes, same Oops message as in the old value above
- Was a system dump produced, i.e. kdump, netdump, or LKCD? Yes. That is the kdump the stack trace comes from.

Enter data below to accurately describe the problem:
- Problem description: Null pointer issue in NFS code running Ubuntu 18.04 with HWE kernel 5.4 on IBM Z
- Enter uname -a output:
  @lon1-qz1-sr4-rk101-s04> uname -a
  Linux lon1-qz1-sr4-rk101-s04 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 19:59:45 UTC 2021 s390x s390x s390x GNU/Linux
- Enter failing machine type and model (i.e. p520 9111-520 lpar, x336 47U-8637):
  Manufacturer: IBM
  Type: 8562
  Model: A00 GT2
  Model Capacity: A00 00000000
  Capacity Adj. Ind.: 100
  LPAR CPUs Total: 16
  LPAR CPUs Configured: 16
  LPAR CPUs Standby: 0
  LPAR CPUs Reserved: 0
  LPAR CPUs Dedicated: 0
  LPAR CPUs Shared: 16
  LPAR CPUs G-MTID: 0
  LPAR CPUs S-MTID: 1
  LPAR CPUs PS-MTID: 1
- Enter primary and backup contact information (name/email):
  Prabhat Ranjan pranjank@in.ibm.com
  Christoph Schlameuß schlameuss@de.ibm.com
- Detail the configuration of the additional hardware
- Enter common userspace tool name: N/A
- Enter name of userspace RPM: N/A
- If failing tool is obtained from project website vs RPM install, what is the version/release/mod. If from the project's CVS, what is the branch tag and date of checkout (put "na" if not applicable)? N/A
- Is the failing userspace tool 32-bit, 64-bit, or both? N/A
- Describe how unresponsive the system is. What steps have you taken to reclaim the system: kernel oops was detected and automatically dumped and restarted
- Is a debugger configured (xmon or kdb enabled)? No
- Enter Oops message from console: same Oops message as in the old value above
- Detail the steps to reproduce this problem: unknown
- Was the system configured to capture a system dump? Yes
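
The call trace quoted in the description above can be split into its individual frames for triage. The following is only a minimal sketch, assuming the frame layout seen in this particular Oops (`[<address>] symbol+0xoffset/0xsize [module]`); the names `FRAME_RE` and `parse_call_trace` are hypothetical helpers, and the label `vmlinux` for frames without a module suffix is an assumption made for illustration, not something the kernel prints.

```python
import re

# One frame of the "Call Trace:" section, for example:
#   [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
# The trailing "[module]" part is optional (frames in the core kernel lack it).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>[A-Za-z_]\w*)\])?"
)


def parse_call_trace(oops_text):
    """Return the call-trace frames found in oops_text as a list of dicts."""
    frames = []
    for match in FRAME_RE.finditer(oops_text):
        frames.append({
            "address": int(match.group("addr"), 16),
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
            # Assumption: label frames without a module suffix as "vmlinux".
            "module": match.group("module") or "vmlinux",
        })
    return frames


# Call trace copied from the Oops message in this bug.
OOPS_EXCERPT = """\
[556585.270967] Call Trace:
[556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
[556585.270993]  [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
[556585.271004]  [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
[556585.271014]  [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
[556585.271016]  [<00000028165944a8>] new_sync_write+0x118/0x1b0
[556585.271017]  [<0000002816594ee0>] vfs_write+0xb0/0x1b0
[556585.271019]  [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
[556585.271021]  [<0000002816bb26b2>] system_call+0x2a6/0x2c8
"""

if __name__ == "__main__":
    for frame in parse_call_trace(OOPS_EXCERPT):
        print(f"{frame['module']:8} {frame['symbol']}+{frame['offset']:#x}")
```

Grouping the frames by module this way makes it easy to see that the innermost frames belong to sunrpc and nfs, which matches the analysis in the SRU justification.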
2022-05-17 05:52:10 Frank Heimes linux (Ubuntu Focal): status New In Progress
2022-05-17 05:52:15 Frank Heimes linux (Ubuntu Impish): status New In Progress
2022-05-17 05:52:17 Frank Heimes linux (Ubuntu Jammy): status New In Progress
2022-05-17 05:52:37 Frank Heimes linux (Ubuntu Focal): assignee Canonical Kernel Team (canonical-kernel-team)
2022-05-17 05:52:46 Frank Heimes linux (Ubuntu Impish): assignee Canonical Kernel Team (canonical-kernel-team)
2022-05-17 05:52:54 Frank Heimes linux (Ubuntu Jammy): assignee Canonical Kernel Team (canonical-kernel-team)
2022-05-17 06:46:53 Stefan Bader linux (Ubuntu Focal): importance Undecided Medium
2022-05-17 06:46:57 Stefan Bader linux (Ubuntu Impish): importance Undecided Medium
2022-05-17 06:47:02 Stefan Bader linux (Ubuntu Jammy): importance Undecided Medium
2022-05-17 12:11:45 Frank Heimes description

Old value:

SRU Justification:
==================

[Impact]

 * The kernel crashed under load with a null pointer issue in NFS code:

[556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                          >0000000000000002: 0000 illegal
                           0000000000000004: 0000 illegal
                           0000000000000006: 0000 illegal
                           0000000000000008: 0000 illegal
                           000000000000000a: 0000 illegal
                           000000000000000c: 0000 illegal
                           000000000000000e: 0000 illegal
[556585.270967] Call Trace:
[556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
[556585.270993]  [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
[556585.271004]  [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
[556585.271014]  [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
[556585.271016]  [<00000028165944a8>] new_sync_write+0x118/0x1b0
[556585.271017]  [<0000002816594ee0>] vfs_write+0xb0/0x1b0
[556585.271019]  [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
[556585.271021]  [<0000002816bb26b2>] system_call+0x2a6/0x2c8

 * Several dumps were generated and shared with Canonical.
 * Analysis (done by the kernel team and SEG) points to refcount leaks that are already fixed in the following commit:

[Fix]

 * ca05cbae2a0468e5d78e9b4605936a8bf5da328b ca05cbae2a04 "NFS: Fix up nfs_ctx_key_to_expire()"

[Test Case]

 * There is unfortunately no reproducer or trigger available for this issue.
 * It just happens now and then under higher load.
 * Patched kernels (focal 5.4 and bionic 5.4-hwe) were created and ran for more than a week in a special staging environment (at IBM) without further crashes.
 * Hence the test and verification will be done by the IBM Z team.

[Where problems could occur]

 * The inode handling can become broken if the changes to the pointers are erroneous.
 * Problems with the authentication and/or the credentials could occur due to the modifications in put_rpccred, rpc_cred and rpc_auth.
 * The expiration of the cached credentials could be harmed as well, due to the changes in nfs_ctx_key_to_expire.
 * The different pointer arithmetic may cause further issues, such as wrong or null pointer references.
 * On the positive side, the original commit was brought upstream by NFS experts.
 * A patched test kernel sustained day-long runs under load in a staging and test environment.
 * The author of the upstream commit/patch is well known in the NFS area.

[Other]

 * The Salesforce Case Number 00334334 is associated with this bug.
 * Commit ca05cbae2a04 was accepted upstream with 5.16-rc1.
 * But commit ca05cbae2a04 was unfortunately not tagged as stable, hence it was not picked automatically.
 * Since kinetic's (22.10) target kernel is 5.18, it will have the patch included, hence there is no dedicated patch request for kinetic.
__________

State the component where the bug is occurring: kernel

Indicate the nature of the problem by answering the questions below:
- Is this problem reproducible? No. Steps unknown, but we have seen this before.
- Is the system sitting at a debugger (kdb, or xmon)? No
- Is the system hung? No, dumped and rebooted.
- Are there any custom patches installed? Yes. On base system level (CloudAppliance) we are still running with the zfpc_proc module loaded. There have been no recent changes in the module, and it is running absolutely stable in HA (same kernel and userspace, Ubuntu 20.04 LTS).
- Is there any special hardware that may be relevant to this problem? Yes. We are running with mlx (cloud network adapters) installed.
- Is access information for the machine the problem was found on available? Yes
- Is the bug occurring in a userspace application? No
- Was a stack trace produced? Yes. This is what is mentioned in the first comment by @Boris Barth.
- Did the system produce an Oops message on the console? Yes

[556585.270902] illegal operation: 0001 ilc:1 [#10] SMP
[556585.270905] Modules linked in: vhost_net macvtap macvlan tap rpcsec_gss_krb5 auth_rpcgss nfsv3 nfs_acl nfs lockd grace fscache veth xt_statistic ipt_REJECT nf_reject_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_mark sunrpc nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_set ip_set_hash_net ip_set_hash_ip ip_set tcp_diag inet_diag xt_comment xt_nat cls_cgroup sch_htb act_gact sch_multiq act_mirred act_pedit act_tunnel_key cls_flower act_police cls_u32 vxlan ip6_udp_tunnel udp_tunnel dummy nf_tables ebtable_filter ebtables xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key sch_ingress mlx5_ib ib_uverbs ib_core mlx5_core tls mlxfw ptp pps_core dm_integrity async_xor async_tx dm_bufio bonding xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat br_netfilter bridge vhost_vsock vmw_vsock_virtio_transport_common vhost vsock 8021q garp mrp stp llc xt_multiport xt_tcpudp qeth_l2 lcs ctcm fsm dasd_fba_mod aufs overlay scsi_dh_rdac
[556585.270923]  scsi_dh_emc s390_trng xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter sch_fq_codel zFPC_proc(OE) zFPC_diag(OE) vfio_ap vfio_mdev drm vfio_iommu_type1 drm_panel_orientation_quirks i2c_core ip_tables x_tables scsi_dh_alua pkey zcrypt ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common chsc_sch qeth ccwgroup eadm_sch vfio_ccw mdev vfio btrfs libcrc32c crc32_vx_s390 xor zstd_compress raid6_pq dm_crypt virtio_blk dm_service_time dm_multipath zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod zlib_deflate [last unloaded: tls]
[556585.270945] CPU: 28 PID: 217741 Comm: worker Kdump: loaded Tainted: G D OE 5.4.0-90-generic #101-Ubuntu
[556585.270947] Hardware name: IBM 8562 GT2 A00 (LPAR)
[556585.270948] Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
[556585.270951]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[556585.270953] Krnl GPRS: 0000000000000000 0000000000000000 000003e010ebbcf8 00000071c45e1ec0
[556585.270954]            0000000000000000 0000002816f7b18c 00000078dd36a4a0 000000713a62f718
[556585.270955]            0000000000000000 000003e010ebbcf8 0000000000000068 00000071c45e1ec0
[556585.270957]            0000006090a12200 0000000000000c40 000003ff80d6fb54 000003e010ebbbf0
[556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                          >0000000000000002: 0000 illegal
                           0000000000000004: 0000 illegal
                           0000000000000006: 0000 illegal
                           0000000000000008: 0000 illegal
                           000000000000000a: 0000 illegal
                           000000000000000c: 0000 illegal
                           000000000000000e: 0000 illegal
[556585.270967] Call Trace:
[556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
[556585.270993]  [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
[556585.271004]  [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
[556585.271014]  [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
[556585.271016]  [<00000028165944a8>] new_sync_write+0x118/0x1b0
[556585.271017]  [<0000002816594ee0>] vfs_write+0xb0/0x1b0
[556585.271019]  [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
[556585.271021]  [<0000002816bb26b2>] system_call+0x2a6/0x2c8

- Was a system dump produced, i.e. kdump, netdump, or LKCD? Yes. That is the kdump the stack trace comes from.

Enter data below to accurately describe the problem:
- Problem description: Null pointer issue in NFS code running Ubuntu 18.04 with HWE kernel 5.4 on IBM Z
- Enter uname -a output:
  @lon1-qz1-sr4-rk101-s04> uname -a
  Linux lon1-qz1-sr4-rk101-s04 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 19:59:45 UTC 2021 s390x s390x s390x GNU/Linux
- Enter failing machine type and model (i.e. p520 9111-520 lpar, x336 47U-8637):
  Manufacturer: IBM
  Type: 8562
  Model: A00 GT2
  Model Capacity: A00 00000000
  Capacity Adj. Ind.: 100
  LPAR CPUs Total: 16
  LPAR CPUs Configured: 16
  LPAR CPUs Standby: 0
  LPAR CPUs Reserved: 0
  LPAR CPUs Dedicated: 0
  LPAR CPUs Shared: 16
  LPAR CPUs G-MTID: 0
  LPAR CPUs S-MTID: 1
  LPAR CPUs PS-MTID: 1
- Enter primary and backup contact information (name/email):
  Prabhat Ranjan pranjank@in.ibm.com
  Christoph Schlameuß schlameuss@de.ibm.com
- Detail the configuration of the additional hardware
- Enter common userspace tool name: N/A
- Enter name of userspace RPM: N/A
- If failing tool is obtained from project website vs RPM install, what is the version/release/mod. If from the project's CVS, what is the branch tag and date of checkout (put "na" if not applicable)? N/A
- Is the failing userspace tool 32-bit, 64-bit, or both? N/A
- Describe how unresponsive the system is. What steps have you taken to reclaim the system: kernel oops was detected and automatically dumped and restarted
- Is a debugger configured (xmon or kdb enabled)? No
- Enter Oops message from console: same Oops message as quoted above
- Detail the steps to reproduce this problem: unknown
- Was the system configured to capture a system dump?
Yes
SRU Justification:
==================

[Impact]

* The kernel crashed under load with a null pointer issue in NFS code:
    [556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                              >0000000000000002: 0000 illegal
                               0000000000000004: 0000 illegal
                               0000000000000006: 0000 illegal
                               0000000000000008: 0000 illegal
                               000000000000000a: 0000 illegal
                               000000000000000c: 0000 illegal
                               000000000000000e: 0000 illegal
    [556585.270967] Call Trace:
    [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
    [556585.270993] [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
    [556585.271004] [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
    [556585.271014] [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
    [556585.271016] [<00000028165944a8>] new_sync_write+0x118/0x1b0
    [556585.271017] [<0000002816594ee0>] vfs_write+0xb0/0x1b0
    [556585.271019] [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
    [556585.271021] [<0000002816bb26b2>] system_call+0x2a6/0x2c8

* Several dumps were generated and shared with Canonical.

* Analysis (done by kernel and SEG) points to refcount leaks
  that are already fixed in the following commit:

[Fix]

* ca05cbae2a0468e5d78e9b4605936a8bf5da328b ca05cbae2a04 "NFS: Fix up nfs_ctx_key_to_expire()"

[Test Case]

* There is unfortunately no reproducer or trigger available for this issue.

* It just happens now and then under higher load.

* Patched test kernels (focal 5.4 and bionic 5.4-hwe) were created and
  ran for more than a week in a special staging environment (at IBM)
  without further crashes.

* Hence the test and verification will be done by the IBM Z team.

[Where problems could occur]

* The inode handling could break if the changes
  to the pointers are erroneous.

* Problems with the authentication and/or the credentials could occur
  due to the modifications in put_rpccred, rpc_cred and rpc_auth.

* The expiration of the cached credentials could be harmed as well,
  due to the changes in nfs_ctx_key_to_expire.

* The different pointer arithmetic may cause further issues - wrong
  or null pointer references.

* On the positive side, the original commit was brought upstream by NFS experts.

* A patched test kernel sustained day-long runs under load in a staging
  and test environment.

* The author of the upstream commit/patch is well known in the NFS area.

[Other]

* The Salesforce Case Number 00334334 is associated with this bug.

* Commit ca05cbae2a04 was accepted upstream with 5.16-rc1.

* But commit ca05cbae2a04 was unfortunately not tagged as stable,
  hence it was not picked automatically.

* Since the target kernel for kinetic (22.10) is 5.18,
  it will include the patch,
  hence no dedicated patch request is needed for kinetic.

__________

State the component where the Bug is occurring:   kernel Indicate the nature of the problem by answering the below questions: - Is this problem reproducible? No No, steps unknown, but we have seen these before - Is the system sitting at a debugger (kdb, or xmon)? No - Is the system hung? No No, dumped and rebooted - Are there any custom patches installed? Yes On base system level (CloudAppliance) we are still running with the zfpc_proc module loaded. But no recent changes in the module and is running absolutely stable in HA (same kernel and userspace, Ubuntu 20.04 LTS) - Is there any special hardware that may be relevant to this problem? Yes We are running with mlx (cloud network adapters) installed. - Is access information for the machine the problem was found on available? Yes - Is the bug occurring in a userspace application? No - Was a stack trace produced? 
Yes This is what mention in first comment by @Boris Barth - Did the system produce an Oops message on the console? Yes [556585.270902] illegal operation: 0001 ilc:1 [#10] SMP [556585.270905] Modules linked in: vhost_net macvtap macvlan tap rpcsec_gss_krb5 auth_rpcgss nfsv3 nfs_acl nfs lockd grace fscache veth xt_statistic ipt_REJECT nf_reject_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_mark sunrpc nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_set ip_set_hash_net ip_set_hash_ip ip_set tcp_diag inet_diag xt_comment xt_nat cls_cgroup sch_htb act_gact sch_multiq act_mirred act_pedit act_tunnel_key cls_flower act_police cls_u32 vxlan ip6_udp_tunnel udp_tunnel dummy nf_tables ebtable_filter ebtables xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key sch_ingress mlx5_ib ib_uverbs ib_core mlx5_core tls mlxfw ptp pps_core dm_integrity async_xor async_tx dm_bufio bonding xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat br_netfilter bridge vhost_vsock vmw_vsock_virtio_transport_common vhost vsock 8021q garp mrp stp llc xt_multiport xt_tcpudp qeth_l2 lcs ctcm fsm dasd_fba_mod aufs overlay scsi_dh_rdac [556585.270923] scsi_dh_emc s390_trng xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter sch_fq_codel zFPC_proc(OE) zFPC_diag(OE) vfio_ap vfio_mdev drm vfio_iommu_type1 drm_panel_orientation_quirks i2c_core ip_tables x_tables scsi_dh_alua pkey zcrypt ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common chsc_sch qeth ccwgroup eadm_sch vfio_ccw mdev vfio btrfs libcrc32c crc32_vx_s390 xor zstd_compress raid6_pq dm_crypt virtio_blk dm_service_time dm_multipath zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod zlib_deflate [last unloaded: tls] [556585.270945] CPU: 28 PID: 217741 Comm: worker Kdump: loaded Tainted: G D OE 5.4.0-90-generic #101-Ubuntu [556585.270947] Hardware name: IBM 
8562 GT2 A00 (LPAR) [556585.270948] Krnl PSW : 0704d00180000000 0000000000000002 (0x2) [556585.270951] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [556585.270953] Krnl GPRS: 0000000000000000 0000000000000000 000003e010ebbcf8 00000071c45e1ec0 [556585.270954] 0000000000000000 0000002816f7b18c 00000078dd36a4a0 000000713a62f718 [556585.270955] 0000000000000000 000003e010ebbcf8 0000000000000068 00000071c45e1ec0 [556585.270957] 0000006090a12200 0000000000000c40 000003ff80d6fb54 000003e010ebbbf0 [556585.270959] Krnl Code:#0000000000000000: 0000 illegal                           >0000000000000002: 0000 illegal                            0000000000000004: 0000 illegal                            0000000000000006: 0000 illegal                            0000000000000008: 0000 illegal                            000000000000000a: 0000 illegal                            000000000000000c: 0000 illegal                            000000000000000e: 0000 illegal [556585.270967] Call Trace: [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc]) [556585.270993] [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs] [556585.271004] [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs] [556585.271014] [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs] [556585.271016] [<00000028165944a8>] new_sync_write+0x118/0x1b0 [556585.271017] [<0000002816594ee0>] vfs_write+0xb0/0x1b0 [556585.271019] [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0 [556585.271021] [<0000002816bb26b2>] system_call+0x2a6/0x2c8 - Was a system dump produced ie kdump, netdump, or LKCD? Yes That is the kdump the stack trace is from. 
Enter data below to accurately describe the problem: - Problem description: Null Pointer issue in nfs code running Ubuntu 18.04 with HWE kernel 5.4 on IBM Z - Enter uname -a output: @lon1-qz1-sr4-rk101-s04> uname -a Linux lon1-qz1-sr4-rk101-s04 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 19:59:45 UTC 2021 s390x s390x s390x GNU/Linux - Enter failing machine type and model (ie p520 9111-520 lpar, x336 47U-8637): Manufacturer: IBM Type: 8562 Model: A00 GT2 Model Capacity: A00 00000000 Capacity Adj. Ind.: 100 LPAR CPUs Total: 16 LPAR CPUs Configured: 16 LPAR CPUs Standby: 0 LPAR CPUs Reserved: 0 LPAR CPUs Dedicated: 0 LPAR CPUs Shared: 16 LPAR CPUs G-MTID: 0 LPAR CPUs S-MTID: 1 LPAR CPUs PS-MTID: 1 - Enter primary and backup contact information (name/email): Prabhat Ranjan pranjank@in.ibm.com Christoph Schlameuß schlameuss@de.ibm.com - Detail the configuration of the additional hardware - Enter common userspace tool name: N/A - Enter name of userspace RPM: N/A - If failing tool is obtained from project website vs RPM install, what is the version/release/mod.   If from the project's CVS, what is the branch tag and date of checkout (put "na" if not applicable)? N/A - Is the failing userspace tool 32-bit, 64-bit, or both? N/A - Describe how unresponsive the system is. What steps have you taken to reclaim the system: kernel oops was detected and automatically dumped and restarted - Is a debugger configured (xmon or kdb enabled)? 
No - Enter Oops message from console: [556585.270902] illegal operation: 0001 ilc:1 [#10] SMP [556585.270905] Modules linked in: vhost_net macvtap macvlan tap rpcsec_gss_krb5 auth_rpcgss nfsv3 nfs_acl nfs lockd grace fscache veth xt_statistic ipt_REJECT nf_reject_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_mark sunrpc nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_set ip_set_hash_net ip_set_hash_ip ip_set tcp_diag inet_diag xt_comment xt_nat cls_cgroup sch_htb act_gact sch_multiq act_mirred act_pedit act_tunnel_key cls_flower act_police cls_u32 vxlan ip6_udp_tunnel udp_tunnel dummy nf_tables ebtable_filter ebtables xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key sch_ingress mlx5_ib ib_uverbs ib_core mlx5_core tls mlxfw ptp pps_core dm_integrity async_xor async_tx dm_bufio bonding xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat br_netfilter bridge vhost_vsock vmw_vsock_virtio_transport_common vhost vsock 8021q garp mrp stp llc xt_multiport xt_tcpudp qeth_l2 lcs ctcm fsm dasd_fba_mod aufs overlay scsi_dh_rdac [556585.270923] scsi_dh_emc s390_trng xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter sch_fq_codel zFPC_proc(OE) zFPC_diag(OE) vfio_ap vfio_mdev drm vfio_iommu_type1 drm_panel_orientation_quirks i2c_core ip_tables x_tables scsi_dh_alua pkey zcrypt ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common chsc_sch qeth ccwgroup eadm_sch vfio_ccw mdev vfio btrfs libcrc32c crc32_vx_s390 xor zstd_compress raid6_pq dm_crypt virtio_blk dm_service_time dm_multipath zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod zlib_deflate [last unloaded: tls] [556585.270945] CPU: 28 PID: 217741 Comm: worker Kdump: loaded Tainted: G D OE 5.4.0-90-generic #101-Ubuntu [556585.270947] Hardware name: IBM 8562 GT2 A00 (LPAR) [556585.270948] Krnl PSW : 0704d00180000000 0000000000000002 
(0x2) [556585.270951] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [556585.270953] Krnl GPRS: 0000000000000000 0000000000000000 000003e010ebbcf8 00000071c45e1ec0 [556585.270954] 0000000000000000 0000002816f7b18c 00000078dd36a4a0 000000713a62f718 [556585.270955] 0000000000000000 000003e010ebbcf8 0000000000000068 00000071c45e1ec0 [556585.270957] 0000006090a12200 0000000000000c40 000003ff80d6fb54 000003e010ebbbf0 [556585.270959] Krnl Code:#0000000000000000: 0000 illegal                           >0000000000000002: 0000 illegal                            0000000000000004: 0000 illegal                            0000000000000006: 0000 illegal                            0000000000000008: 0000 illegal                            000000000000000a: 0000 illegal                            000000000000000c: 0000 illegal                            000000000000000e: 0000 illegal [556585.270967] Call Trace: [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc]) [556585.270993] [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs] [556585.271004] [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs] [556585.271014] [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs] [556585.271016] [<00000028165944a8>] new_sync_write+0x118/0x1b0 [556585.271017] [<0000002816594ee0>] vfs_write+0xb0/0x1b0 [556585.271019] [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0 [556585.271021] [<0000002816bb26b2>] system_call+0x2a6/0x2c8 - Detail the steps to reproduce this problem: unknown - Was the system configured to capture a system dump? Yes
2022-05-27 08:36:10 Kleber Sacilotto de Souza linux (Ubuntu Focal): status In Progress Fix Committed
2022-05-27 08:36:12 Kleber Sacilotto de Souza linux (Ubuntu Impish): status In Progress Fix Committed
2022-05-27 08:36:14 Kleber Sacilotto de Souza linux (Ubuntu Jammy): status In Progress Fix Committed
2022-05-27 08:40:33 Frank Heimes ubuntu-z-systems: status In Progress Fix Committed
2022-06-03 10:41:53 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-needed-jammy
2022-06-16 22:16:33 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-needed-jammy architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-needed-focal verification-needed-jammy
2022-06-16 22:42:45 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-needed-focal verification-needed-jammy architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-needed-focal verification-needed-impish verification-needed-jammy
2022-06-20 17:42:48 Frank Heimes tags architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-needed-focal verification-needed-impish verification-needed-jammy architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-done-focal verification-done-impish verification-done-jammy
2022-06-22 15:04:22 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2022-06-22 15:04:22 Launchpad Janitor cve linked 2022-28388
2022-06-22 15:04:36 Launchpad Janitor linux (Ubuntu Impish): status Fix Committed Fix Released
2022-06-22 15:04:51 Launchpad Janitor linux (Ubuntu Jammy): status Fix Committed Fix Released
2022-06-22 15:16:18 Frank Heimes ubuntu-z-systems: status Fix Committed Fix Released
2022-07-07 00:09:41 bugproxy tags architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin--- verification-done-focal verification-done-impish verification-done-jammy architecture-s39064 bugnameltc-197384 severity-high targetmilestone-inin2004 verification-done-focal verification-done-impish verification-done-jammy
2022-07-07 05:17:36 Frank Heimes linux (Ubuntu): status New Invalid