invalid opcode xdr_buf_read_netobj on nfs4+krb5i directory

Bug #1858832 reported by Michael
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Disco            Won't Fix      Medium       Po-Hsu Lin

Bug Description

== SRU Justification ==
The xdr_shrink_pagelen() call added in commit 5f1bc39 (SUNRPC: Fix buffer
handling of GSS MIC without slack), which was applied to the Disco tree via
the stable update process, sometimes triggers the following kernel trace
when the number of bytes to remove from buf->pages is larger than
buf->page_len:

[ 49.420081] ------------[ cut here ]------------
[ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
[ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE 5.0.0-37-generic #40~18.04.1-Ubuntu
[ 49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
[ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
[ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
[ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
[ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
[ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
[ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
[ 49.420137] Call Trace:
[ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ 49.420154] ? kzfree+0x2d/0x40
[ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
[ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
[ 49.420216] ? __switch_to_asm+0x35/0x70
[ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
[ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
[ 49.420245] process_one_work+0x1fd/0x400
[ 49.420247] worker_thread+0x34/0x410
[ 49.420249] kthread+0x121/0x140
[ 49.420250] ? process_one_work+0x400/0x400
[ 49.420252] ? kthread_park+0xb0/0xb0
[ 49.420254] ret_from_fork+0x22/0x40

== Fixes ==
* e8d70b32 (SUNRPC: Fix another issue with MIC buffer space)
Instead of calling BUG_ON, this patch will just cap the number of bytes
that xdr_shrink_pagelen() will move.
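
The difference between the two behaviours can be modelled in a few lines. This is only a rough Python sketch of the length check, not the kernel code: `XdrBuf`, `shrink_pagelen_strict` and `shrink_pagelen_capped` are hypothetical names, the buffer is reduced to a single `page_len` counter, and the data movement the real function performs is left out.

```python
from dataclasses import dataclass


@dataclass
class XdrBuf:
    """Hypothetical stand-in for the kernel's xdr_buf; only page_len is modelled."""
    page_len: int  # bytes currently held in the pages section


def shrink_pagelen_strict(buf: XdrBuf, length: int) -> int:
    """Pre-fix model: asking to remove more than page_len is treated as fatal."""
    if length > buf.page_len:
        # Stands in for BUG_ON(len > page_len), which produces the trace above.
        raise AssertionError("BUG_ON: len > page_len")
    buf.page_len -= length
    return length


def shrink_pagelen_capped(buf: XdrBuf, length: int) -> int:
    """Post-fix model: cap the request at page_len instead of failing."""
    length = min(length, buf.page_len)
    buf.page_len -= length
    return length  # number of bytes actually removed


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the register dump.
    buf = XdrBuf(page_len=12)
    removed = shrink_pagelen_capped(buf, 16)
    print(removed, buf.page_len)
```

With the capped variant an oversized request simply empties the pages section, whereas the strict variant aborts, which is the behaviour the BUG_ON trace reflects.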

Only the Disco kernel needs this patch: Bionic and earlier do not have
5f1bc39, and this fix has already been applied to Eoan and onward.

== Test ==
Test kernel can be found here:
https://people.canonical.com/~phlin/kernel/lp-1858832-sunrpc-bufferhandling/

It has been stress-tested by the bug reporter, Michael, and the issue
can no longer be reproduced.

== Regression Potential ==
Low. It only changes the number of bytes to shrink; the change is limited
to a single driver and has a positive test result.

== Original Bug Report ==
RELEASE=19.3
CODENAME=tricia
EDITION="Cinnamon"
DESCRIPTION="Linux Mint 19.3 Tricia"
DESKTOP=Gnome
TOOLKIT=GTK
NEW_FEATURES_URL=https://www.linuxmint.com/rel_tricia_cinnamon_whatsnew.php
RELEASE_NOTES_URL=https://www.linuxmint.com/rel_tricia_cinnamon.php
USER_GUIDE_URL=https://www.linuxmint.com/documentation.php
GRUB_TITLE=Linux Mint 19.3 Cinnamon

My home dir is mounted from a local server via nfs4 and krb5i.
When stressing the mounted directory or its sub-directories (sometimes starting firefox, sometimes starting thunderbird, nearly guaranteed when compiling, sometimes the login itself), it will eventually lead to the following stack trace. The corresponding process is then stuck, and
accessing the mounted directory (like calling ls) easily yields further, similar stack traces and causes that process to get stuck as well.

Currently I am running an AMD 3950x on an ASUS Crosshair VII Hero Wifi (chipset x470), but I had the same issues with an Intel 6700K on an ASUS Crosshair VIII Hero in fall of 2019. I couldn't be bothered back then to report the bug, so I just kept running a working kernel (~5.0.0-15 I think) without updating it. After Christmas I replaced said Intel machine with the AMD machine, re-installed Linux Mint, installed all updates and therefore ran into this issue again.

[ 49.420081] ------------[ cut here ]------------
[ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
[ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE 5.0.0-37-generic #40~18.04.1-Ubuntu
[ 49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
[ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
[ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
[ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
[ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
[ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
[ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
[ 49.420137] Call Trace:
[ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ 49.420154] ? kzfree+0x2d/0x40
[ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
[ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
[ 49.420216] ? __switch_to_asm+0x35/0x70
[ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
[ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
[ 49.420245] process_one_work+0x1fd/0x400
[ 49.420247] worker_thread+0x34/0x410
[ 49.420249] kthread+0x121/0x140
[ 49.420250] ? process_one_work+0x400/0x400
[ 49.420252] ? kthread_park+0xb0/0xb0
[ 49.420254] ret_from_fork+0x22/0x40
[ 49.420255] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache edac_mce_amd snd_hda_codec_hdmi joydev kvm hid_roccat_koneplus hid_roccat irqbypass hid_roccat_common nvidia_uvm(OE) nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_ca0132 snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq_midi snd_pcm nvidia(POE) ghash_clmulni_intel snd_seq_midi_event eeepc_wmi aesni_intel snd_rawmidi asus_wmi sparse_keymap aes_x86_64 crypto_simd cryptd video glue_helper snd_seq drm_kms_helper snd_seq_device mxm_wmi wmi_bmof input_leds drm snd_timer ipmi_devintf snd serio_raw ccp ipmi_msghandler fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore k10temp mac_hid sch_fq_codel asus_wmi_sensors(OE) parport_pc sunrpc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash dm_log hid_plantronics
[ 49.420282] hid_generic usbhid hid igb i2c_piix4 nvme dca ahci i2c_algo_bit nvme_core libahci gpio_amdpt wmi gpio_generic
[ 49.420293] ---[ end trace 75bda976d7f1c02d ]---
[ 49.420305] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420306] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420307] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420309] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
[ 49.420310] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
[ 49.420311] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
[ 49.420312] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
[ 49.420312] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
[ 49.420314] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
[ 49.420315] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420316] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0

.

[Jan 1 03:45] ------------[ cut here ]------------
[ +0,000002] kernel BUG at /build/linux-hwe-W9CF8Q/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ +0,000006] invalid opcode: 0000 [#1] SMP NOPTI
[ +0,000002] CPU: 4 PID: 28219 Comm: kworker/u64:2 Tainted: P OE 5.0.0-35-generic #38~18.04.1-Ubuntu
[ +0,000001] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ +0,000011] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ +0,000010] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
[ +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX: 000000000000001c
[ +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI: ffff8b96c0856650
[ +0,000001] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09: 0000000000000000
[ +0,000000] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12: ffff8b96c0856650
[ +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15: ffffffffc0eb8920
[ +0,000001] FS: 0000000000000000(0000) GS:ffff8b97de700000(0000) knlGS:0000000000000000
[ +0,000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000001] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4: 0000000000340ee0
[ +0,000001] Call Trace:
[ +0,000009] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ +0,000003] ? kzfree+0x2d/0x40
[ +0,000002] ? crypto_destroy_tfm+0x73/0xb0
[ +0,000003] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ +0,000002] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ +0,000002] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ +0,000002] ? kmem_cache_alloc_trace+0x42/0x1c0
[ +0,000002] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ +0,000002] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ +0,000008] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ +0,000008] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ +0,000007] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ +0,000007] call_decode+0x166/0x8b0 [sunrpc]
[ +0,000002] ? __switch_to_asm+0x41/0x70
[ +0,000006] ? call_refreshresult+0x130/0x130 [sunrpc]
[ +0,000006] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ +0,000007] rpc_async_schedule+0x12/0x20 [sunrpc]
[ +0,000002] process_one_work+0x1fd/0x400
[ +0,000002] worker_thread+0x34/0x410
[ +0,000001] kthread+0x121/0x140
[ +0,000001] ? process_one_work+0x400/0x400
[ +0,000002] ? kthread_park+0xb0/0xb0
[ +0,000001] ret_from_fork+0x22/0x40
[ +0,000001] Modules linked in: nls_utf8 udf crc_itu_t rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache edac_mce_amd snd_hda_codec_hdmi kvm irqbypass joydev crct10dif_pclmul nvidia_uvm(OE) crc32_pclmul hid_roccat_koneplus nvidia_drm(POE) hid_roccat ghash_clmulni_intel hid_roccat_common nvidia_modeset(POE) nvidia(POE) snd_usb_audio snd_hda_codec_realtek
 snd_usbmidi_lib snd_hda_codec_generic ledtrig_audio snd_hda_codec_ca0132 aesni_intel input_leds snd_hda_intel eeepc_wmi snd_hda_codec asus_wmi aes_x86_64 drm_kms_helper crypto_simd snd_hda_core snd_seq_midi cryptd sparse_keymap snd_hwdep snd_seq_midi_event video glue_helper wmi_bmof mxm_wmi serio_raw drm snd_rawmidi snd_pcm ipmi_devintf ipmi_msghandler snd_seq
fb_sys_fops syscopyarea sysfillrect snd_seq_device sysimgblt snd_timer k10temp ccp snd soundcore mac_hid sch_fq_codel asus_wmi_sensors(OE) parport_pc ppdev sunrpc lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash dm_log
[ +0,000019] hid_plantronics hid_generic usbhid hid igb i2c_piix4 dca i2c_algo_bit ahci nvme libahci nvme_core wmi gpio_amdpt gpio_generic
[ +0,000008] ---[ end trace 4314523bc923f697 ]---
[ +0,000007] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
[ +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX: 000000000000001c
[ +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI: ffff8b96c0856650
[ +0,000000] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09: 0000000000000000
[ +0,000001] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12: ffff8b96c0856650
[ +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15: ffffffffc0eb8920
[ +0,000001] FS: 0000000000000000(0000) GS:ffff8b97de700000(0000) knlGS:0000000000000000
[ +0,000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000000] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4: 0000000000340ee0

.

With a little compile-stress-test, I have tested the following kernels which seem to run fine:
 * 4.15.0-69
 * 4.15.0-70
 * 4.15.0-72
 * 5.0.0-32 (current daily driver, runs without a hassle, max test length 2d 4h 33m - I am writing this bug report on it)

But the following kernels do not run stable:
 * 5.0.0-35 (second stack-trace from above)
 * 5.0.0-37 (first stack-trace from above; as you can see, it already throws the error 49s after boot)
 * 5.3.0-24

$ lspci | grep -i ether
06:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

$ mount | grep filer
filer:/ on /share type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
filer:/home/michael on /share/home/michael type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
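
To check quickly whether a given mount uses the nfs4 + krb5i combination reported here, the `mount` output lines above can be parsed. The following is a small sketch; `parse_mount_line` and `uses_nfs4_krb5i` are hypothetical helper names, and the pattern assumes mount points without spaces, as in the lines above.

```python
import re

# Matches lines of the form: SOURCE on TARGET type FSTYPE (opt1,opt2=val,...)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_line(line: str) -> dict:
    """Split one line of `mount` output into source, target, fstype and options."""
    m = MOUNT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised mount line: {line!r}")
    source, target, fstype, raw_opts = m.groups()
    options = {}
    for opt in raw_opts.split(","):
        key, sep, value = opt.partition("=")
        # Flags such as "rw" carry no value; store them as True.
        options[key] = value if sep else True
    return {"source": source, "target": target, "fstype": fstype, "options": options}


def uses_nfs4_krb5i(line: str) -> bool:
    """True if the mount is NFSv4 with the krb5i (integrity) security flavour."""
    entry = parse_mount_line(line)
    return entry["fstype"] == "nfs4" and entry["options"].get("sec") == "krb5i"


if __name__ == "__main__":
    line = ("filer:/ on /share type nfs4 (rw,noatime,vers=4.0,rsize=1048576,"
            "wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,"
            "sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)")
    print(uses_nfs4_krb5i(line))
```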

$ cat /etc/fstab | grep -i filer
filer:/ /share/ nfs4 nfsvers=4,sec=krb5i,rw,x-systemd.automount,soft,intr,tcp,noatime 0 0

Michael (miwait00)
summary: - invalid opcode xdr_buf_read_netobj on >= 5.0.0-35
+ invalid opcode xdr_buf_read_netobj on > 5.0.0-32
summary: - invalid opcode xdr_buf_read_netobj on > 5.0.0-32
+ invalid opcode xdr_buf_read_netobj
summary: - invalid opcode xdr_buf_read_netobj
+ invalid opcode xdr_buf_read_netobj on nfs4+krb5i directory
description: updated
Michael (miwait00) wrote :

I am a total noob at this, but looking at the source code (on this random website) for the line from the stack trace,
https://elixir.bootlin.com/linux/v5.0/source/net/sunrpc/xdr.c#L434, there is a BUG_ON macro(?) up until kernel 5.4, and it has then been rewritten in kernel 5.5: https://elixir.bootlin.com/linux/v5.5-rc1/source/net/sunrpc/xdr.c#L447

Po-Hsu Lin (cypressyew) wrote :

The commit that e8d70b321 (SUNRPC: Fix another issue with MIC buffer space) fixes was applied to the 5.0 kernel via the stable update process (bug 1848367); its commit SHA1 in the Disco tree is 035b82cb [1].

So if we look into the Disco tree, you can see that it has been included since 5.0.0-33.35. For the 5.3 kernel, the fix was applied recently (not released yet).
This matches your compile test result.

git tag --contains 035b82cb0a809741b22aebb57d55a61559e6355a
Ubuntu-5.0.0-33.35
Ubuntu-5.0.0-34.36
Ubuntu-5.0.0-35.38
Ubuntu-5.0.0-36.39
Ubuntu-5.0.0-37.40
Ubuntu-5.0.0-38.41
Ubuntu-raspi2-5.0.0-1021.21
Ubuntu-raspi2-5.0.0-1023.24
Ubuntu-raspi2-5.0.0-1024.25
Ubuntu-snapdragon-5.0.0-1025.26
Ubuntu-snapdragon-5.0.0-1027.29
Ubuntu-snapdragon-5.0.0-1028.30

[1] https://kernel.ubuntu.com/git/ubuntu/ubuntu-disco.git/commit/?id=035b82cb0a809741b22aebb57d55a61559e6355a
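
The tag list above can be turned into a quick check against a `uname -r` style string. This is a rough sketch under stated assumptions: `contains_backport` is a hypothetical helper, it only covers the generic 5.0.0 Disco series (the raspi2 and snapdragon flavours use different ABI numbering and are rejected), and it says nothing about which later upload carries the fix.

```python
import re

# First generic Disco ABI that contains commit 035b82cb, per the tag list above
# (Ubuntu-5.0.0-33.35).
FIRST_ABI_WITH_BACKPORT = 33

# Accepts e.g. "5.0.0-37-generic" or "5.0.0-37"; other flavours are rejected.
RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)(?:-generic)?$")


def contains_backport(release: str):
    """Return True/False for generic 5.0.0 kernels, None for any other series."""
    m = RELEASE_RE.match(release.strip())
    if m is None:
        raise ValueError(f"unsupported release string: {release!r}")
    base, abi = m.group(1), int(m.group(2))
    if base != "5.0.0":
        return None  # not covered by the Disco tag list
    return abi >= FIRST_ABI_WITH_BACKPORT


if __name__ == "__main__":
    for rel in ("5.0.0-32-generic", "5.0.0-35-generic", "5.0.0-37-generic"):
        print(rel, contains_backport(rel))
```

For the kernels tested in the original report this gives False for 5.0.0-32 and True for 5.0.0-35 and 5.0.0-37, which lines up with the compile test result.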

Po-Hsu Lin (cypressyew) wrote :

Can you help us verify this 5.0 Disco kernel, which was built with e8d70b321 (SUNRPC: Fix another issue with MIC buffer space):
https://people.canonical.com/~phlin/kernel/lp-1858832-sunrpc-bufferhandling/

Thanks

Changed in linux-hwe (Ubuntu):
status: New → Incomplete
Michael (miwait00) wrote :

I ran my compile-stress-test 12 times on the kernel you linked without seeing any stack traces in dmesg. On "unstable" kernels, the first run (or, in some rare cases, the second) provoked the bug report.

I'll be running this kernel for today and will report back in a few hours, or as soon as a bug is reported.
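
For anyone repeating this kind of check, scanning the dmesg text for the signature of this bug can be sketched as follows. This is only an illustration: `summarize_oops` is a hypothetical helper, and the two patterns are assumptions based on the "kernel BUG at" and "RIP:" lines in the traces from this report.

```python
import re

# "kernel BUG at <file>:<line>!" as printed in the traces in this report.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# "RIP: 0010:<function>+0x<offset>/0x<size>" as printed in the traces.
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log_text: str) -> dict:
    """Collect BUG sites and faulting function names from kernel log text."""
    bug_sites = [(m.group(1), int(m.group(2))) for m in BUG_RE.finditer(log_text)]
    rip_functions = sorted({m.group(1) for m in RIP_RE.finditer(log_text)})
    return {"bug_sites": bug_sites, "rip_functions": rip_functions}


def hit_xdr_bug(log_text: str) -> bool:
    """True if any BUG site lies in net/sunrpc/xdr.c."""
    summary = summarize_oops(log_text)
    return any(path.endswith("net/sunrpc/xdr.c") for path, _ in summary["bug_sites"])


if __name__ == "__main__":
    sample = (
        "[ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!\n"
        "[ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]\n"
    )
    print(summarize_oops(sample))
    print(hit_xdr_bug(sample))
```

The functions take the log as plain text, so the output of `dmesg` (or a saved log file) can be passed in after each stress run.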

Thanks :)

Michael (miwait00) wrote :

Running 9h 50m without any issues yet :)

Michael (miwait00) wrote :

Running for 36 hours straight without issues :)

System is now going to reboot and there isn't going to be a new "uptime record" anytime soon...

Po-Hsu Lin (cypressyew) wrote :

Thanks for testing!

I will SRU this to our Disco kernel.
https://lists.ubuntu.com/archives/kernel-team/2020-January/106822.html

affects: linux-hwe (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Disco):
status: New → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
tags: added: disco
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Disco):
importance: Undecided → Medium
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Michael (miwait00) wrote :

Having similar/same issues with kernel 5.3. I have been watching 5.3 since this patch was committed, because I guessed it would be fixed there eventually. I just checked 5.3.0-51-generic and it does not pass my mini stress test (compiling something on an NFS share):

May 2 20:26:55 mwpc55 kernel: [ 71.492014] BUG: unable to handle page fault for address: 000000005c05ee77
May 2 20:26:55 mwpc55 kernel: [ 71.492017] #PF: supervisor read access in kernel mode
May 2 20:26:55 mwpc55 kernel: [ 71.492018] #PF: error_code(0x0000) - not-present page
May 2 20:26:55 mwpc55 kernel: [ 71.492019] PGD f3b7df067 P4D f3b7df067 PUD 0
May 2 20:26:55 mwpc55 kernel: [ 71.492022] Oops: 0000 [#1] SMP NOPTI
May 2 20:26:55 mwpc55 kernel: [ 71.492026] CPU: 5 PID: 2995 Comm: kworker/u64:17 Tainted: P OE 5.3.0-51-generic #44~18.04.2-Ubuntu
May 2 20:26:55 mwpc55 kernel: [ 71.492027] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
May 2 20:26:55 mwpc55 kernel: [ 71.492040] Workqueue: rpciod rpc_async_schedule [sunrpc]
May 2 20:26:55 mwpc55 kernel: [ 71.492045] RIP: 0010:kmem_cache_alloc+0x85/0x220
May 2 20:26:55 mwpc55 kernel: [ 71.492047] Code: 65 49 8b 50 08 65 4c 03 05 f0 42 f7 47 4d 8b 30 4d 85 f6 0f 84 5d 01 00 00 41 8b 5f 20 49 8b 3f 48 8d 4a 01 4c 89 f0 4c 01 f3 <48> 33 1b 49 33 9f 70 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
May 2 20:26:55 mwpc55 kernel: [ 71.492048] RSP: 0018:ffff9aa8c2ce7cd8 EFLAGS: 00010206
May 2 20:26:55 mwpc55 kernel: [ 71.492050] RAX: 000000005c05ee77 RBX: 000000005c05ee77 RCX: 00000000000001ba
May 2 20:26:55 mwpc55 kernel: [ 71.492051] RDX: 00000000000001b9 RSI: 0000000000092800 RDI: 0000000000031ed0
May 2 20:26:55 mwpc55 kernel: [ 71.492052] RBP: ffff9aa8c2ce7d08 R08: ffff8b709e771ed0 R09: ffff8b7098006d80
May 2 20:26:55 mwpc55 kernel: [ 71.492053] R10: ffff9aa8c2ce7dd0 R11: ffffffffc0d579f0 R12: 0000000000092800
May 2 20:26:55 mwpc55 kernel: [ 71.492054] R13: ffff8b7096a0ad80 R14: 000000005c05ee77 R15: ffff8b7096a0ad80
May 2 20:26:55 mwpc55 kernel: [ 71.492056] FS: 0000000000000000(0000) GS:ffff8b709e740000(0000) knlGS:0000000000000000
May 2 20:26:55 mwpc55 kernel: [ 71.492057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 2 20:26:55 mwpc55 kernel: [ 71.492058] CR2: 000000005c05ee77 CR3: 0000000f01398000 CR4: 0000000000340ee0
May 2 20:26:55 mwpc55 kernel: [ 71.492059] Call Trace:
May 2 20:26:55 mwpc55 kernel: [ 71.492063] ? mempool_alloc_slab+0x15/0x20
May 2 20:26:55 mwpc55 kernel: [ 71.492066] ? wait_woken+0x80/0x80
May 2 20:26:55 mwpc55 kernel: [ 71.492067] mempool_alloc_slab+0x15/0x20
May 2 20:26:55 mwpc55 kernel: [ 71.492068] mempool_alloc+0x71/0x190
May 2 20:26:55 mwpc55 kernel: [ 71.492078] rpc_malloc+0x9d/0xd0 [sunrpc]
May 2 20:26:55 mwpc55 kernel: [ 71.492089] call_allocate+0xbb/0x1f0 [sunrpc]
May 2 20:26:55 mwpc55 kernel: [ 71.492098] ? call_refreshresult+0x140/0x140 [sunrpc]
May 2 20:26:55 mwpc55 kernel: [ 71.492106] ? rpc_exit+0x30/0x30 [sunrpc]
May 2 20:26:55 mwpc55 kernel: [ 71.492114] __rpc_execute+0x8a/0x420 [sunrpc]
May 2 20:26:55 mwpc...

Michael (miwait00) wrote :

Same issue with Ubuntu 20.04 and kernel 5.4.0-28-generic

Michael (miwait00) wrote :

Created a new bug report for this separate issue

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1876567

Steve Langasek (vorlon)
Changed in linux (Ubuntu Disco):
status: Fix Committed → Won't Fix